A turnkey RAG (Retrieval Augmented Generation) deployment powered by llm4s.
Features:
- REST API for document ingestion and querying
- PostgreSQL + pgvector for scalable vector storage
- Persistent document registry - survives restarts, tracks content changes
- Multiple embedding providers (OpenAI, VoyageAI, Ollama)
- Multiple LLM providers for answer generation (OpenAI, Anthropic)
- Hybrid search (vector + keyword) with RRF fusion
- Incremental ingestion with content hash change detection
- Built-in connectors (directory, URL) with scheduled sync
- Visibility API - inspect chunks, view config, understand RAG behavior
- Chunking Preview - test chunking strategies before committing
- Runtime Configuration - modify settings without server restart
- Per-Collection Chunking - different chunking strategies per collection with file-type overrides
- Python SDK for custom ingesters
- Docker Compose for easy deployment
Prerequisites:
- Docker and Docker Compose
- An API key for OpenAI (or Anthropic)
# Clone the repository
git clone https://github.com/llm4s/rag-in-a-box.git
cd rag-in-a-box
# Copy and configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
# Start all services
docker-compose up -d
# Check health
curl http://localhost:8080/health
Alternatively, run only PostgreSQL in Docker and start the application with run.sh:
# Start PostgreSQL with pgvector
docker run -d --name ragbox-postgres \
-e POSTGRES_USER=rag \
-e POSTGRES_PASSWORD=rag \
-e POSTGRES_DB=ragdb \
-v "$(pwd)/scripts/init-db.sql:/docker-entrypoint-initdb.d/01-init.sql" \
-p 15432:5432 \
pgvector/pgvector:pg15
# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
# Run the application
./run.sh
Once started, the services are available at:
- API: http://localhost:8080
- Admin UI: http://localhost:8080/admin
- PostgreSQL: localhost:15432 (user: rag, password: rag)
RAG in a Box includes a built-in web-based admin interface for managing documents, configuration, and monitoring.
- Dashboard - Overview of document counts, chunk statistics, and system health
- Documents - Browse, search, upload, and delete documents
- Upload - Add documents via text, file upload, or URL ingestion
- Configuration - View and modify runtime settings
- Collections - Manage per-collection chunking configurations
- Chunking Preview - Test and compare chunking strategies before applying
- Visibility - Inspect chunks and understand RAG behavior
- Ingestion - Monitor and trigger ingestion jobs
The Admin UI is bundled with the API server and available at /admin:
# Start the server
docker-compose up -d
# Open in browser
open http://localhost:8080/admin
For frontend development with hot reload:
cd admin-ui
npm install
npm run dev
This starts a development server at http://localhost:3000 that proxies API requests to the backend.
Upload a document:
curl -X POST http://localhost:8080/api/v1/documents \
-H "Content-Type: application/json" \
-d '{
"content": "PostgreSQL is a powerful open-source relational database...",
"filename": "postgres-intro.txt",
"metadata": {"source": "documentation"}
}'
Query with answer generation:
curl -X POST http://localhost:8080/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"question": "What is PostgreSQL?",
"topK": 5
}'
Search without the LLM:
curl -X POST http://localhost:8080/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "database features",
"topK": 10
}'
Get RAG statistics:
curl http://localhost:8080/api/v1/stats
RAG in a Box supports idempotent document ingestion for custom ingesters:
# First upsert - creates document
curl -X PUT http://localhost:8080/api/v1/documents/my-doc-1 \
-H "Content-Type: application/json" \
-d '{"content": "Document content here...", "metadata": {"source": "api"}}'
# Returns: {"action": "created", "chunks": 3, ...}
# Same content - skips re-indexing
curl -X PUT http://localhost:8080/api/v1/documents/my-doc-1 \
-H "Content-Type: application/json" \
-d '{"content": "Document content here...", "metadata": {"source": "api"}}'
# Returns: {"action": "unchanged", ...}
# Updated content - re-indexes
curl -X PUT http://localhost:8080/api/v1/documents/my-doc-1 \
-H "Content-Type: application/json" \
-d '{"content": "Updated document content!", "metadata": {"source": "api"}}'
# Returns: {"action": "updated", "chunks": 2, ...}
After upserting documents, prune orphaned ones:
# Get sync status
curl http://localhost:8080/api/v1/sync/status
# List synced document IDs
curl http://localhost:8080/api/v1/sync/documents
# Complete sync and prune documents not in keep list
curl -X POST http://localhost:8080/api/v1/sync \
-H "Content-Type: application/json" \
-d '{"keepDocumentIds": ["my-doc-1", "my-doc-2"]}'
# Basic health
curl http://localhost:8080/health
# Readiness (includes database check)
curl http://localhost:8080/health/ready
All settings can be configured via environment variables:
Server:
| Variable | Default | Description |
|---|---|---|
| SERVER_HOST | 0.0.0.0 | Server bind address |
| SERVER_PORT | 8080 | Server port |
Database:
| Variable | Default | Description |
|---|---|---|
| PG_HOST | localhost | PostgreSQL host |
| PG_PORT | 5432 | PostgreSQL port |
| PG_DATABASE | ragdb | Database name |
| PG_USER | rag | Database user |
| PG_PASSWORD | rag | Database password |
| DATABASE_URL | - | Alternative: full connection URL |
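The relationship between DATABASE_URL and the individual PG_* variables can be sketched as follows. This is only an illustration: the function name `database_url` is hypothetical, the precedence (DATABASE_URL wins when set) is an assumption, and so is the `postgresql://` URI form, since the server may expect a different URL scheme.

```python
import os
from urllib.parse import quote


def database_url(env=os.environ):
    """Return DATABASE_URL if set, else build a URL from the PG_* variables.

    Defaults mirror the table above. Precedence and URL scheme are assumptions.
    """
    explicit = env.get("DATABASE_URL")
    if explicit:
        return explicit
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    name = env.get("PG_DATABASE", "ragdb")
    # Percent-encode credentials so characters like '@' do not break the URL.
    user = quote(env.get("PG_USER", "rag"), safe="")
    password = quote(env.get("PG_PASSWORD", "rag"), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


print(database_url({}))  # postgresql://rag:rag@localhost:5432/ragdb
```

Passing a plain dict instead of `os.environ` makes it easy to check a configuration before starting the server.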
API keys:
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | For OpenAI | OpenAI API key |
| ANTHROPIC_API_KEY | For Anthropic | Anthropic API key |
| VOYAGE_API_KEY | For VoyageAI | VoyageAI API key |
Embeddings:
| Variable | Default | Description |
|---|---|---|
| EMBEDDING_PROVIDER | openai | Provider: openai, voyage, ollama |
| EMBEDDING_MODEL | text-embedding-3-small | Embedding model |
LLM:
| Variable | Default | Description |
|---|---|---|
| LLM_MODEL | openai/gpt-4o | Model for answer generation |
| LLM_TEMPERATURE | 0.1 | Temperature for responses |
RAG:
| Variable | Default | Description |
|---|---|---|
| RAG_CHUNKING_STRATEGY | sentence | Strategy: simple, sentence, markdown, semantic |
| RAG_CHUNK_SIZE | 800 | Target chunk size (characters) |
| RAG_CHUNK_OVERLAP | 150 | Overlap between chunks |
| RAG_TOP_K | 5 | Number of context chunks |
| RAG_FUSION_STRATEGY | rrf | Fusion: rrf, weighted, vector_only, keyword_only |
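The default fusion strategy, rrf, refers to reciprocal rank fusion, which merges the vector and keyword result lists by rank rather than by raw score. The sketch below illustrates the general technique and is not taken from the server's code: the function name `rrf_fuse`, the chunk IDs, and the constant k=60 (a commonly used value) are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=5):
    """Fuse ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_k]]


vector_hits = ["c3", "c1", "c7"]   # hypothetical vector-search ranking
keyword_hits = ["c1", "c9", "c3"]  # hypothetical keyword-search ranking
print(rrf_fuse([vector_hits, keyword_hits]))  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both lists (c1, c3) rise above chunks found by only one retriever, which is the main appeal of rank-based fusion.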
Documents:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/documents | Upload a document |
| PUT | /api/v1/documents/{id} | Upsert a document (idempotent) |
| GET | /api/v1/documents | List documents |
| DELETE | /api/v1/documents | Clear all documents |
| DELETE | /api/v1/documents/{id} | Delete a document |
Sync:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/sync/status | Get sync status |
| GET | /api/v1/sync/documents | List synced document IDs |
| POST | /api/v1/sync | Mark sync complete (optionally prune) |
Query and search:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/query | Query with answer generation |
| POST | /api/v1/search | Search without LLM |
Ingestion:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/ingest/directory | Ingest from directory |
| POST | /api/v1/ingest/url | Ingest from URLs |
| POST | /api/v1/ingest/run | Run all configured sources |
| POST | /api/v1/ingest/run/{source} | Run specific source |
| GET | /api/v1/ingest/status | Get ingestion status |
| GET | /api/v1/ingest/sources | List configured sources |
Configuration:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/config | Get current configuration |
| GET | /api/v1/config/providers | List available providers |
| GET | /api/v1/stats | Get RAG statistics |
Visibility:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/visibility/config | Detailed config with changeability annotations |
| GET | /api/v1/visibility/chunks | List all chunks (paginated) |
| GET | /api/v1/visibility/chunks/{docId} | Get all chunks for a document |
| GET | /api/v1/visibility/chunks/{docId}/{idx} | Get specific chunk |
| GET | /api/v1/visibility/stats | Detailed stats with chunk size distribution |
| GET | /api/v1/visibility/collections | Collection details with chunking info |
Chunking preview:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/chunking/preview | Preview chunking on sample text |
| POST | /api/v1/chunking/compare | Compare multiple strategies |
| GET | /api/v1/chunking/strategies | List available strategies |
| GET | /api/v1/chunking/presets | Get preset configurations |
Runtime configuration:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/config/runtime | Get current runtime settings |
| PUT | /api/v1/config/runtime | Update runtime settings |
| POST | /api/v1/config/runtime/validate | Validate proposed changes |
| GET | /api/v1/config/runtime/history | Get config change history |
Collection configuration:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/collections/{name}/config | Get collection chunking config |
| PUT | /api/v1/collections/{name}/config | Set collection chunking config |
| DELETE | /api/v1/collections/{name}/config | Remove custom config (use defaults) |
| GET | /api/v1/collections/configs | List all collection configs |
| POST | /api/v1/collections/{name}/config/preview | Preview effective config for a file |
Health:
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Basic health check |
| GET | /health/ready | Readiness check |
| GET | /health/live | Liveness check |
RAG in a Box includes built-in connectors for common data sources:
Ingest all documents from a directory:
curl -X POST http://localhost:8080/api/v1/ingest/directory \
-H "Content-Type: application/json" \
-d '{
"path": "/data/docs",
"patterns": ["*.md", "*.txt", "*.pdf"],
"recursive": true
}'
Ingest documents from URLs:
curl -X POST http://localhost:8080/api/v1/ingest/url \
-H "Content-Type: application/json" \
-d '{"urls": ["https://example.com/doc1.html", "https://example.com/doc2.html"]}'
Set environment variables for automatic directory ingestion:
export INGEST_DIR=/data/docs
export INGEST_PATTERNS="*.md,*.txt,*.pdf"
export INGEST_RECURSIVE=true
export INGEST_ON_STARTUP=true
export INGEST_SCHEDULE="6h" # Run every 6 hours
./run.sh
Or configure in application.conf:
ingestion {
  enabled = true
  run-on-startup = true
  schedule = "6h"  # Options: "5m", "1h", "6h", "daily", "hourly"
  sources = [
    {
      type = "directory"
      name = "docs"
      path = "/data/docs"
      patterns = ["*.md", "*.txt", "*.pdf"]
      recursive = true
    }
  ]
}
Supported schedule formats:
| Format | Description |
|---|---|
| 5m, 30m | Every N minutes |
| 1h, 6h, 12h | Every N hours |
| 1d, 7d | Every N days |
| hourly | Every hour |
| daily | Once per day |
| weekly | Once per week |
| 0 * * * * | Cron: every hour |
| 0 */6 * * * | Cron: every 6 hours |
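To see how the interval and keyword formats above map to durations, here is a small sketch. It is not the server's parser: the function name `schedule_interval` is hypothetical, and cron expressions are deliberately left out (they raise ValueError here) because they describe calendar times rather than a fixed interval.

```python
import re
from datetime import timedelta

# Keyword schedules from the table above.
_NAMED = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}
# Suffix letter -> timedelta keyword argument.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def schedule_interval(spec):
    """Convert '5m', '6h', '1d', 'hourly', 'daily' or 'weekly' to a timedelta."""
    spec = spec.strip().lower()
    if spec in _NAMED:
        return _NAMED[spec]
    match = re.fullmatch(r"(\d+)([mhd])", spec)
    if match is None:
        raise ValueError(f"not an interval schedule: {spec!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


print(schedule_interval("6h"))     # 6:00:00
print(schedule_interval("daily"))  # 1 day, 0:00:00
```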
# Check ingestion status
curl http://localhost:8080/api/v1/ingest/status
# List configured sources
curl http://localhost:8080/api/v1/ingest/sources
# Run all configured sources
curl -X POST http://localhost:8080/api/v1/ingest/run
A Python client is available for easy integration:
cd sdk/python
pip install -e .
Then use the client from Python:
from ragbox import RagBoxClient, Document
client = RagBoxClient("http://localhost:8080")
# Upsert documents (idempotent)
docs = [
    Document(id="doc-1", content="First document"),
    Document(id="doc-2", content="Second document"),
]
for doc in docs:
    result = client.upsert(doc)
    print(f"{doc.id}: {result.action}")
# Query with answer
result = client.query("What is in the documents?")
print(result.answer)
# Prune deleted documents
client.sync(keep_ids=[doc.id for doc in docs])
See sdk/python/README.md for full documentation.
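If installing the SDK is not an option, the upsert call can also be built with the Python standard library. The sketch below only constructs the request for the PUT /api/v1/documents/{id} endpoint and does not send it. The helper name `upsert_request` is hypothetical, and the request body mirrors the curl upsert example rather than a documented schema.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def upsert_request(base_url, doc_id, content, metadata=None):
    """Build (but do not send) an idempotent upsert request for one document."""
    body = {"content": content, "metadata": metadata or {}}
    return Request(
        # Percent-encode the ID so it is safe to use as a path segment.
        url=f"{base_url.rstrip('/')}/api/v1/documents/{quote(doc_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = upsert_request("http://localhost:8080", "my-doc-1",
                     "Document content here...", {"source": "api"})
print(req.get_method(), req.full_url)
```

With a server running, `urllib.request.urlopen(req)` would send the request; the response body should then carry the action field shown in the curl examples.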
The Runtime Configuration API allows you to modify settings without restarting the server. Settings are classified as:
- Hot: Changes take effect immediately (topK, fusionStrategy, systemPrompt)
- Warm: Changes affect new documents only (chunkingStrategy, chunkSize)
Get current runtime settings:
curl http://localhost:8080/api/v1/config/runtime
Update runtime settings:
curl -X PUT http://localhost:8080/api/v1/config/runtime \
-H "Content-Type: application/json" \
-d '{"topK": 10, "fusionStrategy": "weighted"}'
Validate proposed changes:
curl -X POST http://localhost:8080/api/v1/config/runtime/validate \
-H "Content-Type: application/json" \
-d '{"topK": 10, "chunkSize": 500}'
Get config change history:
curl http://localhost:8080/api/v1/config/runtime/history
The Per-Collection Chunking API allows different collections to use different chunking configurations. This is useful when collections hold documents with different characteristics.
When determining the effective configuration for a file, the first match in this order applies:
- File-type override - If the collection config has a file-type strategy for the file extension
- Collection config - The collection's custom settings
- Runtime defaults - The global default settings
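The resolution order above can be sketched as a small lookup. This is an illustration, not the server's code: the function name `effective_strategy` is hypothetical, the config keys follow the JSON examples in this README, and the "default" source label is an assumption (the examples only show "file-type" and "collection").

```python
import os


def effective_strategy(filename, collection_config, runtime_defaults):
    """Return (strategy, source) using the precedence described above."""
    ext = os.path.splitext(filename)[1].lower()
    if collection_config:
        # 1. File-type override for this extension, if the collection has one.
        overrides = collection_config.get("fileTypeStrategies", {})
        if ext in overrides:
            return overrides[ext], "file-type"
        # 2. The collection's own strategy.
        if "strategy" in collection_config:
            return collection_config["strategy"], "collection"
    # 3. Global runtime defaults.
    return runtime_defaults["strategy"], "default"


config = {"strategy": "markdown",
          "fileTypeStrategies": {".md": "markdown", ".txt": "sentence"}}
defaults = {"strategy": "sentence"}
print(effective_strategy("notes.txt", config, defaults))   # ('sentence', 'file-type')
print(effective_strategy("report.pdf", config, defaults))  # ('markdown', 'collection')
print(effective_strategy("report.pdf", None, defaults))    # ('sentence', 'default')
```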
Get the chunking config for a collection:
curl http://localhost:8080/api/v1/collections/my-collection/config
The response shows both the custom config (if any) and the effective config:
{
"collection": "my-collection",
"hasCustomConfig": true,
"config": {
"strategy": "markdown",
"targetSize": 1000,
"fileTypeStrategies": {".md": "markdown", ".txt": "sentence"}
},
"effectiveConfig": {
"strategy": "markdown",
"targetSize": 1000,
"maxSize": 1600,
"overlap": 150,
"source": "collection"
},
"documentCount": 25
}
Set the chunking config for a collection:
curl -X PUT http://localhost:8080/api/v1/collections/my-collection/config \
-H "Content-Type: application/json" \
-d '{
"strategy": "markdown",
"targetSize": 1000,
"fileTypeStrategies": {
".md": "markdown",
".txt": "sentence"
}
}'
Test which settings will apply for a specific file:
curl -X POST http://localhost:8080/api/v1/collections/my-collection/config/preview \
-H "Content-Type: application/json" \
-d '{"collection": "my-collection", "filename": "README.md"}'
The response includes the resolution path showing how the config was determined:
{
"collection": "my-collection",
"filename": "README.md",
"effectiveConfig": {
"strategy": "markdown",
"targetSize": 1000,
"source": "file-type",
"appliedFileTypeOverride": ".md"
},
"configResolutionPath": [
"Checked file extension: .md",
"Found file-type override in collection 'my-collection' config",
"Using strategy: markdown"
]
}
List all collection configs:
curl http://localhost:8080/api/v1/collections/configs
Remove a collection's custom config (fall back to defaults):
curl -X DELETE http://localhost:8080/api/v1/collections/my-collection/config
The Chunking Preview API lets you test chunking strategies before committing to them. This is essential for tuning RAG performance.
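For intuition about how a target size and an overlap interact, the sketch below splits text into fixed-size character windows. It is a simplified stand-in and not necessarily how the server's simple strategy (or any other strategy) works. The function name `chunk_text` is hypothetical, and the defaults are taken from RAG_CHUNK_SIZE and RAG_CHUNK_OVERLAP.

```python
def chunk_text(text, size=800, overlap=150):
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [800, 800, 700]
```

Larger overlaps preserve more context across chunk boundaries at the cost of more chunks to embed and store, which is the trade-off the preview endpoint helps you evaluate on real content.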
Test how content will be chunked with current or custom settings:
curl -X POST http://localhost:8080/api/v1/chunking/preview \
-H "Content-Type: application/json" \
-d '{
"content": "# My Document\n\nThis is sample content to test chunking...",
"strategy": "markdown",
"targetSize": 500
}'
Compare multiple chunking strategies on the same content:
curl -X POST http://localhost:8080/api/v1/chunking/compare \
-H "Content-Type: application/json" \
-d '{
"content": "Your sample content here...",
"strategies": ["simple", "sentence", "markdown"]
}'
The response includes a recommendation based on content analysis.
View available strategies with their descriptions and trade-offs:
curl http://localhost:8080/api/v1/chunking/strategies
Get preset configurations for different use cases:
curl http://localhost:8080/api/v1/chunking/presets
The Visibility API provides insight into how your RAG system is configured and how documents are being chunked. This is essential for understanding and tuning RAG performance.
View detailed configuration with changeability annotations:
curl http://localhost:8080/api/v1/visibility/config
The response includes changeability information for each setting:
- Hot: Can change at runtime with immediate effect (topK, fusionStrategy)
- Warm: Can change at runtime, affects new documents only (chunkSize, chunkingStrategy)
- Cold: Requires restart and full re-indexing (embeddingProvider, embeddingModel)
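Before sending a configuration change, it can help to group the proposed settings by changeability. The sketch below hard-codes only the settings named in this README; the function name `plan_update` is hypothetical, and the authoritative classification is whatever /api/v1/visibility/config returns for your deployment.

```python
# Changeability of the settings named in this README (not an exhaustive list).
CHANGEABILITY = {
    "topK": "hot",
    "fusionStrategy": "hot",
    "systemPrompt": "hot",
    "chunkSize": "warm",
    "chunkingStrategy": "warm",
    "embeddingProvider": "cold",
    "embeddingModel": "cold",
}


def plan_update(changes):
    """Group proposed setting names by how disruptive the change is."""
    plan = {"hot": [], "warm": [], "cold": [], "unknown": []}
    for name in changes:
        plan[CHANGEABILITY.get(name, "unknown")].append(name)
    return plan


print(plan_update({"topK": 10, "chunkSize": 500, "embeddingModel": "other-model"}))
```

Anything that lands in the cold group implies a restart and a full re-index, so it is worth handling separately from hot and warm changes.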
View how documents are chunked:
# List all chunks (paginated)
curl "http://localhost:8080/api/v1/visibility/chunks?page=1&pageSize=20"
# Get all chunks for a specific document
curl http://localhost:8080/api/v1/visibility/chunks/{documentId}
# Get a specific chunk
curl http://localhost:8080/api/v1/visibility/chunks/{documentId}/0
Get detailed statistics including chunk size distribution:
curl http://localhost:8080/api/v1/visibility/stats
The response includes:
- Document and chunk counts
- Per-collection statistics
- Chunk size distribution (min, max, avg, median, p90)
- Histogram buckets for chunk sizes
- Ingestion timestamps
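To make the distribution fields concrete, the sketch below computes the same kind of summary from a list of chunk sizes. It is an illustration only: the function name `chunk_size_summary`, the nearest-rank definition of p90, and the bucket width are assumptions, and the server may compute these values differently.

```python
import statistics
from collections import Counter


def chunk_size_summary(sizes, bucket_width=500):
    """Summarise chunk sizes: min, max, avg, median, p90 and a histogram."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    ordered = sorted(sizes)
    n = len(ordered)
    # Nearest-rank 90th percentile: index ceil(0.9 * n) - 1, in integer math.
    p90_index = (9 * n + 9) // 10 - 1
    # Histogram keyed by the lower bound of each bucket.
    histogram = Counter((s // bucket_width) * bucket_width for s in ordered)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[p90_index],
        "histogram": dict(sorted(histogram.items())),
    }


print(chunk_size_summary(list(range(100, 1100, 100))))
```

A p90 far above the median usually points to a few oversized chunks, which is a useful signal when choosing a chunking strategy.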
View collections with their chunking configuration:
curl http://localhost:8080/api/v1/visibility/collections
RAG in a Box uses a PostgreSQL-backed document registry for durability:
- Document tracking persists across restarts - No re-indexing needed after restart
- Content hash detection - Only changed documents are re-indexed
- Sync status preserved - Last sync time and document list survive restarts
The document registry stores:
- Document IDs and content hashes (SHA-256)
- Chunk counts and metadata
- Indexed and updated timestamps
This enables efficient incremental ingestion workflows where only new or modified documents are processed.
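The change-detection idea can be sketched in a few lines: hash the content with SHA-256, compare it with the stored hash, and re-index only on a mismatch. In this sketch an in-memory dict stands in for the PostgreSQL registry, the function name `upsert_action` is hypothetical, and the returned strings match the action values shown in the upsert examples.

```python
import hashlib


def upsert_action(registry, doc_id, content):
    """Return 'created', 'unchanged' or 'updated' and record the new hash."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = registry.get(doc_id)
    registry[doc_id] = digest
    if previous is None:
        return "created"  # first time this ID is seen
    return "unchanged" if previous == digest else "updated"


registry = {}  # stand-in for the persistent document registry
print(upsert_action(registry, "my-doc-1", "Document content here..."))  # created
print(upsert_action(registry, "my-doc-1", "Document content here..."))  # unchanged
print(upsert_action(registry, "my-doc-1", "Updated document content!"))  # updated
```

Only the created and updated cases need chunking and embedding, which is what makes repeated ingestion runs cheap.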
Build and run from source:
sbt assembly
java -jar target/scala-3.7.1/ragbox-assembly.jar
Run the tests:
sbt test
Architecture:
┌─────────────────────────────────────────────────────────────────────┐
│ Docker Compose │
├─────────────────────────────────────────────────────────────────────┤
│ ┌────────────────────────┐ ┌────────────────────────────────────┐ │
│ │ RAG API Service │ │ PostgreSQL + pgvector │ │
│ │ (Scala + http4s) │ │ │ │
│ │ │ │ ┌─────────────────────────────┐ │ │
│ │ • Document upload │◀─│ │ rag_embeddings (vectors) │ │ │
│ │ • Upsert (idempotent) │ │ │ document_registry (tracking)│ │ │
│ │ • Query + Answer │ │ │ chunk_registry (visibility) │ │ │
│ │ • Visibility API │ │ │ collection_configs (tuning) │ │ │
│ │ • Runtime Config │ │ │ config_history (audit) │ │ │
│ │ • Collection Config │ │ │ sync_status (sync state) │ │ │
│ │ Port 8080 │ │ └─────────────────────────────┘ │ │
│ └────────────────────────┘ └────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│ │
▼ External Services ▼ Custom Ingesters
┌──────┴──────┐ ┌──────┴──────┐
│ LLM API │ │ Python SDK │
│ (OpenAI, │ │ REST API │
│ Anthropic) │ │ Scheduled │
└─────────────┘ └─────────────┘
MIT License - see LICENSE for details.