```bash
git clone <repository-url>
cd rag-latency-demo
python setup.py
```
### 2. Run Benchmarks

```bash
# Comprehensive benchmark
python working_benchmark.py

# Ultimate speed test
python ultimate_benchmark.py
```
### 3. Start API Server

```bash
uvicorn app.main:app --reload
```

Then open: http://localhost:8000/docs
### 4. Test the API (PowerShell)

```powershell
$body = @{
    question = "What is artificial intelligence?"
} | ConvertTo-Json

Invoke-RestMethod -Uri "http://localhost:8000/query" `
    -Method Post `
    -ContentType "application/json" `
    -Body $body
```
Or with curl (Command Prompt; note that cmd uses `^` for line continuation, not `\`):

```cmd
curl -X POST http://localhost:8000/query ^
  -H "Content-Type: application/json" ^
  -d "{\"question\": \"What is artificial intelligence?\"}"
```
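The same request can be made from Python without any third-party packages. This is a minimal stdlib sketch of a client for the `/query` endpoint shown above; the response shape depends on the server and is not assumed here:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/query"  # endpoint from the curl example above

def build_request(question: str) -> urllib.request.Request:
    """Build the same POST request the curl example sends."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str) -> dict:
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(build_request(question)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (with the server from step 3 running):
#   print(ask("What is artificial intelligence?"))
```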
## Project Structure

```text
rag-latency-demo/
├── app/               # Core application code
├── data/              # Sample documents (add your own)
├── models/            # ML models (downloaded automatically)
├── scripts/           # Utility scripts
├── benchmarks/        # Benchmark results (generated)
├── requirements.txt   # Python dependencies
├── Dockerfile         # Container deployment
└── README.md          # Full documentation
```
## Key Files

- `app/rag_naive.py` - Baseline implementation
- `app/rag_optimized.py` - Optimized RAG with caching
- `app/no_compromise_rag.py` - Ultimate optimization
- `app/main.py` - FastAPI server
- `config.py` - Configuration settings
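The optimized variants are best read in the source files above. Purely as an illustration of the query-caching idea behind `app/rag_optimized.py`, here is a minimal sketch; the `_retrieve` and `_generate` functions are hypothetical stand-ins, not the repository's actual code:

```python
from functools import lru_cache

# Hypothetical stand-ins for the real retrieval and generation steps.
def _retrieve(question: str) -> list[str]:
    return ["document snippet relevant to: " + question]

def _generate(question: str, context: list[str]) -> str:
    return f"Answer to {question!r} using {len(context)} snippet(s)"

@lru_cache(maxsize=1024)
def cached_query(question: str) -> str:
    """Repeated identical questions skip retrieval and generation entirely."""
    return _generate(question, _retrieve(question))

cached_query("What is AI?")           # cold: runs retrieval + generation
cached_query("What is AI?")           # warm: served straight from the cache
print(cached_query.cache_info())      # shows one hit, one miss
```

An in-process LRU cache like this only helps exact-repeat queries; the repository's implementation may cache at other layers (e.g. embeddings) as well.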
## Performance Results

- Baseline (Naive RAG): 247ms average
- Optimized RAG: 179ms average (1.4x faster)
- No-Compromise RAG: 92ms average (2.7x faster ⚡)
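The speedup factors above are simply the baseline latency divided by each variant's latency, rounded to one decimal:

```python
# Average end-to-end latencies reported above, in milliseconds.
latencies_ms = {
    "Baseline (Naive RAG)": 247,
    "Optimized RAG": 179,
    "No-Compromise RAG": 92,
}

baseline = latencies_ms["Baseline (Naive RAG)"]
for name, ms in latencies_ms.items():
    speedup = baseline / ms
    print(f"{name}: {ms}ms ({speedup:.1f}x vs baseline)")
# 247/179 ≈ 1.4, 247/92 ≈ 2.7
```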
## Adding Your Own Data

1. Place text files (`.txt`, `.md`) in the `data/` directory
2. Run `python scripts/initialize_rag.py`
3. Test with:

```bash
python -c "from app.rag_naive import NaiveRAG; r=NaiveRAG(); r.initialize(); print(r.query(\"Test query\"))"
```
## Deployment

See `DEPLOYMENT.md` for production deployment instructions.