Dual-chat Streamlit experience for comparing Groq vs. OpenAI latency when both models share the same RAG pipeline over Don Quixote.
- Python 3.11+
- `pip install -r requirements.txt`
- Environment variables (or Streamlit secrets):
  - `DATABASE_URL` (Neon connection string)
  - `OPENAI_API_KEY_CHAT` (used for chat completions)
  - `OPENAI_API_KEY_EMBED` (used for embeddings in `retriever.py`)
  - `GROQ_API_KEY_CHAT`
```shell
cd src/db
alembic upgrade head
```

This applies `202502091200_clean_org_chart_schema`, which ensures the schema matches the ingestion/retrieval helpers.
```shell
python src/rag/ingest.py --preset don-quixote
```

The preset pulls `docs/don_quixote.txt`, uses a plain-text chunker optimized for `.txt` prose, stores the text and embeddings in Neon, and automatically skips duplicate content.
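The chunk-and-dedupe step can be sketched as follows. This is an illustrative reconstruction, not the repo's actual `ingest.py`: the chunk size, overlap, and hashing scheme are assumptions.

```python
import hashlib


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split prose into overlapping character windows (illustrative values)."""
    chunks, start = [], 0
    while start < len(text):
        chunk = text[start:start + size].strip()
        if chunk:
            chunks.append(chunk)
        start += size - overlap  # slide forward, keeping some overlap
    return chunks


def content_hash(chunk: str) -> str:
    """Stable fingerprint a pipeline could use to skip duplicate chunks."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()
```

Storing `content_hash(chunk)` alongside each row lets re-ingestion runs skip chunks whose fingerprint already exists, which is one common way to get the duplicate-skipping behavior described above.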
```shell
streamlit run src/app/streamlit_app.py
```

On Streamlit Community Cloud, set the same secrets plus these optional overrides:
| Secret | Purpose |
|---|---|
| `OPEN_AI_CHAT_MODEL` | Overrides the default OpenAI chat model (defaults to `gpt-4o-mini`) |
| `GROQ_CHAT_MODEL` | Overrides the Groq model (defaults to `llama-3.1-70b`) |
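The override-with-default pattern in the table boils down to a one-line lookup per model. A sketch, assuming the overrides arrive as environment variables (the `resolve_model` helper and `DEFAULT_MODELS` dict are illustrative, not repo code):

```python
import os

# Defaults taken from the table above; keys mirror the secret names.
DEFAULT_MODELS = {
    "OPEN_AI_CHAT_MODEL": "gpt-4o-mini",
    "GROQ_CHAT_MODEL": "llama-3.1-70b",
}


def resolve_model(secret_name: str) -> str:
    """Return the override if the secret/env var is set, else the default."""
    return os.environ.get(secret_name, DEFAULT_MODELS[secret_name])
```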
The app renders two chat panes (Groq left, OpenAI right), reuses a single prompt input, and logs latency deltas per run. Use the expander at the bottom of the page to inspect the retrieved context for each comparison.
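Per-run latency logging can be sketched with a timer wrapper around each provider call. The helper names below are illustrative (`call` stands in for either SDK's completion function); the repo may implement this differently.

```python
import time


def timed_call(call, *args, **kwargs):
    """Invoke `call` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call(*args, **kwargs)
    return result, time.perf_counter() - start


def latency_delta(groq_seconds: float, openai_seconds: float) -> float:
    """Positive when OpenAI was slower than Groq for the same prompt."""
    return openai_seconds - groq_seconds
```

Running both panes' calls through `timed_call` with the same prompt and logging `latency_delta` gives the per-run comparison described above.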