Production-ready LLM-driven SQL optimization and analysis tool for detecting performance issues, suggesting optimizations, and measuring real-world improvements.
High Test Coverage - 139 passing tests, 90% coverage
Real-time Performance Validation - Measures actual speedups in test databases
Comprehensive Monitoring - Prometheus metrics, Grafana dashboards, and alerts
LLM Cost Controls - Token tracking, budget limits, and usage reporting
SQL Auditor provides an intuitive web interface for analyzing SQL queries, viewing detected issues, and receiving optimization recommendations.
- Static Analysis: Detects 10+ common SQL anti-patterns (SELECT *, unused joins, cartesian products, non-SARGable predicates, etc.)
- LLM Integration: Uses OpenAI GPT-4 to generate natural language explanations and optimized query rewrites
- Index Advisor: Recommends indexes based on query patterns (WHERE, JOIN, ORDER BY, GROUP BY)
- Cost Estimation: Heuristic-based cost scoring and improvement estimates
- Web UI: Clean React + TypeScript interface for interactive analysis
- Jupyter Notebooks: Prototyping and experimentation environment
- Docker Support: One-command deployment with docker-compose
┌─────────────┐
│ Frontend │ React + TypeScript + Tailwind
│ (Port 5173)│
└──────┬──────┘
│ HTTP
▼
┌─────────────┐
│ FastAPI │ REST API
│ (Port 8000)│
└──────┬──────┘
│
├──► Parser (SQLGlot)
├──► Rules Engine (10 rules)
├──► Cost Estimator
├──► Index Advisor
└──► LLM Provider (OpenAI)
- Python 3.11+
- Node.js 18+ (for frontend)
- Poetry (recommended) or pip
- SQLite3 (for demo database)
- OpenAI API key (optional, for LLM features)
-
Clone the repository
git clone <repo-url> cd llm-sql-auditor
-
Set up environment
cp .env.example .env # Edit .env and add your OPENAI_API_KEY (optional - static analysis works without it) -
Install Python dependencies
poetry install # OR pip install -r requirements.txt -
Seed demo database
./scripts/seed_demo.sh
-
Start development servers
./scripts/dev_up.sh
Or manually:
# Terminal 1: Backend poetry run uvicorn backend.app:app --reload --port 8000 # Terminal 2: Frontend cd frontend && npm install && npm run dev
-
Access the application
- Frontend: Available on the configured port (default: 5173)
- API: Available on the configured port (default: 8000)
- API Docs: Available at
/docsendpoint
docker-compose up --build- Open the frontend application in your browser
- Paste your schema DDL in the "Schema" textarea
- Enter SQL queries (separate multiple queries with
---) - Select SQL dialect (PostgreSQL or SQLite)
- Click "Analyze Queries"
- Review issues, rewrites, index suggestions, and LLM explanations
curl -X POST <API_URL>/api/audit \
-H "Content-Type: application/json" \
-d '{
"schema": "CREATE TABLE users (id INTEGER, email TEXT);",
"queries": [
"SELECT * FROM users;",
"SELECT * FROM users WHERE LOWER(email) = '\''[email protected]'\'';"
],
"dialect": "postgres"
}'curl -X POST <API_URL>/api/explain \
-H "Content-Type: application/json" \
-d '{
"schema": "CREATE TABLE users (id INTEGER, email TEXT);",
"query": "SELECT * FROM users WHERE LOWER(email) = '\''[email protected]'\'';",
"dialect": "postgres"
}'cd notebooks
jupyter notebook 01_prototype.ipynbThe rules engine detects the following issues:
| Code | Rule | Severity | Description | Example |
|---|---|---|---|---|
| R001 | SELECT_STAR | warn | Avoid SELECT * in production | SELECT * FROM users; |
| R002 | UNUSED_JOIN | warn | Joins where columns aren't referenced | SELECT u.id FROM users u JOIN orders o ON ... |
| R003 | CARTESIAN_JOIN | error | Joins without ON predicate | SELECT * FROM users, orders; |
| R004 | NON_SARGABLE | warn | Functions on indexed columns in WHERE | WHERE LOWER(email) = 'test' |
| R005 | MISSING_PREDICATE | warn | Large table scans without WHERE | SELECT * FROM orders; (100k+ rows) |
| R006 | ORDER_BY_NO_INDEX | info | ORDER BY columns lacking index | ORDER BY created_at (no index) |
| R007 | DISTINCT_MISUSE | info | DISTINCT as de-dupe band-aid | SELECT DISTINCT ... with joins |
| R008 | N_PLUS_ONE_PATTERN | warn | Correlated subqueries | WHERE EXISTS (SELECT ... WHERE t.id = outer.id) |
| R009 | LIKE_PREFIX_WILDCARD | warn | LIKE with leading wildcard | WHERE name LIKE '%value' |
| R010 | AGG_NO_GROUPING_INDEX | info | Aggregations missing covering index | GROUP BY category (no index) |
Original Query:
SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE LOWER(u.email) = '[email protected]'
ORDER BY o.created_at DESC;Detected Issues:
- R001: SELECT * usage
- R004: Non-SARGable predicate (LOWER function)
- R006: ORDER BY without supporting index
Optimized Query:
SELECT o.id, o.user_id, o.created_at, o.total_cents,
u.id, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = '[email protected]' -- Removed LOWER()
ORDER BY o.created_at DESC;Recommended Index:
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_users_email ON users(email);{
"summary": {
"totalIssues": 5,
"highSeverity": 1,
"estImprovement": "2-4x potential speedup - Issues: Full scan on large table 'orders' (100000 rows), Non-SARGable function in WHERE clause"
},
"issues": [
{
"code": "R001",
"severity": "warn",
"message": "Avoid SELECT * in production queries...",
"rule": "SELECT_STAR"
}
],
"rewrites": [
{
"original": "SELECT * FROM orders...",
"optimized": "SELECT o.id, o.user_id FROM orders o...",
"rationale": "Project only needed columns..."
}
],
"indexes": [
{
"table": "orders",
"columns": ["user_id", "created_at"],
"type": "btree",
"rationale": "Supports ORDER BY and join"
}
],
"llmExplain": "Your query scans entire orders due to SELECT *..."
}# Run all tests
pytest
# Run with coverage
pytest --cov=backend --cov-report=html
# Run specific test file
pytest backend/tests/test_rules.pysql-auditor/
├── backend/
│ ├── app.py # FastAPI entry point
│ ├── core/
│ │ ├── config.py # Configuration
│ │ ├── models.py # Pydantic models
│ │ └── dialects.py # SQL dialect handling
│ ├── services/
│ │ ├── analyzer/
│ │ │ ├── parser.py # SQL parsing
│ │ │ ├── rules_engine.py # 10 detection rules
│ │ │ ├── cost_estimator.py
│ │ │ └── index_advisor.py
│ │ ├── llm/
│ │ │ ├── provider.py # OpenAI integration
│ │ │ └── prompts.py # LLM prompts
│ │ └── pipeline.py # Main orchestration
│ ├── db/
│ │ ├── seed.sql # Demo schema
│ │ └── explain_helpers.py
│ └── tests/ # Test suite
├── frontend/
│ ├── src/
│ │ ├── App.tsx
│ │ ├── api.ts
│ │ └── components/
│ └── package.json
├── notebooks/
│ └── 01_prototype.ipynb
├── scripts/
│ ├── dev_up.sh
│ └── seed_demo.sh
├── docker-compose.yml
├── Dockerfile
└── README.md
Environment variables (see .env.example):
OPENAI_API_KEY: OpenAI API key for LLM features (optional)SQLAUDITOR_LOG_LEVEL: Logging level (default: INFO)SQLAUDITOR_DEMO_DB: Path to demo SQLite databaseDEFAULT_DIALECT: Default SQL dialect (postgres/sqlite)
- Real EXPLAIN integration with live database connections
- Multi-dialect support (PostgreSQL, SQLite)
- Comprehensive monitoring and alerting
- LLM cost tracking and budgeting
- User Authentication (JWT/OAuth)
- Audit History Persistence (PostgreSQL)
- GitHub Actions CI/CD integration
- Slack/Teams integration for alerts
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass and coverage ≥ 90%
- Submit a pull request
MIT License - see LICENSE file for details.
- Built with FastAPI
- SQL parsing via SQLGlot
- LLM integration via OpenAI API
