SaaS-style SQL auditor for B2B data teams: analyzes schema + queries using LLM + rules engine to audit, optimize, and explain SQL in natural language.

SQL Auditor

A production-ready, LLM-driven SQL analysis and optimization tool that detects performance issues, suggests optimized rewrites, and measures real-world improvements.

Current Status:

  • High Test Coverage: 139 passing tests, 90% coverage
  • Real-Time Performance Validation: measures actual speedups in test databases
  • Comprehensive Monitoring: Prometheus metrics, Grafana dashboards, and alerts
  • LLM Cost Controls: token tracking, budget limits, and usage reporting

Screenshots

[Screenshot: SQL Auditor web interface]

SQL Auditor provides an intuitive web interface for analyzing SQL queries, viewing detected issues, and receiving optimization recommendations.

Features

  • Static Analysis: Detects 10+ common SQL anti-patterns (SELECT *, unused joins, cartesian products, non-SARGable predicates, etc.)
  • LLM Integration: Uses OpenAI GPT-4 to generate natural language explanations and optimized query rewrites
  • Index Advisor: Recommends indexes based on query patterns (WHERE, JOIN, ORDER BY, GROUP BY)
  • Cost Estimation: Heuristic-based cost scoring and improvement estimates
  • Web UI: Clean React + TypeScript interface for interactive analysis
  • Jupyter Notebooks: Prototyping and experimentation environment
  • Docker Support: One-command deployment with docker-compose

Architecture

┌─────────────┐
│   Frontend  │  React + TypeScript + Tailwind
│  (Port 5173)│
└──────┬──────┘
       │ HTTP
       ▼
┌─────────────┐
│   FastAPI   │  REST API
│  (Port 8000)│
└──────┬──────┘
       │
       ├──► Parser (SQLGlot)
       ├──► Rules Engine (10 rules)
       ├──► Cost Estimator
       ├──► Index Advisor
       └──► LLM Provider (OpenAI)
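
The analyzer components above run as successive stages over a shared analysis context. Conceptually, the orchestration in pipeline.py can be sketched as function composition (a hypothetical illustration with toy stages, not the repository's actual code):

```python
from typing import Callable

# Hypothetical stage-chaining helper; the real pipeline.py may differ.
def make_pipeline(*stages: Callable[[dict], dict]) -> Callable[[dict], dict]:
    def run(ctx: dict) -> dict:
        for stage in stages:
            ctx = stage(ctx)  # each stage enriches the context dict
        return ctx
    return run

# Toy stand-ins for the real parser and rules-engine stages.
def parse_stage(ctx: dict) -> dict:
    return {**ctx, "parsed": True}

def rules_stage(ctx: dict) -> dict:
    return {**ctx, "issues": []}

audit = make_pipeline(parse_stage, rules_stage)
result = audit({"sql": "SELECT * FROM users;"})
```

Structuring the pipeline this way keeps each analyzer (parser, rules engine, cost estimator, index advisor, LLM provider) independently testable.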

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+ (for frontend)
  • Poetry (recommended) or pip
  • SQLite3 (for demo database)
  • OpenAI API key (optional, for LLM features)

Installation

  1. Clone the repository

    git clone <repo-url>
    cd llm-sql-auditor
  2. Set up environment

    cp .env.example .env
    # Edit .env and add your OPENAI_API_KEY (optional - static analysis works without it)
  3. Install Python dependencies

    poetry install
    # OR
    pip install -r requirements.txt
  4. Seed demo database

    ./scripts/seed_demo.sh
  5. Start development servers

    ./scripts/dev_up.sh

    Or manually:

    # Terminal 1: Backend
    poetry run uvicorn backend.app:app --reload --port 8000
    
    # Terminal 2: Frontend
    cd frontend && npm install && npm run dev
  6. Access the application

    • Frontend: Available on the configured port (default: 5173)
    • API: Available on the configured port (default: 8000)
    • API Docs: Available at /docs endpoint

Docker

docker-compose up --build

Usage

Web UI

  1. Open the frontend application in your browser
  2. Paste your schema DDL in the "Schema" textarea
  3. Enter SQL queries (separate multiple queries with ---)
  4. Select SQL dialect (PostgreSQL or SQLite)
  5. Click "Analyze Queries"
  6. Review issues, rewrites, index suggestions, and LLM explanations
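
Since the UI separates multiple queries with ---, the input has to be split into individual statements before analysis. A minimal sketch of such a splitter (a hypothetical helper; the backend may implement this differently):

```python
import re

# Split on lines containing only "---" (the UI's query separator).
# Hypothetical helper mirroring the UI convention described above.
def split_queries(text: str) -> list[str]:
    parts = re.split(r"(?m)^\s*---\s*$", text)
    return [p.strip() for p in parts if p.strip()]
```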

API

Audit Multiple Queries

curl -X POST <API_URL>/api/audit \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "CREATE TABLE users (id INTEGER, email TEXT);",
    "queries": [
      "SELECT * FROM users;",
      "SELECT * FROM users WHERE LOWER(email) = '\''[email protected]'\'';"
    ],
    "dialect": "postgres"
  }'

Explain Single Query

curl -X POST <API_URL>/api/explain \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "CREATE TABLE users (id INTEGER, email TEXT);",
    "query": "SELECT * FROM users WHERE LOWER(email) = '\''[email protected]'\'';",
    "dialect": "postgres"
  }'

Jupyter Notebook

cd notebooks
jupyter notebook 01_prototype.ipynb

Detected Issues

The rules engine detects the following issues:

| Code | Rule | Severity | Description | Example |
|------|------|----------|-------------|---------|
| R001 | SELECT_STAR | warn | Avoid SELECT * in production | SELECT * FROM users; |
| R002 | UNUSED_JOIN | warn | Joins whose columns aren't referenced | SELECT u.id FROM users u JOIN orders o ON ... |
| R003 | CARTESIAN_JOIN | error | Joins without an ON predicate | SELECT * FROM users, orders; |
| R004 | NON_SARGABLE | warn | Functions on indexed columns in WHERE | WHERE LOWER(email) = 'test' |
| R005 | MISSING_PREDICATE | warn | Large table scans without WHERE | SELECT * FROM orders; (100k+ rows) |
| R006 | ORDER_BY_NO_INDEX | info | ORDER BY columns lacking an index | ORDER BY created_at (no index) |
| R007 | DISTINCT_MISUSE | info | DISTINCT as a de-dupe band-aid | SELECT DISTINCT ... with joins |
| R008 | N_PLUS_ONE_PATTERN | warn | Correlated subqueries | WHERE EXISTS (SELECT ... WHERE t.id = outer.id) |
| R009 | LIKE_PREFIX_WILDCARD | warn | LIKE with a leading wildcard | WHERE name LIKE '%value' |
| R010 | AGG_NO_GROUPING_INDEX | info | Aggregations missing a covering index | GROUP BY category (no index) |

Example Output

Before & After Query Optimization

Original Query:

SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE LOWER(u.email) = '[email protected]'
ORDER BY o.created_at DESC;

Detected Issues:

  • R001: SELECT * usage
  • R004: Non-SARGable predicate (LOWER function)
  • R006: ORDER BY without supporting index

Optimized Query:

SELECT o.id, o.user_id, o.created_at, o.total_cents,
       u.id, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = '[email protected]'  -- LOWER() removed (assumes emails are stored lowercase)
ORDER BY o.created_at DESC;

Recommended Index:

CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_users_email ON users(email);

API Response Example

{
  "summary": {
    "totalIssues": 5,
    "highSeverity": 1,
    "estImprovement": "2-4x potential speedup - Issues: Full scan on large table 'orders' (100000 rows), Non-SARGable function in WHERE clause"
  },
  "issues": [
    {
      "code": "R001",
      "severity": "warn",
      "message": "Avoid SELECT * in production queries...",
      "rule": "SELECT_STAR"
    }
  ],
  "rewrites": [
    {
      "original": "SELECT * FROM orders...",
      "optimized": "SELECT o.id, o.user_id FROM orders o...",
      "rationale": "Project only needed columns..."
    }
  ],
  "indexes": [
    {
      "table": "orders",
      "columns": ["user_id", "created_at"],
      "type": "btree",
      "rationale": "Supports ORDER BY and join"
    }
  ],
  "llmExplain": "Your query scans entire orders due to SELECT *..."
}
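
The summary counts are derivable from the issues list. A sketch, assuming highSeverity counts issues whose severity is "error" (an assumption; the actual mapping lives in the backend):

```python
# Illustrative derivation of the summary fields from the issues array;
# treating severity == "error" as "high" is an assumption.
def summarize(issues: list[dict]) -> dict:
    return {
        "totalIssues": len(issues),
        "highSeverity": sum(1 for i in issues if i["severity"] == "error"),
    }
```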

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=backend --cov-report=html

# Run specific test file
pytest backend/tests/test_rules.py

Project Structure

sql-auditor/
├── backend/
│   ├── app.py                 # FastAPI entry point
│   ├── core/
│   │   ├── config.py          # Configuration
│   │   ├── models.py          # Pydantic models
│   │   └── dialects.py        # SQL dialect handling
│   ├── services/
│   │   ├── analyzer/
│   │   │   ├── parser.py      # SQL parsing
│   │   │   ├── rules_engine.py # 10 detection rules
│   │   │   ├── cost_estimator.py
│   │   │   └── index_advisor.py
│   │   ├── llm/
│   │   │   ├── provider.py    # OpenAI integration
│   │   │   └── prompts.py     # LLM prompts
│   │   └── pipeline.py        # Main orchestration
│   ├── db/
│   │   ├── seed.sql           # Demo schema
│   │   └── explain_helpers.py
│   └── tests/                 # Test suite
├── frontend/
│   ├── src/
│   │   ├── App.tsx
│   │   ├── api.ts
│   │   └── components/
│   └── package.json
├── notebooks/
│   └── 01_prototype.ipynb
├── scripts/
│   ├── dev_up.sh
│   └── seed_demo.sh
├── docker-compose.yml
├── Dockerfile
└── README.md

Configuration

Environment variables (see .env.example):

  • OPENAI_API_KEY: OpenAI API key for LLM features (optional)
  • SQLAUDITOR_LOG_LEVEL: Logging level (default: INFO)
  • SQLAUDITOR_DEMO_DB: Path to demo SQLite database
  • DEFAULT_DIALECT: Default SQL dialect (postgres/sqlite)

Roadmap

  • Real EXPLAIN integration with live database connections
  • Multi-dialect support (PostgreSQL, SQLite)
  • Comprehensive monitoring and alerting
  • LLM cost tracking and budgeting
  • User Authentication (JWT/OAuth)
  • Audit History Persistence (PostgreSQL)
  • GitHub Actions CI/CD integration
  • Slack/Teams integration for alerts

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass and coverage ≥ 90%
  5. Submit a pull request

License

MIT License - see LICENSE file for details.
