This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Every time the user requests a feature or bug fix, you MUST follow the process below:
- Understanding The Task: Use the `developer` agent to understand what the user is asking for and to read GitHub issues
- Feature Development & Bug Fixes: Use the `developer` agent for implementing features and fixing bugs. IMPORTANT: Always begin by writing a failing test (or tests) that reflects the expected behavior
- Code Review: After development work, invoke the `code-reviewer` agent to review the implementation
- Iteration: Use the `developer` agent again to address feedback from the code reviewer
- Repeat: Continue the developer → code-reviewer cycle until no more feedback remains
- Documentation: If the feature or bug fix requires documentation updates, invoke the `technical-writer` agent
IMPORTANT: Make sure to share the project overview, architecture overview, and other concepts outlined below with the agent when it is invoked.
Developer Agent:
- Understands a feature request or a reported issue
- Implements new features following SQLMesh's architecture patterns
- Fixes bugs with proper understanding of the codebase
- Writes comprehensive tests following SQLMesh's testing conventions
- Follows established code style and conventions
Code-Reviewer Agent:
- Reviews implementation for quality and architectural compliance
- Identifies potential issues, edge cases, and improvements
- Ensures adherence to SQLMesh patterns and best practices
- Validates test coverage and quality
Technical-Writer Agent:
- Creates and updates user-facing documentation
- Writes API documentation for new features
- Updates existing docs after code changes
- Creates migration guides and deep-dive technical explanations
SQLMesh is a next-generation data transformation framework that enables:
- Virtual data environments for isolated development without warehouse costs
- Plan/apply workflow (like Terraform) for safe deployments
- Multi-dialect SQL support with automatic transpilation
- Incremental processing to run only necessary transformations
- Built-in testing and CI/CD integration
Requirements: Python >= 3.9 (Note: Python 3.13+ is not yet supported)
```bash
# Create and activate a Python virtual environment (Python >= 3.9, < 3.13)
python -m venv .venv
source ./.venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
make install-dev

# Setup pre-commit hooks (important for code quality)
make install-pre-commit

# Run linters and formatters (ALWAYS run before committing)
make style
```

```bash
# Fast tests for quick feedback during development
make fast-test

# Slow tests for comprehensive coverage
make slow-test

# Run specific test file
pytest tests/core/test_context.py -v

# Run tests with specific marker
pytest -m "not slow and not docker" -v

# Build package
make package

# Serve documentation locally
make docs-serve
```

```bash
# DuckDB (default, no setup required)
make duckdb-test

# Other engines require credentials/Docker
make snowflake-test   # Needs SNOWFLAKE_* env vars
make bigquery-test    # Needs GOOGLE_APPLICATION_CREDENTIALS
make databricks-test  # Needs DATABRICKS_* env vars
```

```bash
# In web/client directory
pnpm run dev    # Start development server
pnpm run build  # Production build
pnpm run test   # Run tests

# Docker-based UI
make ui-up      # Start UI in Docker
make ui-down    # Stop UI
```

sqlmesh/core/context.py: The main Context class orchestrates all SQLMesh operations. This is the entry point for understanding how models are loaded, plans are created, and executions happen.
sqlmesh/core/model/: Model definitions and kinds (FULL, INCREMENTAL_BY_TIME_RANGE, SCD_TYPE_2, etc.). Each model kind has specific behaviors for how data is processed.
sqlmesh/core/snapshot/: The versioning system. Snapshots are immutable versions of models identified by fingerprints. Understanding snapshots is crucial for how SQLMesh tracks changes.
sqlmesh/core/plan/: Plan building and evaluation logic. Plans determine what changes need to be applied and in what order.
sqlmesh/core/engine_adapter/: Database engine adapters provide a unified interface across 16+ SQL engines. Each adapter handles engine-specific SQL generation and execution.
- Virtual Environments: Lightweight branches that share unchanged data between environments, reducing storage costs and deployment time.
- Fingerprinting: Models are versioned using content-based fingerprints. Any change to a model's logic creates a new version.
- State Sync: Manages metadata across different backends (can be stored in the data warehouse or external databases).
- Intervals: Time-based partitioning system for incremental models, tracking what data has been processed.
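The fingerprinting concept above can be illustrated with a minimal sketch. This is a hypothetical toy, not SQLMesh's actual algorithm (the real versioning lives in sqlmesh/core/snapshot/ and hashes far more than raw SQL text):

```python
import hashlib

def fingerprint(model_sql: str) -> str:
    """Toy content-based version identifier: any change to the model's
    logic yields a new fingerprint, and hence a new snapshot version."""
    # Normalize whitespace so purely cosmetic edits don't create new versions.
    normalized = " ".join(model_sql.split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

v1 = fingerprint("SELECT id, name FROM raw.users")
v2 = fingerprint("SELECT  id,  name  FROM raw.users")  # cosmetic change only
v3 = fingerprint("SELECT id, email FROM raw.users")    # logic change

assert v1 == v2  # same logic, same version
assert v1 != v3  # changed logic, new version
```

The design point is that versions derive from content, not timestamps or counters, so two environments that contain the same model logic automatically share the same snapshot.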
- sqlmesh/core/context.py: Main orchestration class
- examples/sushi/: Reference implementation used in tests
- web/server/main.py: Web UI backend entry point
- web/client/src/App.tsx: Web UI frontend entry point
- vscode/extension/src/extension.ts: VSCode extension entry point
SQLMesh includes a GitHub CI/CD bot integration that automates data transformation workflows. The implementation is located in sqlmesh/integrations/github/ and follows a clean architectural pattern.
Core Integration Files:
- sqlmesh/cicd/bot.py: Main CLI entry point (`sqlmesh_cicd` command)
- sqlmesh/integrations/github/cicd/controller.py: Core bot orchestration logic
- sqlmesh/integrations/github/cicd/command.py: Individual command implementations
- sqlmesh/integrations/github/cicd/config.py: Configuration classes and validation
The bot follows a Command Pattern architecture:
- CLI Layer (bot.py): Handles argument parsing and delegates to controllers
- Controller Layer (controller.py): Orchestrates workflow execution and manages state
- Command Layer (command.py): Implements individual operations (test, deploy, plan, etc.)
- Configuration Layer (config.py): Manages bot configuration and validation
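The layering above can be sketched generically. All names here are illustrative stand-ins, not the actual SQLMesh classes:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Command layer: each operation is a callable returning success/failure.
def run_tests() -> bool:
    return True  # placeholder: would invoke the SQLMesh test runner

def update_pr_environment() -> bool:
    return True  # placeholder: would create/update the virtual PR environment

# Controller layer: orchestrates commands and records per-step status,
# analogous to the bot reporting each step as a GitHub Check Run.
@dataclass
class Controller:
    results: Dict[str, str] = field(default_factory=dict)

    def execute(self, name: str, command: Callable[[], bool]) -> bool:
        ok = command()
        self.results[name] = "success" if ok else "failure"
        return ok

controller = Controller()
controller.execute("run_tests", run_tests)
controller.execute("update_pr_environment", update_pr_environment)
```

Keeping commands as independent callables is what lets the CLI layer stay thin: it only parses arguments and hands a command name to the controller.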
GitHubCICDController: Main orchestrator that:
- Manages GitHub API interactions via PyGithub
- Coordinates workflow execution across different commands
- Handles error reporting through GitHub Check Runs
- Manages PR comment interactions and status updates
Command Implementations:
- run_tests(): Executes unit tests with detailed reporting
- update_pr_environment(): Creates/updates virtual PR environments
- gen_prod_plan(): Generates production deployment plans
- deploy_production(): Handles production deployments
- check_required_approvers(): Validates approval requirements
Configuration Management:
- Uses Pydantic models for type-safe configuration
- Supports both YAML config files and environment variables
- Validates bot settings and user permissions
- Handles approval workflows and deployment triggers
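A hedged sketch of what type-safe configuration with Pydantic looks like in this style; the field names below are invented for illustration and do not reflect SQLMesh's real config schema:

```python
from pydantic import BaseModel, ValidationError

class BotConfig(BaseModel):
    """Hypothetical bot configuration; fields are illustrative only."""
    auto_deploy: bool = True       # assumed field, not SQLMesh's actual name
    required_approvers: int = 1    # assumed field, not SQLMesh's actual name

# Values from YAML or environment variables are validated at construction time.
config = BotConfig(required_approvers=2)
assert config.required_approvers == 2

# Invalid settings fail loudly instead of propagating bad state.
try:
    BotConfig(required_approvers="not-a-number")
except ValidationError:
    rejected = True
```

The benefit over raw dicts is that misconfiguration surfaces as a single structured error at startup rather than as a failure mid-workflow.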
The bot leverages core SQLMesh components:
- Context: Uses SQLMesh Context for project operations
- Plan/Apply: Integrates with SQLMesh's plan generation and application
- Virtual Environments: Creates isolated PR environments using SQLMesh's virtual data environments
- State Sync: Manages metadata synchronization across environments
- Testing Framework: Executes SQLMesh unit tests and reports results
- GitHub Check Runs: Creates detailed status reports for each workflow step
- PR Comments: Provides user-friendly feedback on failures and successes
- Structured Logging: Uses SQLMesh's logging framework for debugging
- Exception Handling: Graceful handling of GitHub API failures and SQLMesh errors
When running engine-specific tests, these environment variables are required:
- Snowflake: SNOWFLAKE_ACCOUNT, SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD
- BigQuery: BIGQUERY_KEYFILE or GOOGLE_APPLICATION_CREDENTIALS
- Databricks: DATABRICKS_CATALOG, DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_ACCESS_TOKEN, DATABRICKS_CONNECT_VERSION
- Redshift: REDSHIFT_HOST, REDSHIFT_USER, REDSHIFT_PASSWORD, REDSHIFT_DATABASE
- Athena: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, ATHENA_S3_WAREHOUSE_LOCATION
- ClickHouse Cloud: CLICKHOUSE_CLOUD_HOST, CLICKHOUSE_CLOUD_USERNAME, CLICKHOUSE_CLOUD_PASSWORD
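Before running an engine target, a small helper (not part of the repo, shown only as a convenience sketch) can verify the variables up front rather than letting the test suite fail midway:

```python
import os

# Required variables per engine, mirroring the list above (abbreviated).
REQUIRED = {
    "snowflake": ["SNOWFLAKE_ACCOUNT", "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE",
                  "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD"],
    "redshift": ["REDSHIFT_HOST", "REDSHIFT_USER", "REDSHIFT_PASSWORD",
                 "REDSHIFT_DATABASE"],
}

def missing_vars(engine: str) -> list:
    """Return the names of required variables not set in the environment."""
    return [v for v in REQUIRED.get(engine, []) if not os.environ.get(v)]

# Usage, e.g. before `make snowflake-test`:
# if missing_vars("snowflake"):
#     raise SystemExit("set the SNOWFLAKE_* variables first")
```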
SQLMesh uses a migration system to evolve its internal state database schema and metadata format. The migrations handle changes to SQLMesh's internal structure, not user data transformations.
Location: sqlmesh/migrations/ - Contains 80+ migration files from v0001 to v0083+
Naming Convention: v{XXXX}_{descriptive_name}.py (e.g., v0001_init.py, v0083_use_sql_for_scd_time_data_type_data_hash.py)
Core Infrastructure:
- sqlmesh/core/state_sync/db/migrator.py: Main migration orchestrator
- sqlmesh/utils/migration.py: Cross-database compatibility utilities
- sqlmesh/core/state_sync/base.py: Auto-discovery and loading logic
Schema Evolution:
- State table creation/modification (snapshots, environments, intervals)
- Column additions/removals and index management
- Database engine compatibility fixes (MySQL/MSSQL field size limits)
Data Format Migrations:
- JSON metadata structure updates (snapshot serialization changes)
- Path normalization (Windows compatibility)
- Fingerprint recalculation when SQLGlot parsing changes
Cleanup Operations:
- Removing obsolete tables and unused data
- Metadata optimization and attribute cleanup
```python
# Standard migration function signature
def migrate(state_sync, **kwargs):  # type: ignore
    engine_adapter = state_sync.engine_adapter
    schema = state_sync.schema
    # Migration logic here

    # Common operations
    engine_adapter.create_state_table(table_name, columns_dict)
    engine_adapter.alter_table(alter_expression)
    engine_adapter.drop_table(table_name)
```

Core State Tables:
- _snapshots: Model version metadata (most frequently migrated)
- _environments: Environment definitions
- _versions: Schema/SQLGlot/SQLMesh version tracking
- _intervals: Incremental processing metadata
Migration Safety:
- Automatic backups before migration (unless skip_backup=True)
- Atomic database transactions for consistency
- Snapshot count validation before/after migrations
- Automatic rollback on failures
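The safety guarantees above can be sketched as a toy wrapper. Real SQLMesh migrations operate on state tables through the engine adapter; this illustration uses a plain dict as stand-in state:

```python
def safe_migrate(state, migrations, skip_backup=False):
    """Toy illustration: back up, migrate in order, validate, roll back on failure."""
    backup = None if skip_backup else dict(state)  # backup before migrating
    before = state.get("snapshot_count", 0)
    try:
        for migration in migrations:               # applied in numerical order
            migration(state)
        after = state.get("snapshot_count", 0)
        if after != before:                        # snapshot count validation
            raise RuntimeError("snapshot count changed during migration")
    except Exception:
        if backup is not None:
            state.clear()
            state.update(backup)                   # automatic rollback
        raise
    return state

state = {"schema_version": 82, "snapshot_count": 5}

def v0083(s):
    s["schema_version"] = 83                       # example migration step

safe_migrate(state, [v0083])
assert state["schema_version"] == 83
```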
Auto-Discovery: Migrations are automatically loaded using pkgutil.iter_modules()
Triggers: Migrations run automatically when:
- Schema version mismatch detected
- SQLGlot version changes require fingerprint recalculation
- Manual `sqlmesh migrate` command execution
Execution Flow:
- Version comparison (local vs remote schema)
- Backup creation of state tables
- Sequential migration execution (numerical order)
- Snapshot fingerprint recalculation if needed
- Environment updates with new snapshot references
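The discovery and ordering steps can be sketched as follows. The module names below are sample data; real discovery uses pkgutil.iter_modules() over sqlmesh/migrations/, but the v{XXXX}_{name} convention is what makes sequential ordering possible:

```python
import re

def migration_order(module_names):
    """Sort migration module names by their numeric v{XXXX} prefix,
    skipping anything that doesn't follow the naming convention."""
    def version(name):
        match = re.match(r"v(\d+)_", name)
        return int(match.group(1)) if match else -1
    return sorted((n for n in module_names if version(n) >= 0), key=version)

# Hypothetical sample of discovered module names:
order = migration_order(["v0083_use_sql_for_scd", "v0001_init",
                         "v0010_add_index", "__init__"])
assert order == ["v0001_init", "v0010_add_index", "v0083_use_sql_for_scd"]
```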
SQLMesh provides native support for dbt projects, allowing users to run existing dbt projects while gaining access to SQLMesh's advanced features like virtual environments and plan/apply workflows.
Location: sqlmesh/dbt/ - Complete dbt integration architecture
Key Components:
- sqlmesh/dbt/loader.py: Main dbt project loader extending SQLMesh's base loader
- sqlmesh/dbt/manifest.py: dbt manifest parsing and project discovery
- sqlmesh/dbt/adapter.py: dbt adapter system for SQL execution and schema operations
- sqlmesh/dbt/model.py: dbt model configurations and materialization mapping
- sqlmesh/dbt/context.py: dbt project context and environment management
dbt Converter: sqlmesh/dbt/converter/ - Tools for migrating dbt projects to SQLMesh
Key Features:
- convert.py: Main conversion orchestration
- jinja.py & jinja_transforms.py: Jinja template and macro conversion
- Full support for dbt assets (models, seeds, sources, tests, snapshots, macros)
CLI Commands:
```bash
# Initialize SQLMesh in existing dbt project
sqlmesh init -t dbt

# Convert dbt project to SQLMesh format
sqlmesh dbt convert
```

Project Structure:
- Full dbt project support (models, seeds, sources, tests, snapshots, macros)
- dbt package dependencies and version management
- Profile integration using existing profiles.yml for connections
Materializations:
- All standard dbt materializations (table, view, incremental, ephemeral)
- Incremental model strategies (delete+insert, merge, insert_overwrite)
- SCD Type 2 support and snapshot strategies
Advanced Features:
- Jinja templating with full macro support
- Runtime variable passing and configuration
- dbt test integration and execution
- Cross-database compatibility with SQLMesh's multi-dialect support
sushi_dbt: examples/sushi_dbt/ - Complete dbt project running with SQLMesh
Test Fixtures: tests/fixtures/dbt/sushi_test/ - Comprehensive test dbt project with all asset types
When using dbt with SQLMesh, you gain:
- Virtual Environments: Isolated development without warehouse costs
- Plan/Apply Workflow: Safe deployments with change previews
- Multi-Dialect Support: Run the same dbt project across different SQL engines
- Advanced Testing: Enhanced testing capabilities beyond standard dbt tests
- State Management: Sophisticated metadata and versioning system