CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Agent-Based Development Workflow

Every time the user requests a feature or bug fix, you MUST follow the process below:

Development Process

Understanding The Task: Use the developer agent to understand what the user is asking for and to read GitHub issues
Feature Development & Bug Fixes: Use the developer agent for implementing features and fixing bugs. IMPORTANT: Always begin by writing a failing test (or tests) that reflects the expected behavior
Code Review: After development work, invoke the code-reviewer agent to review the implementation
Iteration: Use the developer agent again to address feedback from the code reviewer
Repeat: Continue the developer → code-reviewer cycle until no more feedback remains
Documentation: If the feature or bug fix requires documentation updates, invoke the technical-writer agent

IMPORTANT: Make sure to share the project overview, architecture overview, and other concepts outlined below with the agent when it is invoked.

Agent Responsibilities

Developer Agent:

Understands a feature request or a reported issue
Implements new features following SQLMesh's architecture patterns
Fixes bugs with proper understanding of the codebase
Writes comprehensive tests following SQLMesh's testing conventions
Follows established code style and conventions

Code-Reviewer Agent:

Reviews implementation for quality and architectural compliance
Identifies potential issues, edge cases, and improvements
Ensures adherence to SQLMesh patterns and best practices
Validates test coverage and quality

Technical-Writer Agent:

Creates and updates user-facing documentation
Writes API documentation for new features
Updates existing docs after code changes
Creates migration guides and deep-dive technical explanations

Project Overview

SQLMesh is a next-generation data transformation framework that enables:

Virtual data environments for isolated development without warehouse costs
Plan/apply workflow (like Terraform) for safe deployments
Multi-dialect SQL support with automatic transpilation
Incremental processing to run only necessary transformations
Built-in testing and CI/CD integration

Requirements: Python >= 3.9 (Note: Python 3.13+ is not yet supported)

Essential Commands

Environment setup

# Create and activate a Python virtual environment (Python >= 3.9, < 3.13)
python -m venv .venv
source ./.venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
make install-dev

# Setup pre-commit hooks (important for code quality)
make install-pre-commit

Common Development Tasks

# Run linters and formatters (ALWAYS run before committing)
make style

# Fast tests for quick feedback during development
make fast-test

# Slow tests for comprehensive coverage
make slow-test

# Run specific test file
pytest tests/core/test_context.py -v

# Run tests with specific marker
pytest -m "not slow and not docker" -v

# Build package
make package

# Serve documentation locally
make docs-serve

Engine-Specific Testing

# DuckDB (default, no setup required)
make duckdb-test

# Other engines require credentials/Docker
make snowflake-test  # Needs SNOWFLAKE_* env vars
make bigquery-test   # Needs GOOGLE_APPLICATION_CREDENTIALS
make databricks-test # Needs DATABRICKS_* env vars

UI Development

# In web/client directory
pnpm run dev   # Start development server
pnpm run build # Production build
pnpm run test  # Run tests

# Docker-based UI
make ui-up     # Start UI in Docker
make ui-down   # Stop UI

Architecture Overview

Core Components

sqlmesh/core/context.py: The main Context class orchestrates all SQLMesh operations. This is the entry point for understanding how models are loaded, plans are created, and executions happen.

sqlmesh/core/model/: Model definitions and kinds (FULL, INCREMENTAL_BY_TIME_RANGE, SCD_TYPE_2, etc.). Each model kind has specific behaviors for how data is processed.

sqlmesh/core/snapshot/: The versioning system. Snapshots are immutable versions of models identified by fingerprints. Understanding snapshots is crucial for how SQLMesh tracks changes.

sqlmesh/core/plan/: Plan building and evaluation logic. Plans determine what changes need to be applied and in what order.

sqlmesh/core/engine_adapter/: Database engine adapters provide a unified interface across 16+ SQL engines. Each adapter handles engine-specific SQL generation and execution.

Key Concepts

Virtual Environments: Lightweight branches that share unchanged data between environments, reducing storage costs and deployment time.
Fingerprinting: Models are versioned using content-based fingerprints. Any change to a model's logic creates a new version.
State Sync: Manages metadata across different backends (can be stored in the data warehouse or external databases).
Intervals: Time-based partitioning system for incremental models, tracking what data has been processed.

Important Files

sqlmesh/core/context.py: Main orchestration class
examples/sushi/: Reference implementation used in tests
web/server/main.py: Web UI backend entry point
web/client/src/App.tsx: Web UI frontend entry point
vscode/extension/src/extension.ts: VSCode extension entry point

GitHub CI/CD Bot Architecture

SQLMesh includes a GitHub CI/CD bot integration that automates data transformation workflows. The implementation is located in sqlmesh/integrations/github/ and follows a clean architectural pattern.

Code Organization

Core Integration Files:

sqlmesh/cicd/bot.py: Main CLI entry point (sqlmesh_cicd command)
sqlmesh/integrations/github/cicd/controller.py: Core bot orchestration logic
sqlmesh/integrations/github/cicd/command.py: Individual command implementations
sqlmesh/integrations/github/cicd/config.py: Configuration classes and validation

Architecture Pattern

The bot follows a Command Pattern architecture:

CLI Layer (bot.py): Handles argument parsing and delegates to controllers
Controller Layer (controller.py): Orchestrates workflow execution and manages state
Command Layer (command.py): Implements individual operations (test, deploy, plan, etc.)
Configuration Layer (config.py): Manages bot configuration and validation

Key Components

GitHubCICDController: Main orchestrator that:

Manages GitHub API interactions via PyGithub
Coordinates workflow execution across different commands
Handles error reporting through GitHub Check Runs
Manages PR comment interactions and status updates

Command Implementations:

run_tests(): Executes unit tests with detailed reporting
update_pr_environment(): Creates/updates virtual PR environments
gen_prod_plan(): Generates production deployment plans
deploy_production(): Handles production deployments
check_required_approvers(): Validates approval requirements

Configuration Management:

Uses Pydantic models for type-safe configuration
Supports both YAML config files and environment variables
Validates bot settings and user permissions
Handles approval workflows and deployment triggers

Integration with Core SQLMesh

The bot leverages core SQLMesh components:

Context: Uses SQLMesh Context for project operations
Plan/Apply: Integrates with SQLMesh's plan generation and application
Virtual Environments: Creates isolated PR environments using SQLMesh's virtual data environments
State Sync: Manages metadata synchronization across environments
Testing Framework: Executes SQLMesh unit tests and reports results

Error Handling and Reporting

GitHub Check Runs: Creates detailed status reports for each workflow step
PR Comments: Provides user-friendly feedback on failures and successes
Structured Logging: Uses SQLMesh's logging framework for debugging
Exception Handling: Graceful handling of GitHub API failures and SQLMesh errors

Environment Variables for Engine Testing

When running engine-specific tests, these environment variables are required:

Snowflake: SNOWFLAKE_ACCOUNT, SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD
BigQuery: BIGQUERY_KEYFILE or GOOGLE_APPLICATION_CREDENTIALS
Databricks: DATABRICKS_CATALOG, DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_ACCESS_TOKEN, DATABRICKS_CONNECT_VERSION
Redshift: REDSHIFT_HOST, REDSHIFT_USER, REDSHIFT_PASSWORD, REDSHIFT_DATABASE
Athena: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, ATHENA_S3_WAREHOUSE_LOCATION
ClickHouse Cloud: CLICKHOUSE_CLOUD_HOST, CLICKHOUSE_CLOUD_USERNAME, CLICKHOUSE_CLOUD_PASSWORD

Migrations System

SQLMesh uses a migration system to evolve its internal state database schema and metadata format. The migrations handle changes to SQLMesh's internal structure, not user data transformations.

Migration Structure

Location: sqlmesh/migrations/ - Contains 80+ migration files from v0001 to v0083+

Naming Convention: v{XXXX}_{descriptive_name}.py (e.g., v0001_init.py, v0083_use_sql_for_scd_time_data_type_data_hash.py)

Core Infrastructure:

sqlmesh/core/state_sync/db/migrator.py: Main migration orchestrator
sqlmesh/utils/migration.py: Cross-database compatibility utilities
sqlmesh/core/state_sync/base.py: Auto-discovery and loading logic

Migration Categories

Schema Evolution:

State table creation/modification (snapshots, environments, intervals)
Column additions/removals and index management
Database engine compatibility fixes (MySQL/MSSQL field size limits)

Data Format Migrations:

JSON metadata structure updates (snapshot serialization changes)
Path normalization (Windows compatibility)
Fingerprint recalculation when SQLGlot parsing changes

Cleanup Operations:

Removing obsolete tables and unused data
Metadata optimization and attribute cleanup

Key Migration Patterns

# Standard migration function signature
def migrate(state_sync, **kwargs):  # type: ignore
    engine_adapter = state_sync.engine_adapter
    schema = state_sync.schema
    # Migration logic here

# Common operations
engine_adapter.create_state_table(table_name, columns_dict)
engine_adapter.alter_table(alter_expression)
engine_adapter.drop_table(table_name)

State Management Integration

Core State Tables:

_snapshots: Model version metadata (most frequently migrated)
_environments: Environment definitions
_versions: Schema/SQLGlot/SQLMesh version tracking
_intervals: Incremental processing metadata

Migration Safety:

Automatic backups before migration (unless skip_backup=True)
Atomic database transactions for consistency
Snapshot count validation before/after migrations
Automatic rollback on failures

Migration Execution

Auto-Discovery: Migrations are automatically loaded using pkgutil.iter_modules()

Triggers: Migrations run automatically when:

Schema version mismatch detected
SQLGlot version changes require fingerprint recalculation
Manual sqlmesh migrate command execution

Execution Flow:

Version comparison (local vs remote schema)
Backup creation of state tables
Sequential migration execution (numerical order)
Snapshot fingerprint recalculation if needed
Environment updates with new snapshot references

dbt Integration

SQLMesh provides native support for dbt projects, allowing users to run existing dbt projects while gaining access to SQLMesh's advanced features like virtual environments and plan/apply workflows.

Core dbt Integration

Location: sqlmesh/dbt/ - Complete dbt integration architecture

Key Components:

sqlmesh/dbt/loader.py: Main dbt project loader extending SQLMesh's base loader
sqlmesh/dbt/manifest.py: dbt manifest parsing and project discovery
sqlmesh/dbt/adapter.py: dbt adapter system for SQL execution and schema operations
sqlmesh/dbt/model.py: dbt model configurations and materialization mapping
sqlmesh/dbt/context.py: dbt project context and environment management

Project Conversion

dbt Converter: sqlmesh/dbt/converter/ - Tools for migrating dbt projects to SQLMesh

Key Features:

convert.py: Main conversion orchestration
jinja.py & jinja_transforms.py: Jinja template and macro conversion
Full support for dbt assets (models, seeds, sources, tests, snapshots, macros)

CLI Commands:

# Initialize SQLMesh in existing dbt project
sqlmesh init -t dbt

# Convert dbt project to SQLMesh format
sqlmesh dbt convert

Supported dbt Features

Project Structure:

Full dbt project support (models, seeds, sources, tests, snapshots, macros)
dbt package dependencies and version management
Profile integration using existing profiles.yml for connections

Materializations:

All standard dbt materializations (table, view, incremental, ephemeral)
Incremental model strategies (delete+insert, merge, insert_overwrite)
SCD Type 2 support and snapshot strategies

Advanced Features:

Jinja templating with full macro support
Runtime variable passing and configuration
dbt test integration and execution
Cross-database compatibility with SQLMesh's multi-dialect support

Example Projects

sushi_dbt: examples/sushi_dbt/ - Complete dbt project running with SQLMesh Test Fixtures: tests/fixtures/dbt/sushi_test/ - Comprehensive test dbt project with all asset types

Integration Benefits

When using dbt with SQLMesh, you gain:

Virtual Environments: Isolated development without warehouse costs
Plan/Apply Workflow: Safe deployments with change previews
Multi-Dialect Support: Run the same dbt project across different SQL engines
Advanced Testing: Enhanced testing capabilities beyond standard dbt tests
State Management: Sophisticated metadata and versioning system

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Agent-Based Development Workflow

Development Process

Agent Responsibilities

Project Overview

Essential Commands

Environment setup

Common Development Tasks

Engine-Specific Testing

UI Development

Architecture Overview

Core Components

Key Concepts

Important Files

GitHub CI/CD Bot Architecture

Code Organization

Architecture Pattern

Key Components

Integration with Core SQLMesh

Error Handling and Reporting

Environment Variables for Engine Testing

Migrations System

Migration Structure

Migration Categories

Key Migration Patterns

State Management Integration

Migration Execution

dbt Integration

Core dbt Integration

Project Conversion

Supported dbt Features

Example Projects

Integration Benefits

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Agent-Based Development Workflow

Development Process

Agent Responsibilities

Project Overview

Essential Commands

Environment setup

Common Development Tasks

Engine-Specific Testing

UI Development

Architecture Overview

Core Components

Key Concepts

Important Files

GitHub CI/CD Bot Architecture

Code Organization

Architecture Pattern

Key Components

Integration with Core SQLMesh

Error Handling and Reporting

Environment Variables for Engine Testing

Migrations System

Migration Structure

Migration Categories

Key Migration Patterns

State Management Integration

Migration Execution

dbt Integration

Core dbt Integration

Project Conversion

Supported dbt Features

Example Projects

Integration Benefits