The zero-dependency regression testing tool for modern data warehouses.
Warning
Documentation: github.io/quack-diff
- DuckDB-First Architecture: Uses DuckDB as a universal adapter for database connectivity
- Dialect-Safe Hashing: Handles NULL values and type mismatches correctly across databases
- Time-Travel Support: Compare data against historical snapshots (Snowflake, Delta Lake)
- CI/CD Ready: Exit codes for pipeline integration, environment variable configuration
- User-Friendly: Rich terminal output with beautiful diff tables
uv add quack-diff# Compare two tables
quack-diff compare --source db1.users --target db2.users --key id
# Compare with threshold (allow up to 1% difference)
quack-diff compare --source prod.orders --target dev.orders --key order_id --threshold 0.01# Compare current data with 5 minutes ago
quack-diff compare \
--source snowflake.orders \
--target snowflake.orders \
--source-at "5 minutes ago" \
--key order_idquack-diff schema --source db1.users --target db2.usersValidate that pipeline layers have the same number of rows or distinct keys (no row loss):
# Same row count across layers
quack-diff count -t bronze.orders -t silver.orders -t gold.orders
# Same distinct ID count
quack-diff count -t bronze.orders -t silver.orders -t gold.orders --key order_id
# Per-table GROUP BY (count distinct groups in second table)
quack-diff count \
-t sf.GOLD.FCT_INVOICE \
-t "sf.RAW.INVOICE_LINES[salesid,linenum,tariffcode,linestartdate]"
# Add SUM validation alongside row counts
quack-diff count \
-t sf.GOLD.FCT_INVOICE \
-t "sf.RAW.INVOICE_LINES[salesid,linenum]" \
--sum-column QUANTITY --sum-column qty
# JSON for CI/CD
quack-diff count -t bronze.orders -t silver.orders -t gold.orders --key order_id --jsonThe output shows a per-metric status (Count, Sum) plus a global status:
╭──────────────────────── Count Summary ────────────────────────╮
│ Table Count Sum │
│ gold.fct_invoice 277,583 252,068,690.39 │
│ raw.invoice_lines 277,583 252,068,690.39 │
│ │
│ Count MATCH MATCH │
│ Status MATCH │
╰───────────────────────────────────────────────────────────────╯
Use --count-threshold and --sum-threshold to allow small differences. Each accepts a percentage (e.g. 5%) or an absolute value (e.g. 100):
# Allow up to 1% difference on counts and 500 absolute on sums
quack-diff count \
-t bronze.orders -t silver.orders \
--sum-column amount \
--count-threshold 1% \
--sum-threshold 500When values differ but fall within the threshold the status shows PASS (within …) instead of MATCH.
quack-diff supports configuration via environment variables:
# Snowflake
export QUACK_DIFF_SNOWFLAKE_ACCOUNT="your-account"
export QUACK_DIFF_SNOWFLAKE_USER="your-user"
export QUACK_DIFF_SNOWFLAKE_PASSWORD="your-password"Or via a quack-diff.yaml configuration file:
snowflake:
account: your-account
user: your-user
database: your-database
defaults:
threshold: 0.0quack-diff leverages DuckDB's extension system to connect to external databases:
- Attach: Mount remote databases using DuckDB extensions (snowflake)
- Hash: Generate row-level hashes using dialect-safe SQL (handles NULLs, type coercion)
- Compare: Identify mismatches by comparing hash values
- Report: Display results in beautiful terminal tables
# Clone the repository
git clone https://github.com/matteorenoldi/quack-diff.git
cd quack-diff
# Install with dev dependencies
uv sync
# Install pre-commit hooks
uv run prek install
# Run tests
uv run pytest
# Lint
uv run ruff check .MIT License - see LICENSE for details.
