feat(optimize_spell): skip OPTIMIZE on incremental no-op runs by a-monteiro · Pull Request #9679 · duneanalytics/spellbook

a-monteiro · 2026-05-17T22:33:26Z

Skip the optimize_spell post-hook on incremental models when the just-run main statement wrote fewer than OPTIMIZE_MIN_ROWS rows (default 1000). Table materialization keeps the existing always-optimize behavior.

About one in four incremental cadence runs MERGE zero rows (no new source data, no late-arriving rows in the lookback window), and the ALTER TABLE EXECUTE optimize post-hook still runs unconditionally on each. Across 5 representative cadence runs sampled in the last 7 days (Hourly, Tokens, Daily, DEX, Solana — 5,238 model executions, 55.3h of dbt execute time), optimize_spell accounts for an estimated ~30% of total dbt execute time and ~12% is burned on zero-row models alone.

The worst outliers are stark: chainlink_bnb_fm_reward_evt_transfer_daily, chainlink_bnb_ocr_reward_evt_transfer_daily, and chainlink_bnb_ccip_transmitted_reverted each burned 1000+ seconds of Trino time per Daily run while MERGE-ing zero rows. Six similar chainlink models cost ~50 minutes per Daily cycle between them.

Estimated savings per cadence cycle by threshold:

Threshold	Models skipped	Time saved	% of total
`1` (only 0-row)	1,236	5.0h	9.2%
`100`	2,335	8.7h	16.0%
`1000` (default)	2,932	11.1h	20.5%
`10000`	3,635	12.8h	23.6%

Default of 1000 was chosen to keep most of the win while leaving headroom for models that genuinely need compaction on small-but-real writes. Adjust globally via --vars '{OPTIMIZE_MIN_ROWS: N}' or per-model via optimize_min_rows.

The load_result('main').response.rows_affected read works for all incremental strategies in dbt-trino 1.10 (merge, delete+insert, append, microbatch). For runs where the main statement didn't produce a response (defensive case), we treat rows_affected as 0 and skip.

Testing affordance

Setting --vars '{OPTIMIZE_SPELL_FORCE: true}' enables the macro on non-prod targets, so the conditional logic can be exercised against a personal schema without spoofing the target name. Default is false (behavior unchanged on dev/ci targets). Used by the test sequence below.

Local test recipe (incremental only)

Probe model (in models/_test/optimize_spell_probe.sql, untracked):

{{ config(materialized='incremental', file_format='delta',
          incremental_strategy='merge', unique_key=['id'],
          schema='_test_optimize_spell', alias='probe',
          tags=['optimize_test']) }}
select cast(n as bigint) as id, cast('{{ run_started_at }}' as timestamp) as _updated_at
from unnest(sequence(1, cast({{ var('test_rows', 0) }} as bigint))) as t(n)

Sequence:

uv run dbt run --select tag:optimize_test --full-refresh \
  --vars '{test_rows: 2000, OPTIMIZE_SPELL_FORCE: true}'   # bootstrap, OPTIMIZE expected
uv run dbt run --select tag:optimize_test \
  --vars '{test_rows: 0,    OPTIMIZE_SPELL_FORCE: true}'   # SKIP (0 rows)
uv run dbt run --select tag:optimize_test \
  --vars '{test_rows: 500,  OPTIMIZE_SPELL_FORCE: true}'   # SKIP (<1000)
uv run dbt run --select tag:optimize_test \
  --vars '{test_rows: 2000, OPTIMIZE_SPELL_FORCE: true}'   # RUN (>=1000)

Follow-ups (out of scope)

Audit the chainlink_bnb / chainlink_polygon outliers (the worst 6 models contributing ~5h/week of waste even at threshold=0)
Conditional OPTIMIZE for table materialization
Scheduled weekly OPTIMIZE pass for cold tables
merge_skip_unchanged=true as default (stacks additively with this change)

Rollback is trivial: revert this file.

cursor · 2026-05-17T22:33:34Z

PR Summary

Low Risk
Low risk: only changes a performance post-hook and is limited to prod incremental runs, with a conservative default and per-run/per-model overrides.

Overview
Updates optimize_spell to conditionally run Delta OPTIMIZE for prod incremental models based on load_result('main').response.rows_affected, skipping compaction when writes are below a configurable threshold.

Adds an OPTIMIZE_MIN_ROWS default (1000) with per-model override via config.optimize_min_rows, while keeping the existing always-optimize behavior for table materializations and continuing to no-op outside prod.

^{Reviewed by Cursor Bugbot for commit 439ba7d. Configure here.}

a-monteiro · 2026-05-17T22:33:40Z

feat(optimize_spell): skip OPTIMIZE on incremental no-op runs #9679 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

jeff-dude

we can coordinate deploy when we're both ready

Make optimize_spell conditional on rows_affected for incremental models: skip ALTER TABLE EXECUTE optimize when the just-run main statement wrote fewer than OPTIMIZE_MIN_ROWS rows (default 1000). Measurement across 5 representative cadence runs (Hourly, Tokens, Daily, DEX, Solana, last 7 days) showed ~25% of incremental model runs MERGE zero rows, with the OPTIMIZE post-hook accounting for an estimated ~30% of total dbt execute time. Several chainlink_bnb and chainlink_polygon models burn 700-1000+ seconds per Daily run with 0 rows merged. The threshold is overridable globally via --vars '{OPTIMIZE_MIN_ROWS: N}' or per-model via the optimize_min_rows config. Table materialization keeps the existing always-optimize behavior. Estimated savings (per cadence cycle): 5h at threshold=1, 11h at threshold=1000, 13h at threshold=10000. Conservative go-live default of 1000 captures most of the win with minimal risk.

a-monteiro · 2026-05-18T19:34:26Z

Local verification

Ran the test recipe from the PR body against spellbook-sandpit. Threshold default OPTIMIZE_MIN_ROWS=1000.

Run	Vars	Main statement	Post-hook
1	`--full-refresh test_rows=2000` (CTAS path, always-optimize)	`CREATE TABLE (2_000 rows)` in 3.38s	OPTIMIZE ran (table branch unchanged)
2	`test_rows=0` (expect SKIP)	`MERGE (2 rows)` in 3.77s	OPTIMIZE skipped
3	`test_rows=500` (under 1000 threshold, expect SKIP)	`MERGE (500 rows)` in 4.19s	OPTIMIZE skipped
4	`test_rows=2000` (over 1000 threshold, expect RUN)	`MERGE (2_000 rows)` in 4.63s	OPTIMIZE ran

Post-hook SQL emitted (verbatim from --log-level debug output):

=== Run 4 ===
ALTER TABLE hive.test_schema.<probe> EXECUTE optimize

=== Runs 2 and 3 ===
(no ALTER TABLE line — post-hook produced empty SQL, dbt skipped statement)

Quirk noticed: Trino's sequence(1, 0) returns a 2-element array [1, 0] rather than empty, so test_rows=0 produced 2 rows merged instead of 0. Doesn't change the conclusion (2 < 1000 → still skipped), but worth noting for anyone reproducing the test. To get a true 0-row test, use case when {{ var('test_rows') }} > 0 then sequence(1, ...) end or similar.

The conditional logic works as designed across all four scenarios.

github-actions Bot marked this pull request as draft May 17, 2026 22:33

github-actions Bot added the WIP work in progress label May 17, 2026

jeff-dude marked this pull request as ready for review May 18, 2026 10:57

github-actions Bot added ready-for-review this PR development is complete, please review and removed WIP work in progress labels May 18, 2026

jeff-dude approved these changes May 18, 2026

View reviewed changes

a-monteiro force-pushed the andre/optimize-spell-conditional branch from 54edb0f to 048efce Compare May 18, 2026 19:08

Merge branch 'main' into andre/optimize-spell-conditional

336d104

jeff-dude self-assigned this May 18, 2026

jeff-dude added ready-for-merging and removed ready-for-review this PR development is complete, please review labels May 18, 2026

jeff-dude mentioned this pull request May 19, 2026

feat(solana): add executing_account_prefix predicates for instruction_calls partition pruning #9674

Merged

jeff-dude merged commit 4d05c57 into main May 19, 2026
6 checks passed

jeff-dude deleted the andre/optimize-spell-conditional branch May 19, 2026 17:09

github-actions Bot locked and limited conversation to collaborators May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(optimize_spell): skip OPTIMIZE on incremental no-op runs#9679

feat(optimize_spell): skip OPTIMIZE on incremental no-op runs#9679
jeff-dude merged 2 commits into
mainfrom
andre/optimize-spell-conditional

a-monteiro commented May 17, 2026 •

edited

Loading

Uh oh!

cursor Bot commented May 17, 2026 •

edited

Loading

Uh oh!

a-monteiro commented May 17, 2026

Uh oh!

jeff-dude left a comment

Uh oh!

a-monteiro commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

a-monteiro commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Testing affordance

Local test recipe (incremental only)

Follow-ups (out of scope)

Uh oh!

cursor Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

a-monteiro commented May 17, 2026

Uh oh!

jeff-dude left a comment

Choose a reason for hiding this comment

Uh oh!

a-monteiro commented May 18, 2026

Local verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

a-monteiro commented May 17, 2026 •

edited

Loading

cursor Bot commented May 17, 2026 •

edited

Loading