feat(optimize_spell): skip OPTIMIZE on incremental no-op runs#9679
Conversation
PR SummaryLow Risk Overview Adds an Reviewed by Cursor Bugbot for commit 439ba7d. Configure here. |
jeff-dude
left a comment
There was a problem hiding this comment.
we can coordinate deploy when we're both ready
Make optimize_spell conditional on rows_affected for incremental
models: skip ALTER TABLE EXECUTE optimize when the just-run main
statement wrote fewer than OPTIMIZE_MIN_ROWS rows (default 1000).
Measurement across 5 representative cadence runs (Hourly, Tokens,
Daily, DEX, Solana, last 7 days) showed ~25% of incremental model
runs MERGE zero rows, with the OPTIMIZE post-hook accounting for an
estimated ~30% of total dbt execute time. Several chainlink_bnb and
chainlink_polygon models burn 700-1000+ seconds per Daily run with
0 rows merged.
The threshold is overridable globally via --vars '{OPTIMIZE_MIN_ROWS: N}'
or per-model via the optimize_min_rows config. Table materialization
keeps the existing always-optimize behavior.
Estimated savings (per cadence cycle): 5h at threshold=1, 11h at
threshold=1000, 13h at threshold=10000. Conservative go-live default
of 1000 captures most of the win with minimal risk.
54edb0f to
048efce
Compare
Local verificationRan the test recipe from the PR body against spellbook-sandpit. Threshold default
Post-hook SQL emitted (verbatim from Quirk noticed: Trino's The conditional logic works as designed across all four scenarios. |

Skip the
optimize_spellpost-hook on incremental models when the just-run main statement wrote fewer thanOPTIMIZE_MIN_ROWSrows (default 1000). Table materialization keeps the existing always-optimize behavior.About one in four incremental cadence runs MERGE zero rows (no new source data, no late-arriving rows in the lookback window), and the
ALTER TABLE EXECUTE optimizepost-hook still runs unconditionally on each. Across 5 representative cadence runs sampled in the last 7 days (Hourly, Tokens, Daily, DEX, Solana — 5,238 model executions, 55.3h of dbt execute time),optimize_spellaccounts for an estimated ~30% of total dbt execute time and ~12% is burned on zero-row models alone.The worst outliers are stark:
chainlink_bnb_fm_reward_evt_transfer_daily,chainlink_bnb_ocr_reward_evt_transfer_daily, andchainlink_bnb_ccip_transmitted_revertedeach burned 1000+ seconds of Trino time per Daily run while MERGE-ing zero rows. Six similar chainlink models cost ~50 minutes per Daily cycle between them.Estimated savings per cadence cycle by threshold:
1(only 0-row)1001000(default)10000Default of 1000 was chosen to keep most of the win while leaving headroom for models that genuinely need compaction on small-but-real writes. Adjust globally via
--vars '{OPTIMIZE_MIN_ROWS: N}'or per-model viaoptimize_min_rows.The
load_result('main').response.rows_affectedread works for all incremental strategies in dbt-trino 1.10 (merge, delete+insert, append, microbatch). For runs where the main statement didn't produce a response (defensive case), we treat rows_affected as 0 and skip.Testing affordance
Setting
--vars '{OPTIMIZE_SPELL_FORCE: true}'enables the macro on non-prod targets, so the conditional logic can be exercised against a personal schema without spoofing the target name. Default is false (behavior unchanged on dev/ci targets). Used by the test sequence below.Local test recipe (incremental only)
Probe model (in
models/_test/optimize_spell_probe.sql, untracked):{{ config(materialized='incremental', file_format='delta', incremental_strategy='merge', unique_key=['id'], schema='_test_optimize_spell', alias='probe', tags=['optimize_test']) }} select cast(n as bigint) as id, cast('{{ run_started_at }}' as timestamp) as _updated_at from unnest(sequence(1, cast({{ var('test_rows', 0) }} as bigint))) as t(n)Sequence:
Follow-ups (out of scope)
tablematerializationmerge_skip_unchanged=trueas default (stacks additively with this change)Rollback is trivial: revert this file.