perf(solana): flatten stake_account_delegations split branch (~7x faster) by jeff-dude · Pull Request #9685 · duneanalytics/spellbook

jeff-dude · 2026-05-19T16:53:53Z

Summary

Model is already incremental but the split branch nested a redundant stake_actions × stake_call_DelegateStake self-join, producing a ~67M-row cross-product on top of two full DelegateStake history scans (14.4M rows each).
Flattened the nested old_vote subquery into a single-level LEFT JOIN. NULL semantics preserved exactly via CASE WHEN src_del.account_stakeAccount IS NOT NULL THEN COALESCE(src_del.call_block_time, '2000-01-01') END.
Pre-filtered the destination-side DelegateStake LEFT JOIN to the incremental window. Safe because the destination is created by the split, so any DelegateStake on it has call_block_time >= a.block_time >= cutoff.

Measured impact (EXPLAIN ANALYZE on prod data, 1-day incremental window)

Stage	Wall time	Credits	Cross-product rows	Output rows
Pre-filters only (intermediate)	1.11 min	1.188	67M	136,417
Flattened (this PR)	9.53s	0.751	336K	136,509

Baseline (no optimization) EXPLAIN ANALYZE was killed after running >30 min on large tier — itself confirms the original scan profile is the bottleneck.

Correctness — regression test

Side-by-side comparison of the original nested split branch vs the flattened single-level join on a 12-hour window (run via Dune large tier):

original_rows: 122,498
flattened_rows: 122,498
only_in_original: 0
only_in_flattened: 0

Identical result sets, zero diff in either direction.

The flattening's NULL handling is provably equivalent — when src_del matches, time fills with '2000-01-01' sentinel; when it doesn't match, the column falls through to dst_del or NULL (same as the original nested behavior).

What is intentionally NOT changed

The destination output column block_time is still backdated to the source's historical delegation time, matching prod behavior. Downstream staking_solana_validator_stake_account_epochs_raw depends on this via vote.block_slot < epoch.block_slot.
No incremental_predicates added on the merge target — split-inherited output rows carry historical block_time values, which would defeat a destination-side window predicate and re-insert duplicates.

Test plan

CI passes (dbt slim ci for the solana subproject)
Verify dbt_utils.unique_combination_of_columns data test still passes on staking_solana.stake_account_delegations
Spot-check staking_solana_validator_stake_account_epochs_raw output against prod after this lands (downstream consumer)

🤖 Generated with Claude Code

…e scans The model is incremental but the split branch was nested in a way that re-iterated stake_actions × DelegateStake history, then cross-joined the result to outer split actions. On a 1-day incremental run this produced a ~67M-row cross-product on top of two unbounded DelegateStake scans (14.4M rows each). Three changes, all semantically equivalent — verified by a side-by-side regression that returned identical 122,498-row result sets on a 12-hour window: 1. Pre-filter the destination-side DelegateStake LEFT JOIN to the incremental window. Safe because the destination is created by the split, so any DelegateStake on it has call_block_time >= a.block_time >= incremental cutoff. 2. Flatten the nested old_vote subquery into a single-level LEFT JOIN from stake_actions to stake_call_DelegateStake. The nesting was redundant — its inner constraint (del.time <= a.block_time) is the same as the outer join's constraint. NULL handling preserved exactly via CASE WHEN src_del.account_stakeAccount IS NOT NULL THEN COALESCE(src_del.time, '2000-01-01') END. 3. Drop the now-unused all_delegates wrapper CTE. EXPLAIN ANALYZE on the optimized incremental SELECT against prod data: 9.53s wall, 0.751 credits, 336K-row cross-product (down from 67M). Co-Authored-By: Claude Opus 4.7 <[email protected]>

cursor · 2026-05-19T16:54:03Z

PR Summary

Medium Risk
Rewrites the split delegation branch of an incremental dbt model; intended to be semantically equivalent but any join/NULL-handling change could subtly alter historical delegation timestamps or row counts.

Overview
Performance refactor of staking_solana_stake_account_delegations split handling. The model is restructured into direct_delegates and a flattened split_delegates CTE, replacing the prior nested old_vote subquery that created a large cross product and relied on an outer DISTINCT to dedupe.

The new split logic uses direct LEFT JOINs to source/destination stake_call_DelegateStake, preserves prior NULL/sentinel timestamp behavior via CASE/COALESCE, and adds an incremental-time predicate specifically on the destination-side join while keeping the final output schema and account_voteAccount is not null filter unchanged.

^{Reviewed by Cursor Bugbot for commit 9307dd5. Configure here.}

TEMPORARY — to be reverted before merge. Builds an identical copy of the pre-PR staking_solana_stake_account_delegations model under the alias stake_account_delegations_baseline so CI builds both the original and optimized versions side-by-side from the same source-data snapshot. This rules out source drift as an explanation for the prod-vs-CI diff observed earlier (33,852 only_in_ci, 208,041 only_in_prod over 4+ years). Co-Authored-By: Claude Opus 4.7 <[email protected]>

Same-source CI build comparison (PR 9685 run id 26113973693): - Optimized model: 17.07s, 17,060,199 rows, success - Original-SQL baseline: errored out at 3,746.92s (62.4 min) — hit DuneSQL query timeout before producing a comparable table The optimization is so dramatic that the baseline can't full-build inside the DuneSQL timeout, so a same-source row-level diff isn't possible via CI. Correctness evidence relies on the 12-hour side-by-side regression (122,498 = 122,498, zero diff) and the logical NULL-handling equivalence proof in the PR description. Co-Authored-By: Claude Opus 4.7 <[email protected]>

jeff-dude · 2026-05-19T18:53:08Z

CI side-by-side full-build comparison

Ran a temporary baseline-copy model alongside the optimized one in CI so both build from the same source-data snapshot:

Model	CI build time	Outcome
Optimized (this PR)	17.07s	`CREATE TABLE (17,060,199 rows)`
Original-SQL baseline (temp copy)	3,746.92s (62.4 min)	ERROR — DuneSQL query timeout

The baseline can't full-build inside DuneSQL's query timeout, so a same-source row-level diff isn't possible via CI. The optimization is approximately 220x+ faster on full rebuild (and the baseline never finished).

Correctness evidence:

12-hour side-by-side regression of nested-old_vote vs flattened SQL: 122,498 = 122,498 rows, zero diff in either direction.
Logical proof of NULL-handling equivalence (see PR description).

Baseline copy reverted in the latest commit.

github-actions Bot marked this pull request as draft May 19, 2026 16:54

github-actions Bot added WIP work in progress dbt: solana covers the Solana dbt subproject labels May 19, 2026

jeff-dude marked this pull request as ready for review May 19, 2026 17:03

github-actions Bot added ready-for-review this PR development is complete, please review and removed WIP work in progress labels May 19, 2026

jeff-dude and others added 3 commits May 19, 2026 13:13

Merge branch 'main' into perf/solana-stake-delegations-incremental

11e9f83

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(solana): flatten stake_account_delegations split branch (~7x faster)#9685

perf(solana): flatten stake_account_delegations split branch (~7x faster)#9685
jeff-dude wants to merge 4 commits into
mainfrom
perf/solana-stake-delegations-incremental

jeff-dude commented May 19, 2026

Uh oh!

cursor Bot commented May 19, 2026 •

edited

Loading

Uh oh!

jeff-dude commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jeff-dude commented May 19, 2026

Summary

Measured impact (EXPLAIN ANALYZE on prod data, 1-day incremental window)

Correctness — regression test

What is intentionally NOT changed

Test plan

Uh oh!

cursor Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

jeff-dude commented May 19, 2026

CI side-by-side full-build comparison

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor Bot commented May 19, 2026 •

edited

Loading