on-run-end upload_dbt_artifacts hangs with dbt Fusion on large projects #978

@pholterman

Description

The on-run-end hook hangs for 8+ minutes when using dbt Fusion (v2.0.0-preview.149) with Elementary v0.23.0 on a large project (~2,600 models). The same project completes the on-run-end in a reasonable time with dbt Core 1.11.6.

Environment

  • Elementary version: 0.23.0
  • dbt Fusion version: 2.0.0-preview.149
  • dbt Core version (where it works): 1.11.6
  • Adapter: Snowflake
  • Project size: ~2,671 SQL models, ~1,975 YAML schema files

Steps to Reproduce

  1. Have a large dbt project (~2,600+ models)
  2. Run a small subset of models with dbt Fusion:
    dbt run -s path:models/domain/,tag:some_tag,config.materialized:table
    
  3. Observe the on-run-end phase hanging:
    Succeeded [  2.88s] model DBT_domain.dmn_customer_service__emails (table)
    Succeeded [  3.04s] model DBT_domain.dmn_customer_service__combined_communications (table)
    Succeeded [  7.05s] model DBT_domain.dmn_sales__emails (table)
    Failed    [  2.00s] model DBT_domain.dmn_customer_service__combined_communications_with_topics (table)
    Running on-run-end ⠄ [8m]
    

Root Cause Analysis

The bottleneck is upload_dbt_artifacts(), which iterates over the entire graph (all models, tests, sources, columns) regardless of how many models were actually executed. Each node goes through a flatten_* macro that performs ~20 .get() / .update() / list operations.

In dbt Core (Jinja2), these are native Python dict operations and carry negligible overhead.

In dbt Fusion (MiniJinja), each dict access crosses a Rust ↔ MiniJinja FFI boundary, marshalling values between type systems. With ~2,600 models × ~20 dict operations per flatten_model call, plus thousands of tests, sources, and columns, this amounts to hundreds of thousands of FFI crossings that compound into minutes of processing.
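A back-of-envelope calculation makes the scale concrete. The model count and the ~20 operations per `flatten_*` call come from the analysis above; the 5x multiplier for tests, sources, and columns is an illustrative assumption, not a measured figure:

```python
# Rough estimate of value-boundary crossings during upload_dbt_artifacts.
# Node and per-call counts are from the report above; the 5x factor for
# tests/sources/columns is an assumption for illustration only.

models = 2_671          # SQL models in the project
ops_per_flatten = 20    # ~.get()/.update()/list operations per flatten_* call

model_crossings = models * ops_per_flatten
print(f"flatten_model crossings alone: {model_crossings:,}")  # 53,420

# Tests, sources, and columns multiply the node count several times over;
# even a conservative 5x factor lands in the hundreds of thousands.
total_estimate = model_crossings * 5
print(f"rough total crossings: {total_estimate:,}")  # 267,100
```

If each crossing costs even tens of microseconds of marshalling, the total lands in the minutes range observed here, while the same operations are effectively free in CPython.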

The specific macros involved:

  • upload_dbt_models → iterates graph.nodes.values() | selectattr("resource_type", "==", "model")
  • upload_dbt_tests → iterates graph.nodes.values() | selectattr("resource_type", "==", "test")
  • upload_dbt_sources → iterates graph.sources.values()
  • upload_dbt_columns → iterates graph.nodes.values() + graph.sources.values()
  • Each calls a flatten_* callback with extensive dict manipulation

upload_run_results and upload_dbt_invocation are not affected (they process only the executed results, not the full graph).

Suggested Fix

Consider an alternative code path for dbt Fusion (detectable via elementary.is_dbt_fusion()) that avoids per-node Jinja dict processing, e.g.:

  • Bulk-export graph metadata via a SQL-based approach instead of Jinja iteration
  • Use Fusion's native capabilities to serialize graph nodes more efficiently
  • Or limit artifact upload to only nodes related to the executed models
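The third option could be sketched as follows. All names here are hypothetical; the real change would live in Elementary's Jinja macros rather than Python, but the filtering logic would be the same:

```python
# Sketch of restricting artifact upload to nodes touched by this invocation
# (plus direct parents, so lineage in the report stays intact). Function and
# key names are hypothetical, for illustration only.

def select_relevant_nodes(graph_nodes, run_results):
    """Return only the graph nodes related to the executed models."""
    executed_ids = {r["unique_id"] for r in run_results}
    relevant = set(executed_ids)
    for node_id, node in graph_nodes.items():
        if node_id in executed_ids:
            # Keep direct parents as well, for lineage.
            relevant.update(node.get("depends_on", {}).get("nodes", []))
    return {nid: n for nid, n in graph_nodes.items() if nid in relevant}

# Example: a 3-node graph where only one model ran.
graph = {
    "model.proj.a": {"depends_on": {"nodes": []}},
    "model.proj.b": {"depends_on": {"nodes": ["model.proj.a"]}},
    "model.proj.c": {"depends_on": {"nodes": []}},
}
results = [{"unique_id": "model.proj.b"}]
print(sorted(select_relevant_nodes(graph, results)))
# -> ['model.proj.a', 'model.proj.b']
```

With a selective run like the reproduction above (a handful of models out of ~2,600), this would cut the per-node work by two to three orders of magnitude, at the cost of the artifact tables no longer reflecting the full project graph.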

Workaround

Setting disable_dbt_artifacts_autoupload: true avoids the hang but disables all artifact metadata upload, which means Elementary monitoring/reporting loses model/test/source metadata.
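For reference, the var is set under `vars:` in `dbt_project.yml` (standard placement for Elementary package vars):

```
# dbt_project.yml -- disables Elementary's artifact autoupload entirely
vars:
  disable_dbt_artifacts_autoupload: true
```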
