Replies: 3 comments
-
A possible solution is using multiple DAGs (every run triggers multiple DAG runs). But it is not practical in our case: it would require launching numerous DAG runs each time, making it difficult to visualize dependencies and resulting in a poor user experience.
-
Additional context: this is a deep learning use case. Our DAG contains a complete pipeline including data preprocessing, feature engineering, and model training. Each time we trigger a DAG run, we need to selectively run specific stages: for example, sometimes only data cleaning, sometimes only model training on already-cleaned data, and sometimes the full pipeline from scratch.
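To make the stage-selection idea concrete, here is a minimal, Airflow-free sketch of how a run's conf payload could map to the stages to execute. The conf shape (`{"stages": [...]}`), the stage names, and the helper name are illustrative assumptions, not from the discussion:

```python
# Illustrative helper (assumed conf shape: {"stages": [...]}); not from the discussion.
PIPELINE_STAGES = ["preprocessing", "feature_engineering", "training"]


def stages_to_run(run_conf):
    """Return the requested stages in pipeline order, defaulting to the full pipeline."""
    requested = (run_conf or {}).get("stages")
    if not requested:
        return list(PIPELINE_STAGES)
    # Preserve pipeline order and ignore unknown stage names.
    return [stage for stage in PIPELINE_STAGES if stage in requested]


print(stages_to_run(None))                      # full pipeline
print(stages_to_run({"stages": ["training"]}))  # ['training']
```

A run would then be triggered with something like `airflow dags trigger <dag_id> --conf '{"stages": ["training"]}'` so that only the training stage executes.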
-
DAG Versioning is not the right tool here: it is designed for schema/structural changes over time, not for runtime task selection. Dynamically generating a new DAG per run is also problematic, since it adds DAG-parsing overhead and leaves a confusing audit history. For a 150-task ML pipeline where each run only executes a subset, there are three patterns worth considering:

**Pattern 1: ShortCircuitOperator at group boundaries (fastest fix)**

If the 10-minute wait comes from evaluating individual skip conditions per task, move the skip decision upstream to a single gate per TaskGroup:

```python
from airflow.operators.python import PythonOperator, ShortCircuitOperator
from airflow.utils.task_group import TaskGroup


def should_run_preprocessing(**context) -> bool:
    run_conf = context["dag_run"].conf or {}
    return "preprocessing" in run_conf.get("stages", ["preprocessing", "training"])


with TaskGroup("preprocessing") as preprocessing_group:
    gate = ShortCircuitOperator(
        task_id="gate_preprocessing",
        python_callable=should_run_preprocessing,
    )
    clean = PythonOperator(task_id="clean_data", ...)
    features = PythonOperator(task_id="engineer_features", ...)
    gate >> clean >> features

with TaskGroup("training") as training_group:
    gate2 = ShortCircuitOperator(
        task_id="gate_training",
        python_callable=lambda **ctx: "training"
        in (ctx["dag_run"].conf or {}).get("stages", ["preprocessing", "training"]),
    )
    train = PythonOperator(task_id="train_model", ...)
    gate2 >> train

preprocessing_group >> training_group
```

When a gate's callable returns False, ShortCircuitOperator skips everything downstream of it in a single decision, so the scheduler evaluates a handful of gates instead of 150 per-task conditions.

**Pattern 2: BranchPythonOperator for mutually exclusive paths**

If the stages are mutually exclusive (e.g. "only preprocessing" vs "only training" vs "full pipeline"), use branching:

```python
from airflow.operators.python import BranchPythonOperator


def choose_pipeline(**context):
    stages = (context["dag_run"].conf or {}).get("stages", ["preprocessing", "training"])
    if stages == ["preprocessing"]:
        return "preprocessing.gate_preprocessing"
    elif stages == ["training"]:
        return "training.gate_training"
    return ["preprocessing.gate_preprocessing", "training.gate_training"]  # full pipeline


branch = BranchPythonOperator(
    task_id="route_pipeline",
    python_callable=choose_pipeline,
)
```

**Pattern 3: Dynamic Task Mapping (Airflow 2.3+ / 3.x, cleanest for ML)**

For your specific ML use case, dynamic task mapping lets Airflow create only the task instances that are actually needed, expanded at runtime from the upstream task's return value:

```python
from airflow.decorators import task


@task
def get_stages_to_run(**context) -> list[str]:
    return (context["dag_run"].conf or {}).get("stages", ["preprocessing", "training"])


@task
def run_stage(stage_name: str):
    # run_preprocessing / run_training are defined elsewhere in the DAG file.
    stage_map = {
        "preprocessing": run_preprocessing,
        "training": run_training,
    }
    stage_map[stage_name]()


stages = get_stages_to_run()
run_stage.expand(stage_name=stages)
```

With this, Airflow creates only the mapped task instances for the stages that were requested, with no skip logic at all.

**Recommendation for your case**

Given that you have a deep learning pipeline with independent stages (data cleaning → feature engineering → model training), DAG Versioning is worth enabling for schema management, but it should not be used as a mechanism for runtime task selection.
-
Hi Airflow community,
We're facing a challenge with a large DAG (150+ tasks) and would appreciate your advice on best practices.
Current Setup:
• We have a complex DAG with 150+ tasks
• Each DAG run only needs to execute a subset of these tasks
• Currently we use conditional logic to skip tasks that aren't needed
• The problem: when we want to run tasks far downstream, we have to wait for all upstream tasks to be evaluated and skipped, which takes 10+ minutes due to DAG complexity
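The per-task conditional skip described above typically looks like each task's callable checking the run conf and raising a skip exception. A minimal sketch of that pattern (the exception class is a local stand-in for Airflow's AirflowSkipException so the snippet runs standalone; the function and conf shape are illustrative):

```python
# Local stand-in for airflow.exceptions.AirflowSkipException (illustrative only).
class AirflowSkipException(Exception):
    pass


def run_if_requested(stage, run_conf):
    """Per-task skip check: the kind of condition evaluated once per task, 150+ times per run."""
    if stage not in (run_conf or {}).get("stages", []):
        raise AirflowSkipException(f"stage {stage!r} not requested for this run")
    return f"ran {stage}"


conf = {"stages": ["training"]}
print(run_if_requested("training", conf))  # ran training
try:
    run_if_requested("preprocessing", conf)
except AirflowSkipException as exc:
    print(f"skipped: {exc}")
```

Because every one of the 150+ tasks repeats a check like this, the scheduler must visit and skip each upstream task before a downstream task can start, which is consistent with the 10+ minute wait described.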
Proposed Solution: We're considering using DAG Versioning combined with Dynamic DAG Generation.
Is this an appropriate use case for DAG Versioning? Is there a better pattern for this use case?
Thanks in advance!