[perf] feat: add GDN (Gated DeltaNet) FLOPs calculator#2925
Conversation
Port the GDN FLOPs formula from Megatron-LM (PR #1989) into Bridge's flop_utils.py so that Qwen3.5 VL and Qwen3-Next models report accurate throughput numbers instead of treating all layers as standard attention. When experimental_attention_variant="gated_delta_net" is set on the model config, transformer_flops() now: - Parses linear_attention_freq to split layers into GDN vs standard attention - Computes per-layer GDN cost (in_proj + conv1d + gated delta rule + out_proj) - Produces a weighted self_attn_term over both layer types Signed-off-by: Chen Cui <chcui@nvidia.com>
📝 Walkthrough

This change adds support for computing floating-point operations for Gated DeltaNet (GDN) attention layers in mixed-attention configurations. The implementation conditionally recalculates self-attention FLOP costs based on frequency patterns specified in the configuration, allowing different layers to use either GDN or standard attention mechanisms.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unit_tests/training/utils/test_flop_utils.py (1)
489-500: Make the explicit-list test verify the actual mask behavior.
`assert flops > 0` still passes if the list mask is ignored or its semantics flip. Please assert the expected 6-GDN/2-standard split, or compare against a hand-computed total, so the list branch is actually protected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/training/utils/test_flop_utils.py` around lines 489 - 500, The test must verify the list mask is actually applied: after building model_cfg in test_gdn_layer_freq_list, assert that model_cfg.linear_attention_freq (or the config field that stores the per-layer mask) equals the freq_list ([1,1,0,1,1,0,1,1]) and additionally assert sum(model_cfg.linear_attention_freq) == 6 to ensure a 6-GDN/2-standard split before calling num_floating_point_operations; keep the existing flops check if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/megatron/bridge/training/utils/flop_utils.py`:
- Around lines 389-411: When experimental_attention_variant == "gated_delta_net",
make the GDN config strict: if cfg.model.linear_attention_freq is None raise a
ValueError; if it's not an int or list raise a TypeError (do not silently fall
back to [1]*num_layers); when it's a list assert len(linear_attention_pattern)
== num_layers; and stop using default fallbacks for GDN dims—replace
getattr(cfg.model, "linear_key_head_dim", 128), "linear_value_head_dim",
"linear_num_key_heads", "linear_num_value_heads", and "linear_conv_kernel_dim"
with direct attribute access (e.g., cfg.model.linear_key_head_dim) so missing
fields raise immediately.
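A minimal sketch of the strict parsing this comment asks for. The helper name `resolve_linear_attention_pattern` and the exact config shape are assumptions, not the PR's actual code, and it assumes Megatron-LM's convention that with an int freq, every freq-th layer stays standard attention while the rest are GDN:

```python
def resolve_linear_attention_pattern(model_cfg, num_layers):
    """Return a per-layer 1/0 mask (1 = GDN, 0 = standard attention)."""
    freq = model_cfg.linear_attention_freq
    if freq is None:
        raise ValueError("linear_attention_freq must be set for gated_delta_net")
    if isinstance(freq, int):
        # Every freq-th layer remains standard attention; the rest are GDN.
        return [0 if (i + 1) % freq == 0 else 1 for i in range(num_layers)]
    if isinstance(freq, list):
        if len(freq) != num_layers:
            raise ValueError(
                f"linear_attention_freq list length {len(freq)} != num_layers {num_layers}"
            )
        return list(freq)
    raise TypeError(f"linear_attention_freq must be int or list, got {type(freq)}")
```

With freq=3 on 8 layers this yields a 6-GDN/2-standard split, matching the pattern discussed in the review.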
In `@tests/unit_tests/training/utils/test_flop_utils.py`:
- Around line 391-392: The test suite class TestGDNLayerFlops is not marked as a
unit test; add a pytest marker by importing pytest (if not already present) and
placing `@pytest.mark.unit` directly above the TestGDNLayerFlops class definition
so the whole class is selected by pytest's unit marker filtering.
📒 Files selected for processing (2)
src/megatron/bridge/training/utils/flop_utils.py
tests/unit_tests/training/utils/test_flop_utils.py
- Make GDN config strict: raise ValueError when `linear_attention_freq` is None, raise TypeError on invalid types, assert list length matches num_layers (mirrors the existing MoE validation pattern)
- Replace getattr fallbacks for GDN dims with direct attribute access so missing fields fail explicitly
- Add `@pytest.mark.unit` to the TestGDNLayerFlops class
- Strengthen test_gdn_layer_freq_list to verify the 6/2 split by comparing against the equivalent int freq=3 and against a pure-standard baseline

Signed-off-by: Chen Cui <chcui@nvidia.com>
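The equivalence the strengthened test checks can be sketched as follows. The helper `weighted_attn_flops` and the stand-in per-layer costs are hypothetical; the real test calls `num_floating_point_operations` on full model configs:

```python
def weighted_attn_flops(pattern, gdn_cost, std_cost):
    # Placeholder for the weighted self_attn_term: sum per-layer costs
    # according to the 1/0 GDN mask.
    return sum(gdn_cost if is_gdn else std_cost for is_gdn in pattern)

list_pattern = [1, 1, 0, 1, 1, 0, 1, 1]
# int form freq=3 on 8 layers: every 3rd layer stays standard attention
int_pattern = [0 if (i + 1) % 3 == 0 else 1 for i in range(8)]

gdn_cost, std_cost = 7.0, 11.0  # arbitrary, distinct stand-in costs

assert list_pattern == int_pattern
assert weighted_attn_flops(list_pattern, gdn_cost, std_cost) == \
       weighted_attn_flops(int_pattern, gdn_cost, std_cost)
# A pure-standard baseline must produce a different total, proving the
# mask is actually applied rather than silently ignored.
assert weighted_attn_flops(list_pattern, gdn_cost, std_cost) != \
       weighted_attn_flops([0] * 8, gdn_cost, std_cost)
```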
/claude review
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
…to adding-model-support skill

- Step 4 (Discovery): Check for quantized weights (FP8/FP4) that silently break models without dequantization. Documents standalone script and in-bridge hook approaches.
- Phase 2: Update the FLOPs calculator when new architectural blocks (GDN, MTP, Mamba) differ from standard attention/MLP. References PR #2925 as an example.

Signed-off-by: Chen Cui <chcui@nvidia.com>
/ok to test a0c7541
What does this PR do?
Port the GDN (Gated DeltaNet) FLOPs formula from Megatron-LM into Bridge's `flop_utils.py` so that Qwen3.5 VL and Qwen3-Next models report accurate throughput numbers.

Changelog

- Add a GDN branch to `transformer_flops()` when `experimental_attention_variant="gated_delta_net"` is set
- Parse `linear_attention_freq` (int or list) to split layers into GDN vs standard attention, matching Megatron-LM's convention
- Produce a weighted `self_attn_term` combining GDN and standard-attention per-layer costs

Reference
Ported from Megatron-LM PR #1989 — "feat(moe): Support gated delta net for Qwen3-Next"
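As a rough structural illustration of the per-layer cost decomposition named in the changelog (in_proj + conv1d + gated delta rule + out_proj): this is a hedged sketch with illustrative dimension names and coefficients, not the formula ported from Megatron-LM.

```python
def gdn_layer_flops_sketch(seq_len, hidden, n_k_heads, k_dim, n_v_heads, v_dim, conv_k):
    """Illustrative decomposition only; the real formula lives in flop_utils.py."""
    qk_width = 2 * n_k_heads * k_dim   # q and k projection widths combined
    v_width = 2 * n_v_heads * v_dim    # v and gate projection widths combined
    # GEMM cost is 2*m*n*k; each term below is linear in seq_len.
    in_proj = 2 * seq_len * hidden * (qk_width + v_width)
    conv1d = 2 * seq_len * (qk_width + n_v_heads * v_dim) * conv_k   # depthwise conv
    delta_rule = 4 * seq_len * n_v_heads * k_dim * v_dim             # recurrent state update
    out_proj = 2 * seq_len * n_v_heads * v_dim * hidden
    return in_proj + conv1d + delta_rule + out_proj
```

Unlike quadratic attention, every term here scales linearly with sequence length, which is why splitting GDN from standard-attention layers changes the reported throughput.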
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI.
A Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Previously, Bridge's `flop_utils.py` had no GDN support: Qwen3.5 VL and Qwen3-Next training reported FLOPs as if all layers were standard attention. This PR fixes that by implementing the same formula used in Megatron-LM's `training.py` (lines 488-514).

Summary by CodeRabbit

New Features

- Added FLOPs accounting for the `gated_delta_net` attention variant with configurable layer mixing.
- Added `linear_attention_freq` configuration to designate which layers use the new attention type.

Tests