[model] feat: add GLM-5 (MoE + MLA + DSA) bridge and provider #2913
Conversation
Introduce FusedExpertMapping and FusedGatedExpertMapping in param_mapping.py to handle many-to-one / one-to-many expert weight conversions generically. This eliminates duplicated maybe_modify_converted_hf_weight overrides and hf_weights_cache from the GPT-OSS, GLM-4.5, GLM-4.5V, and Qwen3-VL bridges (-502 / +307 lines).

Also fixes two pre-existing bugs:
- GLM-4.5 MTP mappings used stale 'transformer_layer' instead of 'mtp_model_layer', causing missing-mapping warnings
- hf_to_megatron_generate_text.py set mtp_num_layers=None, which crashed MTP-enabled models; replaced with m.mtp_process=False

Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
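The many-to-one / one-to-many conversion these mappings generalize can be pictured as stacking per-expert 2-D weights into a single fused 3-D tensor, and splitting it back on export. A minimal sketch with hypothetical helper names, not the actual FusedExpertMapping API:

```python
import torch

def fuse_expert_weights(per_expert: dict[int, torch.Tensor]) -> torch.Tensor:
    # Many-to-one: stack N per-expert [out, in] HF weights into one
    # fused [num_experts, out, in] Megatron tensor, ordered by expert id.
    return torch.stack([per_expert[i] for i in sorted(per_expert)], dim=0)

def split_expert_weights(fused: torch.Tensor) -> dict[int, torch.Tensor]:
    # One-to-many inverse, used when exporting back to per-expert HF tensors.
    return {i: fused[i] for i in range(fused.shape[0])}
```

A generic pair like this is what lets each bridge drop its own per-expert caching override.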
Signed-off-by: yaoyu-33 <[email protected]>
- Remove NemotronNano12Bv2Provider from nemotron_vl/__init__.py (a deprecated alias from the deleted nemotron_h_provider.py)
- Remove the invalid max_position_embeddings kwarg from the kimi and moonlight recipes (not a field on MLAModelProvider)
- Update the moonlight test to monkeypatch MLAModelProvider instead of the deleted MoonlightModelProvider16B

Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
Add a Megatron Bridge for MiniMaxAI/MiniMax-M2, a sparse MoE model with 256 experts, top-8 sigmoid routing, and expert bias correction. Includes:

- Bridge with config mapping and per-expert weight conversion (block_sparse_moe prefix, w1/w2/w3 format)
- Partial RoPE support (rotary_dim -> rotary_percent)
- QK layernorm intentionally disabled (full-dim vs per-head mismatch)
- Functional test with a toy model for TP/PP/EP parallelism
- Example scripts for conversion, inference, and verification
- compare.py fix: truncate Megatron logits to the HF vocab size for a proper comparison when Megatron pads the vocab for kernel efficiency

Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
…ti-node support for MiniMax-M2

Add custom full-dimension QK normalization (minimax_m2_provider.py), since MiniMax-M2 applies RMSNorm over the entire Q/K projection rather than per-head. The implementation uses a sum-of-squares all-reduce across TP ranks and provides sharded_state_dict for distributed checkpointing.

Add on-the-fly FP8 block-wise dequantization in the bridge via maybe_modify_loaded_hf_weight, converting float8_e4m3fn weights to bfloat16 using per-block scale_inv factors during HF->Megatron conversion.

Add multi-node Slurm scripts (slurm_conversion.sh, slurm_inference.sh) for configurations requiring TP*EP*PP > 8 GPUs.

Update verify_toy_model.py to extract real pretrained weights (N layers) from the FP8 model, dequantize to bf16, and verify round-trip accuracy. Fix dtype-mismatch handling in hf_megatron_roundtrip_multi_gpu.py for FP8 source models.

Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
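The block-wise dequantization step amounts to scaling each tile of the upcast weight by its scale_inv entry. This is an illustrative sketch — the block size, argument names, and looped (rather than vectorized) form are assumptions, not the bridge's actual implementation:

```python
import torch

def dequantize_blockwise(w_q: torch.Tensor, scale_inv: torch.Tensor, block: int = 128) -> torch.Tensor:
    # Upcast first: float8 tensors have no arithmetic kernels on most backends.
    w = w_q.to(torch.float32)
    rows, cols = w.shape
    # scale_inv has shape [ceil(rows/block), ceil(cols/block)]:
    # one inverse-scale factor per (block x block) tile.
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            w[bi:bi + block, bj:bj + block] *= scale_inv[bi // block, bj // block]
    return w.to(torch.bfloat16)
```

In the real bridge the input would be a float8_e4m3fn tensor; any floating dtype works for the mechanics shown here.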
…x-M2 expert mappings

- Add the missing FusedGatedExpertMapping alias (GLMExpertGateUpProjMapping) to glm_moe_mappings.py, fixing an ImportError after the fused expert refactor
- Remove duplicate local_experts.* mappings from the MiniMax-M2 bridge, since moe_grouped_gemm=True (only the grouped-gemm weight* path is needed)

Verified: TP=2, PP=2, EP=2 roundtrip tests pass on cluster with zero mapping warnings.

Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
Remove the verify_toy_model.py dev script. Align conversion.sh and inference.sh with the GPT-OSS pattern (import + export + roundtrip, multi-checkpoint inference). Rewrite slurm_conversion.sh to sweep parallelism configs (TP, PP, EP) with roundtrip validation. Clean up slurm_inference.sh for consistency.

All configs verified on cluster-cw with the toy model: TP=2,PP=1,EP=4 | TP=1,PP=2,EP=4 | TP=2,PP=2,EP=2 → EXIT=0

Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
…eanup

- Add SLURM env var auto-population in model_provider.py for srun launches (RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT from SLURM vars)
- Increase the NCCL init_process_group timeout to 60 minutes for large MoE models
- Fix an ImportError crash in save_artifacts for trust_remote_code models
- Accept SLURM_NTASKS in hf_megatron_roundtrip_multi_gpu.py for srun launches
- Rewrite the MiniMax-M2 slurm scripts to be srun-native (ntasks-per-node=8) instead of using torch.distributed.run
- Remove the single-node conversion.sh/inference.sh (MiniMax-M2 requires multi-node)
- Set verified parallelism defaults: TP=2,EP=8 roundtrip; TP=1,EP=16 inference

Signed-off-by: yaoyu-33 <[email protected]>
Made-with: Cursor
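The SLURM auto-population amounts to copying srun's environment variables onto the names torch.distributed expects, when they are not already set. A simplified sketch (the variable mapping follows standard SLURM conventions; MASTER_ADDR derivation is omitted because real nodelists need `scontrol show hostnames` expansion, and the default port is an assumption):

```python
import os

def populate_dist_env_from_slurm(env=None):
    # Map srun-provided SLURM variables onto the torch.distributed ones,
    # only when absent -- so torchrun-style launches still take precedence.
    env = os.environ if env is None else env
    for dist_var, slurm_var in [
        ("RANK", "SLURM_PROCID"),
        ("WORLD_SIZE", "SLURM_NTASKS"),
        ("LOCAL_RANK", "SLURM_LOCALID"),
    ]:
        if dist_var not in env and slurm_var in env:
            env[dist_var] = env[slurm_var]
    env.setdefault("MASTER_PORT", "29500")  # assumed default rendezvous port
    return env
```

Accepting SLURM_NTASKS as the world size is also what lets the roundtrip script run under plain srun.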
…or PR #2628) Signed-off-by: yaoyu-33 <[email protected]> Made-with: Cursor
Signed-off-by: yaoyu-33 <[email protected]>
…mappings The refactor in param_mapping.py renamed GLMExpertGateUpProjMapping to FusedGatedExpertMapping but only added GLMExpertDownProjMapping alias in glm_moe_mappings.py. Add the missing alias so existing bridge imports (glm45_bridge.py, glm_45v_bridge.py) continue to work. Signed-off-by: yaoyu-33 <[email protected]>
Split multi-name import block into two separate import statements, each with per-line # noqa: F401 comments, to satisfy ruff's import block formatting requirements. Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
…ext tests

- Set PROVIDER_CLASS = Qwen3NextModelProvider so super().provider_bridge() instantiates the correct provider (not GPTModelProvider, which lacks MLA/hybrid fields like q_lora_rank)
- Add a `value is not None` guard in hf_config_to_provider_kwargs to skip None-valued config fields
- Add a null_attr fixture loop in the test mocks to suppress Mock() objects for MLA/alternative-expert CONFIG_MAPPING fields

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Keep once-per-class dtype mismatch warning from HEAD (suppresses duplicate warnings) over main's per-call version. Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
… inference
- gpt_oss_bridge: transpose only down_proj (not gate_up_proj) in
maybe_modify_loaded_hf_weight; use _align_expert_weight_to_shape in
GPTOSSMLPGateUpProjMapping.hf_to_megatron for auto-detection of
transposed vs standard expert weight layout across transformers versions
- glm45_bridge: loop over both mtp_model_layer and transformer_layer
prefixes when building MTP param mappings, fixing roundtrip for MTP layers
- generate_text: set mtp_num_layers=0 (not None) to make range(0) a
safe no-op instead of crashing with range(None) for MTP models
Verified on cluster: GPT-OSS BF16 roundtrip ✅, GPT-OSS MXFP4 inference ✅,
GLM-4.5-Air roundtrip ✅, GLM-4.5-Air inference ✅ ("Paris is the capital of France.")
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
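The range(None) crash and its no-op fix are easy to see in isolation:

```python
def mtp_layer_range(mtp_num_layers):
    # range(None) raises TypeError; normalizing a disabled-MTP config to 0
    # makes the MTP loop an empty no-op instead of a crash.
    return range(mtp_num_layers if mtp_num_layers is not None else 0)
```

This is the behavioral difference behind choosing mtp_num_layers=0 over None in the generate-text path.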
- Add a `transpose_on_export` flag to `FusedExpertMapping` instead of shape-based transpose detection (avoids silent bugs with square tensors)
- Make `GLMExpertDownProjMapping` a proper subclass with `transpose_on_export=True` always set
- Add `transpose_on_export=True` to the down_proj `FusedExpertMapping` instances in the Qwen3-VL and Qwen3.5-VL bridges
- Restore inline comments removed from `stream_weights_megatron_to_hf`
- Refactor the MTP-disable logic in `hf_to_megatron_generate_text.py` to use a `_disable_mtp` helper (matching the vlm script pattern); fix `mtp_num_layers=0` → `None`

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Resolve conflict in qwen35_vl_bridge.py: keep PR's FusedExpertMapping / FusedGatedExpertMapping approach (removing hf_weights_cache and maybe_modify_converted_hf_weight) rather than main's ExpertMLPDownProjMapping / ExpertMLPGateUpProjMapping pattern. The class now inherits from MegatronModelBridge directly (main's change, kept). Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
…train

The μP functional test (eb93f96) referenced Llama32ModelProvider1B but never defined or imported it, causing an F821 lint failure. Add a local @dataclass subclass of GPTModelProvider with Llama 3.2 1B architecture defaults.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
…rt weight alignment

- generate_text: extract a _disable_mtp() helper (mirrors the generate_vlm.py pattern) with mtp_num_layers=0 to avoid a range(None) crash on MTP-enabled text models
- param_mapping: add a transpose_hint param to _align_expert_weight_to_shape (True/False for explicit control, None for auto-detect with a square-shape guard), as suggested by the reviewer; raise a clear error when auto-detect encounters ambiguous square 2D weights

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
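A sketch of the transpose_hint contract described above — a hypothetical standalone version, not the real _align_expert_weight_to_shape in param_mapping.py:

```python
import torch

def align_expert_weight_to_shape(w: torch.Tensor, target, transpose_hint=None) -> torch.Tensor:
    target = tuple(target)
    # Explicit hints bypass detection entirely.
    if transpose_hint is True:
        return w.t().contiguous()
    if transpose_hint is False:
        return w
    # Auto-detect (hint is None): a square 2-D weight matches both layouts,
    # so refuse to guess rather than risk a silent wrong transpose.
    if w.dim() == 2 and w.shape[0] == w.shape[1]:
        raise ValueError("Ambiguous square 2D expert weight; pass transpose_hint=True/False")
    if tuple(w.shape) == target:
        return w
    if tuple(w.t().shape) == target:
        return w.t().contiguous()
    raise ValueError(f"Cannot align {tuple(w.shape)} to {target}")
```

The square-shape guard is exactly why the later commit replaces auto-detection with an explicit transpose_on_export flag.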
- param_mapping.py: inline the raise ValueError to satisfy line-length formatting
- gpt_oss_bridge.py: remove the unused GPTOSSProvider import (F401)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Merge Llama32ModelProvider1B definitions: take main's more complete set of architecture defaults (activation_func, position_embedding_type, bias/fusion flags, rotary_base) and add HEAD-only fields (kv_channels, rope_scaling, rope_scaling_factor) for a full Llama 3.2 1B config. Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
…ngs' into yuya/add-minimax-m2-bridge
Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
- model_bridge.py: take remote's simpler provider_kwargs guard (`value is not None`) and unconditional transpose_on_export path
- hf_to_megatron_generate_text.py: take remote's _disable_mtp, which uses mtp_num_layers=0 and sets grad_scale_func=None in one place
- Sync hf_megatron_roundtrip_multi_gpu.py and the qwen3_vl transformer_config from remote

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Bring submodule pointer in line with origin/main (905c0e38). Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
…n merge

The file was removed in yuya/refactor-fused-expert-mappings but main still has it. Restore from origin/main to keep this PR scoped to the MiniMax-M2 bridge only.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
- Revert model_bridge.py: restore the adaptive transpose logic and the original CONFIG_MAPPING first-match-wins guard (megatron_name not in provider_kwargs)
- Revert hf_to_megatron_generate_text.py: revert _disable_mtp to its original form with mtp_num_layers=None, inner/lang traversal, and grad_scale_func
- Revert hf_megatron_roundtrip_multi_gpu.py: restore dt_bias in IGNORE_PRECISION_PARAMS and revert the WORLD_SIZE check to the original
- Extract disable_mtp_for_inference() to megatron.bridge.utils.common_utils so compare.py and generate_text.py share a single implementation
- Restore removed comments in compare.py (broadcast setup section)
- Fix test: use sys.executable, compute paths from __file__, and pytest.fail() instead of assert False
- Restore tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py
- Add examples/models/minimax_m2/README.md with hardware requirements and expected inference/conversion output

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
- Remove the Lustre mount from the CONTAINER_MOUNTS examples (not universally available)
- Replace the placeholder inference output with a verified prompt/output: "What is 2+2?" → "The answer is 4."

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Not needed in our setup. Co-Authored-By: Claude Sonnet 4.6 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
- Revert model_provider.py: remove the SLURM env var population block (RANK/WORLD_SIZE/LOCAL_RANK/MASTER_ADDR/MASTER_PORT) and revert the init_process_group timeout=timedelta(minutes=60) back to plain "nccl"
- Restore the compare.py comment: restore the full text "# Compare outputs (only where we have valid Megatron results)"

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
- tests/functional_tests/L0_Launch_models_minimax_m2.sh: new L0 CI script targeting the minimax_m2 functional test directory
- tests/unit_tests/models/minimax_m2/test_minimax_m2_bridges.py: 28 unit tests covering config mapping, MoE sigmoid-routing provider settings, rotary_percent calculation, FP8 blockwise dequantization, maybe_modify_loaded_hf_weight (passthrough + dequant + dict), the _FullDimRMSNorm / FullDimQNorm / FullDimKNorm factories (TP=1/2), mapping registry completeness, and the _FullDimQKNormMapping TP=1 paths

Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
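The distinction the _FullDim* factories exercise — RMSNorm over the whole Q/K projection versus per-head — can be shown on a single rank. A sketch only; the TP sum-of-squares all-reduce and learnable scale are omitted:

```python
import torch

def rms_norm_per_head(x: torch.Tensor, num_heads: int, eps: float = 1e-6) -> torch.Tensor:
    # Standard QK norm: RMS computed independently inside each head.
    *lead, d = x.shape
    xh = x.view(*lead, num_heads, d // num_heads)
    rms = xh.pow(2).mean(dim=-1, keepdim=True).add(eps).sqrt()
    return (xh / rms).reshape(*lead, d)

def rms_norm_full_dim(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # MiniMax-M2 style: one RMS over the entire Q/K projection. Under TP,
    # each rank holds a slice, so this becomes a sum-of-squares all-reduce
    # across ranks before dividing.
    rms = x.pow(2).mean(dim=-1, keepdim=True).add(eps).sqrt()
    return x / rms
```

The two only agree when every head has the same scale, which is why mapping HF's full-dim weights onto Megatron's per-head QK layernorm was rejected in the first place.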
Register the new L0 bash script in the cicd-functional-tests-l0 job matrix so it runs on every PR alongside the other L0 model tests. Signed-off-by: Yu Yao <[email protected]> Signed-off-by: yaoyu-33 <[email protected]>
…ture Signed-off-by: yaoyu-33 <[email protected]>
The functional tests run bridge code via torch.distributed.run subprocesses.
In some environments COVERAGE_PROCESS_START doesn't propagate coverage data
back to /opt/Megatron-Bridge/ (unlike deepseek where it works), leaving the
outer coverage process with no measured files and causing `coverage xml` to
exit 1 ("No data to report") in the Check result step.
Fix: after pytest, run a minimal in-process import of MiniMaxM2Bridge via the
coverage Python API (data_suffix=True → parallel file). coverage combine then
picks it up and coverage xml has at least one measured src file.
Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
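The workaround can be sketched with the coverage Python API; the helper name and the module being imported here are illustrative, not the exact CI code:

```python
def record_import_coverage(module_name: str) -> None:
    # Measure a minimal in-process import under the coverage API so that a
    # later `coverage combine` finds at least one parallel data file.
    import coverage  # lazy import: only needed inside CI

    cov = coverage.Coverage(data_suffix=True)  # parallel-mode data file name
    cov.start()
    __import__(module_name)
    cov.stop()
    cov.save()  # writes .coverage.<host>.<pid>.<hash> for `coverage combine`
```

data_suffix=True is what makes the resulting file combinable with the data the subprocesses did (or did not) leave behind.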
- Remove the dtype-mismatch silencing in hf_megatron_roundtrip_multi_gpu.py: only cast to float32 for params listed in IGNORE_PRECISION_PARAMS, not silently whenever dtypes differ (which would hide real issues)
- Remove the /lustre reference from the MiniMax-M2 slurm_conversion.sh example mount

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
- Add GLM5Bridge for bidirectional HF <-> Megatron-Core checkpoint conversion
- Add GLM5ModelProvider with config/arch mapping for the glm_moe_dsa architecture
- Override maybe_modify_loaded_hf_weight and maybe_modify_converted_hf_weight to handle fused expert tensors (3D) in transformers>=5.2.0
- Expose GLM5Bridge and GLM5ModelProvider in the glm_moe_dsa __init__
- Add functional tests for GLM-5 conversion across parallelism configs

Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
/ok to test 6e1bf75
📝 Walkthrough

This pull request adds GLM-5 MoE+DSA model support to the Megatron bridge framework by introducing a new bridge implementation and model provider with comprehensive conversion logic. Additionally, the main models package exports are refactored to transition from bridge-based to provider-based classes, and the Megatron-LM submodule pointer is updated.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant HFModel as HuggingFace<br/>GlmMoeDsaForCausalLM
    participant GLM5Bridge as GLM5Bridge<br/>Converter
    participant MappingRegistry as Conversion<br/>Mapping Registry
    participant MegatronModel as Megatron-Core<br/>GPTModel
    User->>GLM5Bridge: Initiate conversion<br/>(hf_pretrained, megatron_model)
    GLM5Bridge->>HFModel: Enumerate state keys<br/>& load config
    HFModel-->>GLM5Bridge: HF state dict & config
    GLM5Bridge->>GLM5Bridge: Build GLM5ModelProvider<br/>with MLA/DSA/MoE params
    GLM5Bridge->>MappingRegistry: Construct mapping registry<br/>(global + layer-wise)
    MappingRegistry->>MappingRegistry: Detect expert layout<br/>(fused vs per-expert)
    MappingRegistry->>MappingRegistry: Define attention/MLA/DSA<br/>& MoE expert mappings
    MappingRegistry-->>GLM5Bridge: Configured registry
    GLM5Bridge->>GLM5Bridge: Execute conversion tasks<br/>Apply mappings to weights
    GLM5Bridge->>GLM5Bridge: Merge per-expert tensors<br/>for HF export format
    GLM5Bridge->>MegatronModel: Load converted weights
    MegatronModel-->>User: Converted model ready
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 2 failed (2 warnings)
✅ Passed checks (4 passed)
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@3rdparty/Megatron-LM`:
- Line 1: The PR points the 3rdparty/Megatron-LM submodule at an unreachable
commit (0e7468950283aaa89d36c3ef105892db8dd00f56); update the submodule to a
published, fetchable commit: check out a known public commit in the
3rdparty/Megatron-LM repo, update the gitlink in the superproject (stage and
commit the change), and ensure .gitmodules (if modified) is consistent so git
submodule update works in CI; reference the submodule path 3rdparty/Megatron-LM
and replace the bad commit hash with the valid published commit hash in the
superproject commit.
In `@src/megatron/bridge/models/__init__.py`:
- Around line 135-156: Two different modules export NemotronNano12Bv2Provider
and the second import overwrites the first; fix by aliasing one or both imports
(e.g., import NemotronNano12Bv2Provider from megatron.bridge.models.nemotron_vl
as NemotronNano12Bv2ProviderVL and from
megatron.bridge.models.nemotronh.nemotron_h_provider as
NemotronNano12Bv2ProviderH), then update the top-level exports that currently
reference NemotronNano12Bv2Provider to point to the correct alias (choose which
provider should be exported under the public name) and update any internal
references to use the new alias names (check usages of
NemotronNano12Bv2Provider, NemotronVLBridge, and entries in the module's __all__
if present).
- Around line 104-111: The import currently brings in LlamaNemotronBridge but it
is not exported in __all__, causing a flake8 F401; either remove
LlamaNemotronBridge from the import tuple in the
megatron.bridge.models.llama_nemotron import block or add "LlamaNemotronBridge"
to the module's __all__ list to explicitly re-export it; locate the import
statement that lists Llama31Nemotron70BProvider, Llama31NemotronNano8BProvider,
Llama31NemotronUltra253BProvider, Llama33NemotronSuper49BProvider,
LlamaNemotronBridge, LlamaNemotronHeterogeneousProvider and update it
accordingly so the import/export surface is consistent.
- Around line 67-70: The module re-exports GLM-5 symbols unguarded causing
ImportError on transformers<5.3.0; wrap the import of GLM5Bridge and
GLM5ModelProvider from megatron.bridge.models.glm_moe_dsa in a try/except
ImportError (catching the missing GlmMoeDsaForCausalLM) and on failure set those
names to None or omit them from __all__ so importing megatron.bridge.models
won't raise; alternatively update the package requirement to transformers>=5.3.0
in pyproject.toml if you prefer to enforce availability instead of guarding the
import.
In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py`:
- Around line 127-135: Replace the warning in the
ImportError/ModuleNotFoundError handler with a fail-fast exception: instead of
logger.warning("DSA spec not available; falling back to standard GPT decoder
block spec."), raise a NotImplementedError (or RuntimeError) that clearly states
the DSA experimental_attention variant is required because subsequent conversion
registers DSA-specific target mappings (e.g., self_attn.indexer.*) and will fail
or skip weights if the spec is absent; reference provider.transformer_layer_spec
and get_transformer_block_with_experimental_attention_variant_spec in the
message so maintainers know which component is missing.
- Around line 400-449: This method implements a bridge-local fused-expert merge
that duplicates grouping/transpose behavior already handled by
GLMExpertGateUpProjMapping/GLMExpertDownProjMapping and must be removed; replace
the body of maybe_modify_converted_hf_weight so it no longer performs any
per-expert caching/stacking or special-case mapping checks (remove all uses of
self.hf_weights_cache and the if isinstance(...
GLMExpertGateUpProjMapping/GLMExpertDownProjMapping) branch and the merging
logic), and instead simply return converted_weights_dict unchanged (or the
incoming dict) so grouped export/transpose behavior is driven solely by the
mapping/param_mapping code (keep the method signature unchanged).
In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py`:
- Around line 138-145: Add an explicit 2-GPU requirement check and
documentation: in the module containing _build_roundtrip_cmd, import pytest and
torch, add a module-level docstring or test docstrings noting "requires 2 CUDA
devices", and either decorate affected tests with
`@pytest.mark.skipif`(torch.cuda.device_count() < 2, reason="requires 2 CUDA
devices") or call pytest.skip at the start of those test functions when
torch.cuda.device_count() < 2; ensure the _build_roundtrip_cmd usage remains
unchanged but tests are skipped early with a clear message about the 2-GPU
requirement.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 511bd143-5667-4f5c-8f39-5458ec7f6045
📒 Files selected for processing (7)
- 3rdparty/Megatron-LM
- src/megatron/bridge/models/__init__.py
- src/megatron/bridge/models/glm_moe_dsa/__init__.py
- src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py
- src/megatron/bridge/models/glm_moe_dsa/glm5_provider.py
- tests/functional_tests/models/glm_moe_dsa/__init__.py
- tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py
3rdparty/Megatron-LM (outdated)

```diff
@@ -1 +1 @@
-Subproject commit 9ed8b0c4ac8e65d80fb35eecb3656c96b30df943
+Subproject commit 0e7468950283aaa89d36c3ef105892db8dd00f56
```
Point the submodule at a published Megatron-LM commit.
Both reported pipelines fail during git submodule update with fatal: ... not our ref 0e7468950283aaa89d36c3ef105892db8dd00f56. As-is, the repo cannot fetch this revision, so the PR is not buildable or testable.
🧰 Tools
🪛 GitHub Actions: Build docs
[error] 1-1: Submodule update failed during 'git submodule update --init --force --depth=1 --recursive'. Git error: 'fatal: remote error: upload-pack: not our ref 0e74689'.
🪛 GitHub Actions: CICD NeMo
[error] 1-1: Submodule update failed: "fatal: remote error: upload-pack: not our ref 0e74689" during 'git -c protocol.version=2 submodule update --init --force --depth=1 --recursive'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@3rdparty/Megatron-LM` at line 1, The PR points the 3rdparty/Megatron-LM
submodule at an unreachable commit (0e7468950283aaa89d36c3ef105892db8dd00f56);
update the submodule to a published, fetchable commit: check out a known public
commit in the 3rdparty/Megatron-LM repo, update the gitlink in the superproject
(stage and commit the change), and ensure .gitmodules (if modified) is
consistent so git submodule update works in CI; reference the submodule path
3rdparty/Megatron-LM and replace the bad commit hash with the valid published
commit hash in the superproject commit.
```python
from megatron.bridge.models.glm_moe_dsa import (
    GLM5Bridge,
    GLM5ModelProvider,
)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Show the unguarded import chain for the new GLM-5 exports.
rg -n -C2 'from megatron\.bridge\.models\.glm_moe_dsa import|from transformers import GlmMoeDsaForCausalLM' \
src/megatron/bridge/models/__init__.py \
src/megatron/bridge/models/glm_moe_dsa/__init__.py \
  src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py
🏁 Script executed:
# Check how other optional models are handled in __init__.py
# Look for try-except patterns and imports
head -100 src/megatron/bridge/models/__init__.py | tail -50
🏁 Script executed:
# Search for try-except patterns around model imports in __init__.py
rg -n 'try:|except|ImportError|ModuleNotFoundError' src/megatron/bridge/models/__init__.py -A 2 -B 2
🏁 Script executed:
# Check if glm_moe_dsa has an __init__.py that does the import
cat src/megatron/bridge/models/glm_moe_dsa/__init__.py 2>/dev/null || echo "No __init__.py found"
🏁 Script executed:
# Check how other transformers imports are handled in model bridges
rg -n 'from transformers import' src/megatron/bridge/models/*/\*.py | head -20
🏁 Script executed:
# Check full structure of __init__.py for try-except guards around imports
rg -n 'try:|except|ImportError' src/megatron/bridge/models/__init__.py | head -20
🏁 Script executed:
# Count total lines in __init__.py to understand scope
wc -l src/megatron/bridge/models/__init__.py
🏁 Script executed:
# Search for transformers imports in model bridge files
find src/megatron/bridge/models -name "*.py" -type f -exec grep -l "from transformers import" {} \; | head -10
🏁 Script executed:
# Look at the full __init__.py to check for try-except patterns
cat src/megatron/bridge/models/__init__.py | grep -n "try\|except" | head -20
🏁 Script executed:
# Check pyproject.toml or requirements files for transformers version constraints
find . -maxdepth 2 -name "pyproject.toml" -o -name "requirements*.txt" | xargs cat 2>/dev/null | grep -i transformers | head -10
🏁 Script executed:
# Check if GlmMoeDsaForCausalLM is mentioned in any config or doc
rg -i "GlmMoeDsaForCausalLM" --type py --type md --type toml --type txt | head -20
🏁 Script executed:
# Check Megatron transformers requirement to see if GlmMoeDsaForCausalLM is available
rg "transformers" pyproject.toml setup.py setup.cfg 2>/dev/null | head -20
🌐 Web query:
GlmMoeDsaForCausalLM transformers version history available since when
💡 Result:
GlmMoeDsaForCausalLM was added to Hugging Face Transformers via PR #43858 ("Add GlmMoeDsa"), which merged approximately 36 days before 2026-03-20, placing it around mid-February 2026. The model documentation source file indicates it was added to Transformers on 2026-02-08. It supports GLM-5 (zai-org/GLM-5), a 744B parameter MoE model released around 2026-02-09. The class is available in Transformers v5.3.0 and later, as evidenced by code references in the docs.
Citations:
- 1: https://huggingface.co/docs/transformers/main/model_doc/glm_moe_dsa
- 2: https://github.com/huggingface/transformers/blob/main/docs/source/en/model_doc/glm_moe_dsa.md
- 3: https://app.semanticdiff.com/gh/huggingface/transformers/commit/33a8e68d4955c13cf6d6c629e76f12fc81974fc5
- 4: https://github.com/zai-org/GLM-5
🏁 Script executed:
# Check if there's any conditional import logic in test files that might hint at version requirements
rg -n "GlmMoeDsaForCausalLM" tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py -B5 -A5 | head -30
🏁 Script executed:
# Check if any other model bridges use try-except for transformers imports
rg -n "try:|except" src/megatron/bridge/models/*/\*_bridge.py 2>/dev/null | head -20
Guard the GLM-5 re-export behind version checking or raise the minimum transformers version to 5.3.0.
GlmMoeDsaForCausalLM was added to Transformers on 2026-02-08 and is only available in v5.3.0+. However, pyproject.toml allows transformers>=5.0.0, meaning users on 5.0.0–5.2.x will encounter ImportError: cannot import name 'GlmMoeDsaForCausalLM' when importing megatron.bridge.models, even if they never use GLM-5. Wrap the import in a try-except block or bump the minimum transformers version constraint to >=5.3.0.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/__init__.py` around lines 67 - 70, The module
re-exports GLM-5 symbols unguarded causing ImportError on transformers<5.3.0;
wrap the import of GLM5Bridge and GLM5ModelProvider from
megatron.bridge.models.glm_moe_dsa in a try/except ImportError (catching the
missing GlmMoeDsaForCausalLM) and on failure set those names to None or omit
them from __all__ so importing megatron.bridge.models won't raise; alternatively
update the package requirement to transformers>=5.3.0 in pyproject.toml if you
prefer to enforce availability instead of guarding the import.
```python
from megatron.bridge.models.llama_nemotron import (
    Llama31Nemotron70BProvider,
    Llama31NemotronNano8BProvider,
    Llama31NemotronUltra253BProvider,
    Llama33NemotronSuper49BProvider,
    LlamaNemotronBridge,
    LlamaNemotronHeterogeneousProvider,
)
```
Remove or re-export LlamaNemotronBridge.
It's still imported in this block, but it's no longer listed in __all__. That matches the flake8 F401 failure already reported for this file and leaves the intended top-level export surface ambiguous.
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 104-104: 'megatron.bridge.models.llama_nemotron.LlamaNemotronBridge' imported but unused
(F401)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/__init__.py` around lines 104 - 111, The import
currently brings in LlamaNemotronBridge but it is not exported in __all__,
causing a flake8 F401; either remove LlamaNemotronBridge from the import tuple
in the megatron.bridge.models.llama_nemotron import block or add
"LlamaNemotronBridge" to the module's __all__ list to explicitly re-export it;
locate the import statement that lists Llama31Nemotron70BProvider,
Llama31NemotronNano8BProvider, Llama31NemotronUltra253BProvider,
Llama33NemotronSuper49BProvider, LlamaNemotronBridge,
LlamaNemotronHeterogeneousProvider and update it accordingly so the
import/export surface is consistent.
```diff
 from megatron.bridge.models.nemotron_vl import (
     NemotronNano12Bv2Provider,
     NemotronNano12Bv2VLModelProvider,
     NemotronVLBridge,
     NemotronVLModel,
 )
-from megatron.bridge.models.nemotronh import (
-    NemotronHBridge,
+from megatron.bridge.models.nemotronh.nemotron_h_provider import (
     Nemotron3NanoProvider,
     NemotronHModel4BProvider,
     NemotronHModel8BProvider,
     NemotronHModel47BProvider,
     NemotronHModel56BProvider,
     NemotronHModelProvider,
     NemotronHModelProvider4B,
     NemotronHModelProvider8B,
     NemotronHModelProvider47B,
     NemotronHModelProvider56B,
     NemotronNano9Bv2Provider,
     NemotronNano12Bv2Provider,
     NemotronNanoModelProvider9Bv2,
     NemotronNanoModelProvider12Bv2,
 )
```
Resolve the NemotronNano12Bv2Provider name collision.
This module imports NemotronNano12Bv2Provider from both megatron.bridge.models.nemotron_vl and megatron.bridge.models.nemotronh.nemotron_h_provider. The second import overwrites the first, so the top-level export at Line 345 can only point at one provider and the other becomes unreachable.
Also applies to: 345-346
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/__init__.py` around lines 135-156: two different modules export NemotronNano12Bv2Provider, and the second import overwrites the first. Fix by aliasing one or both imports (e.g., import NemotronNano12Bv2Provider from megatron.bridge.models.nemotron_vl as NemotronNano12Bv2ProviderVL and from megatron.bridge.models.nemotronh.nemotron_h_provider as NemotronNano12Bv2ProviderH), then update the top-level exports that currently reference NemotronNano12Bv2Provider to point at the correct alias (choosing which provider is exported under the public name), and update any internal references to use the new alias names (check usages of NemotronNano12Bv2Provider, NemotronVLBridge, and entries in the module's __all__ if present).
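The shadowing the review describes is ordinary Python name rebinding, which a stand-alone sketch makes concrete (the module objects below are hypothetical stand-ins for the two real packages, not actual imports):

```python
import types

# Simulate the two colliding modules to show why the second import wins.
vl = types.ModuleType("nemotron_vl")
vl.NemotronNano12Bv2Provider = "VL provider"
nh = types.ModuleType("nemotron_h_provider")
nh.NemotronNano12Bv2Provider = "H provider"

# Unaliased imports: the later binding silently shadows the earlier one.
NemotronNano12Bv2Provider = vl.NemotronNano12Bv2Provider
NemotronNano12Bv2Provider = nh.NemotronNano12Bv2Provider
print(NemotronNano12Bv2Provider)  # "H provider" -- the VL one is unreachable

# Aliased imports keep both reachable under distinct public names.
NemotronNano12Bv2ProviderVL = vl.NemotronNano12Bv2Provider
NemotronNano12Bv2ProviderH = nh.NemotronNano12Bv2Provider
```

Whichever alias is re-exported under the public name, `__all__` should list both so the other provider stays importable.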
```python
def maybe_modify_converted_hf_weight(
    self,
    task,
    converted_weights_dict: dict[str, torch.Tensor],
    hf_state_dict,
) -> dict[str, torch.Tensor]:
    """Merge per-expert Megatron weights back into fused HF tensors on export."""
    if not isinstance(task.mapping, (GLMExpertGateUpProjMapping, GLMExpertDownProjMapping)):
        return converted_weights_dict

    if not converted_weights_dict:
        return {}

    num_experts = self._hf_config.n_routed_experts
    ep_size = parallel_state.get_expert_model_parallel_world_size()
    experts_per_rank = num_experts // ep_size

    try:
        local_expert_number = extract_expert_number_from_param(task.param_name) % experts_per_rank
    except ValueError:
        return converted_weights_dict

    if not hasattr(self, "hf_weights_cache"):
        self.hf_weights_cache = {}

    for key, value in converted_weights_dict.items():
        if key not in self.hf_weights_cache:
            self.hf_weights_cache[key] = {}

        if ep_size == 1:
            self.hf_weights_cache[key][local_expert_number] = value
        else:
            if value.shape[0] != ep_size:
                raise ValueError(f"Expected EP dim {ep_size} for {key}, got {value.shape}.")
            for i, exp_val in enumerate(value):
                global_expert_number = local_expert_number + (i * experts_per_rank)
                self.hf_weights_cache[key][global_expert_number] = exp_val

        if len(self.hf_weights_cache[key]) == num_experts:
            merged = torch.stack([self.hf_weights_cache[key][i] for i in range(num_experts)], dim=0)
            if key in hf_state_dict:
                expected = hf_state_dict[key].shape
                if merged.shape != expected and merged.transpose(-1, -2).shape == expected:
                    merged = merged.transpose(-1, -2).contiguous()
            del self.hf_weights_cache[key]
            return {key: merged}
        return {}

    return {}
```
Drop the bridge-local fused-expert merge override.
GLMExpertGateUpProjMapping / GLMExpertDownProjMapping already opt into grouped export in src/megatron/bridge/models/conversion/param_mapping.py, and GLMExpertDownProjMapping already carries its transpose-on-export behavior. Re-merging here reintroduces per-bridge caching/order logic and risks double-stacking or misordering expert tensors on export.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` around lines 400-449: this method implements a bridge-local fused-expert merge that duplicates grouping/transpose behavior already handled by GLMExpertGateUpProjMapping/GLMExpertDownProjMapping, and it should be removed. Replace the body of maybe_modify_converted_hf_weight so it no longer performs any per-expert caching/stacking or special-case mapping checks (remove all uses of self.hf_weights_cache, the isinstance check on GLMExpertGateUpProjMapping/GLMExpertDownProjMapping, and the merging logic) and instead simply return converted_weights_dict unchanged, so grouped export/transpose behavior is driven solely by the mapping/param_mapping code. Keep the method signature unchanged.
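The many-to-one accumulation this override (and the generic FusedExpertMapping) performs reduces to a cache-and-stack loop. A minimal, torch-free sketch under assumed simplifications (plain lists stand in for tensors, single rank, helper name invented):

```python
def merge_expert_shards(cache, key, expert_idx, weight, num_experts):
    """Cache per-expert weights under their HF key; emit the fused tensor once
    every expert has arrived. "Stacking" here is collecting in expert order."""
    cache.setdefault(key, {})[expert_idx] = weight
    if len(cache[key]) == num_experts:
        merged = [cache[key][i] for i in range(num_experts)]  # stack on dim 0
        del cache[key]  # free the cache entry once the fused tensor is emitted
        return {key: merged}
    return {}  # still waiting on experts from other iterations / ranks

cache, out = {}, {}
for e in range(4):  # experts may arrive in any order; range() for brevity
    out.update(merge_expert_shards(cache, "mlp.experts.down_proj", e, [float(e)], 4))
print(out["mlp.experts.down_proj"])  # [[0.0], [1.0], [2.0], [3.0]]
```

Centralizing this in the mapping layer is what lets the per-bridge overrides and their `hf_weights_cache` state be deleted.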
…move stale MTP

Key fixes verified by a clean 53/53 roundtrip on the toy model:
- Expert mappings: switch from the non-existent fused `experts.gate_up_proj` to the per-expert `experts.*.gate_proj/up_proj/down_proj` format (same as OlMoE). The HF model stores one tensor per expert, not a stacked fused tensor.
- qk_layernorm=True: with False, the DSA spec used IdentityOp, so MCore had no q_layernorm/kv_layernorm params; setting True creates the modules that map to HF's q_a_layernorm.weight and kv_a_layernorm.weight, completing the roundtrip without missing tensors.
- Remove MTP: HF GlmMoeDsaModel (transformers 5.3.0) ignores num_nextn_predict_layers and creates only num_hidden_layers transformer layers. MTP config and all MTP mappings removed.
- Remove the build_conversion_tasks None-filter override (it was masking the real problem of missing/incorrect mappings).
- Remove the GLM5ModelProvider alias shim from glm5_provider.py; the bridge uses MLAModelProvider directly.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
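The per-expert `gate_proj`/`up_proj` format described above implies splitting Megatron's fused `linear_fc1` weight on export. A hedged sketch assuming the simple `[gate; up]` concatenation layout with no TP interleaving (function name and toy shapes invented; real TP-sharded weights interleave per shard):

```python
def split_gate_up(fused_fc1):
    """Split a fused linear_fc1 weight into (gate_proj, up_proj) halves
    along dim 0, assuming a plain [gate; up] concatenation."""
    half = len(fused_fc1) // 2
    return fused_fc1[:half], fused_fc1[half:]

fused = [[1.0], [2.0], [3.0], [4.0]]  # toy [2*ffn_hidden=4, hidden=1] weight
gate, up = split_gate_up(fused)
print(gate, up)  # [[1.0], [2.0]] [[3.0], [4.0]]
```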
…e fails

Some models (e.g. GLM-5) have a generation_config.json with fields that newer transformers versions reject as invalid (e.g. top_p set without do_sample=True). Instead of crashing the roundtrip, catch the ValueError in save_artifacts and copy the original file directly to preserve it.

Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
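The fallback this commit describes can be sketched with a stand-in config object (class name and paths invented; only the try/except-then-copy shape mirrors the described save_artifacts change):

```python
import pathlib
import shutil
import tempfile

class BrokenGenerationConfig:
    """Stand-in for a generation config that newer transformers reject."""
    def save_pretrained(self, path):
        raise ValueError("`top_p` is set but `do_sample` is False")

src_dir = pathlib.Path(tempfile.mkdtemp())
dst_dir = pathlib.Path(tempfile.mkdtemp())
(src_dir / "generation_config.json").write_text('{"top_p": 0.95}')

artifact = BrokenGenerationConfig()
try:
    artifact.save_pretrained(dst_dir)  # raises on invalid sampling fields
except ValueError:
    # Fall back to copying the original file verbatim instead of crashing.
    shutil.copy(src_dir / "generation_config.json", dst_dir / "generation_config.json")

print((dst_dir / "generation_config.json").read_text())  # {"top_p": 0.95}
```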
/ok to test 951dc36
```python
artifact = getattr(self, name, None)
if artifact is not None and hasattr(artifact, "save_pretrained"):
    artifact.save_pretrained(save_path)
try:
```
Revert this change; I'm not convinced it's required. If it's a toy-model issue, it should be fixed in the toy model.
```python
def provider_bridge(self, hf_pretrained: PreTrainedCausalLM) -> MLAModelProvider:
    hf_config = hf_pretrained.config

    configs = {
```
Check how other models do this? This part should use super() to do it.
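The reviewer's suggestion, sketched with invented minimal classes (not the real bridge API): derive the shared config fields via super() and override only the GLM-5-specific ones.

```python
class BaseBridge:
    def provider_bridge(self, hf_config):
        # Shared HF -> Megatron config fields handled once in the base class.
        return {
            "hidden_size": hf_config["hidden_size"],
            "num_layers": hf_config["num_hidden_layers"],
        }

class GLM5Bridge(BaseBridge):
    def provider_bridge(self, hf_config):
        cfg = super().provider_bridge(hf_config)  # reuse shared mapping
        # Only GLM-5-specific fields are set here (key names are illustrative).
        cfg["moe_router_topk"] = hf_config.get("num_experts_per_tok", 8)
        return cfg

cfg = GLM5Bridge().provider_bridge(
    {"hidden_size": 4, "num_hidden_layers": 2, "num_experts_per_tok": 8}
)
print(cfg)  # {'hidden_size': 4, 'num_layers': 2, 'moe_router_topk': 8}
```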
```python
def mapping_registry(self) -> MegatronMappingRegistry:
    mapping_list = []

    param_mappings = {
```
Just combine the param mapping and layerspec mapping.
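A tiny sketch of the suggested merge (pattern strings and dict names invented): keep one combined table and build the mapping list from it once.

```python
# Hypothetical name-pattern tables; '*' is the layer-index wildcard.
param_mappings = {
    "embedding.word_embeddings.weight": "model.embed_tokens.weight",
    "output_layer.weight": "lm_head.weight",
}
layerspec_mappings = {
    "decoder.layers.*.self_attention.linear_proj.weight": "model.layers.*.self_attn.o_proj.weight",
}

# One merged table drives the registry instead of two separate loops.
combined = {**param_mappings, **layerspec_mappings}
mapping_list = sorted(combined.items())
print(len(mapping_list))  # 3
```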
Signed-off-by: yaoyu-33 <[email protected]>
/ok to test 7d055e4
What does this PR do?
Takes over #2469 and rebases onto latest main.
Adds support for the GLM-5 model (MoE + MLA + DSA architecture).
Changes
- `GLM5Bridge`: bidirectional HF ↔ Megatron-Core checkpoint conversion for `GlmMoeDsaForCausalLM`
- `GLM5ModelProvider`: config/arch mapping for the `glm_moe_dsa` architecture
- `maybe_modify_loaded_hf_weight` / `maybe_modify_converted_hf_weight` to handle the fused expert tensors (3D) introduced in `transformers>=5.2.0`
- `GLM5Bridge` and `GLM5ModelProvider` exported in the `glm_moe_dsa` package

Related
Closes #2343
Takes over #2469