Changes from all commits (6405 commits)
d2bd9fa
Batch Invariance (#2308)
wdykas Dec 10, 2025
5ab481c
Remove flattened_range code paths for distributed optimizer checkpoin…
dimapihtar Dec 11, 2025
5a24ff3
update commit (#2631)
dimapihtar Dec 11, 2025
f67b7bd
tests: Disable grads test
ko3n1g Dec 12, 2025
44899aa
Create separate teacher Layer Spec in KD mode (#2429)
AAnoosheh Dec 12, 2025
6b186c1
Dynamic context | Re-add max_requests arg. (#2488)
lmcafee-nvidia Dec 12, 2025
2ab9253
Inference | Fix entangled request generations. (#2584)
lmcafee-nvidia Dec 12, 2025
3a9f086
fix gpt3_mcore_reruns_resume_check_grads (#2646)
dimapihtar Dec 12, 2025
f5daa16
Nemotron nano v2 vl changes for Megatron Bridge (#2078)
cuichenx Dec 12, 2025
4f700f7
[docs] Migrate docs to new Sphinx (#2489)
Phlip79 Dec 12, 2025
1c6f6eb
Add option to only log inference every N steps (#2637)
tdene Dec 12, 2025
0a59bea
[docs] Use autodoc2 and remove automodule (#2542)
Phlip79 Dec 12, 2025
845617a
add backward compatibility support for loading mcore 0.15 checkpoints…
dimapihtar Dec 12, 2025
12b4406
add offline eagle3 instructions to readme (#2246)
yeyu-nvidia Dec 13, 2025
fe7fb73
Only initialize symmetric memory when needed (#2665)
sidsingh-nvidia Dec 15, 2025
e869218
Simplify parameter sync for checkpoint save (#2344)
ananthsub Dec 15, 2025
4a9b4a2
Update docstrings for dataset (#2666)
Phlip79 Dec 15, 2025
597e88a
[Megatron-FSDP] Support both old and new DeviceMesh APIs. (#2575)
cspades Dec 15, 2025
ff45bd4
Enable hybrid tensor + expert + data parallelism in mcore inference (…
sidsingh-nvidia Dec 16, 2025
43a0c33
Fix failing functional tests (#2679)
sidsingh-nvidia Dec 16, 2025
5f5741d
M4 + Dist Checkpoint: Replace global parallel state with explicit gro…
dimapihtar Dec 16, 2025
4bdd7b1
fix deprecated decorator import (#2680)
dimapihtar Dec 16, 2025
36a9081
Added integration for Kitchen extensions' SDPA and FA implementations…
frsun-nvda Dec 16, 2025
bacd164
Inference | Add request only if no paused requests. (#2600)
lmcafee-nvidia Dec 16, 2025
e9082fd
Pipeline parallelism fix in RL and sequence packing rewriting (#2632)
jalbericiola Dec 16, 2025
815d86c
Add oncall rotation (#2622)
Phlip79 Dec 16, 2025
bdc362a
Upgrade GitHub Actions to latest versions (#2678)
salmanmkc Dec 16, 2025
2a8bcf0
docs: Adding documentation.md to cover building documentation. (#2683)
aschilling-nv Dec 16, 2025
cf39a4d
Add moe layer perf UT. (#2673)
Victarry Dec 16, 2025
732bb8d
[Megatron-FSDP] Build default FSDP DeviceMesh, and remove model arg f…
cspades Dec 16, 2025
ae774fe
[docs] Add ability to disable autodoc2 for local builds (#2669)
Phlip79 Dec 17, 2025
d944ef9
Fix oncall assignment (#2686)
Phlip79 Dec 17, 2025
5613ed0
docs(readme): update Latest News section (#2684)
sbhavani Dec 17, 2025
2485495
Update RNG sharding to include EP rank (#2658)
paul-gibbons Dec 17, 2025
0b8a2ff
Add CODEOWNER for API backwards compatibility check files (#2687)
pablo-garay Dec 17, 2025
72416d0
Mark API backwards compatibility checks as OPTIONAL (non-blocking) (#…
pablo-garay Dec 17, 2025
9288125
pip install uv during GH action (#2695)
Phlip79 Dec 17, 2025
32b9ee4
chore: rotate oncall schedule
github-actions[bot] Dec 17, 2025
ff4a622
Don't delete svcnvidia-nemo-ci team from oncall (#2703)
Phlip79 Dec 17, 2025
3d1a5c8
RL: Rollouts should be distributed over the regular data parallel gro…
sidsingh-nvidia Dec 17, 2025
d321026
Use pull_request_target and don't use uv (#2702)
Phlip79 Dec 17, 2025
94b4759
Optimize TE cudagraph input memory (#2392)
buptzyb Dec 18, 2025
c7e5489
ci(fix): Pin gojq to stable version (#2480)
ko3n1g Dec 18, 2025
f19b59e
NVLS - fused reduce-scatter + residual + rms-norm + all-gather kernel…
sidsingh-nvidia Dec 18, 2025
0170e70
Default UVM level to 0. (#2450)
lmcafee-nvidia Dec 18, 2025
0b13c98
docs: improve documentation organization and add additional guides (#…
sbhavani Dec 18, 2025
d81b37b
Revert "Default UVM level to 0. (#2450)" (#2713)
chtruong814 Dec 18, 2025
a6f822c
Add missing imports in no-triton fallback (#2711)
maanug-nv Dec 18, 2025
1503c33
Fixes for #2450. (#2714)
lmcafee-nvidia Dec 18, 2025
1a2257b
Add RL parameter to set parallel generation tasks (#2712)
tdene Dec 19, 2025
30694e0
Refit prep 3 (#2708)
wdykas Dec 19, 2025
000c2e2
chore: Add cudagraph codeowners (#2720)
ko3n1g Dec 19, 2025
703bc36
[docs] Add developer section to docs (#2717)
Phlip79 Dec 19, 2025
7f471d7
Fix UVM argument for RL (#2722)
tdene Dec 19, 2025
ddf691d
[dcos] Update docs title to Megatron Core (#2729)
Phlip79 Dec 20, 2025
4193f3a
remove fp16 assert in moe_grouped_gemm & EP (#2495)
HaochenYuan Dec 22, 2025
a057662
Improve ModelOpt paths & add more Nemotron/hybrid model support (#2131)
jenchen13 Dec 22, 2025
cfd980b
Add options to improve data loader initialization time, especially at…
asolergi-nv Dec 22, 2025
1c67e7e
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Dec 22, 2025
8ea3b8d
ci: Fix copy-pr-bot update (#2736)
ko3n1g Dec 22, 2025
5b1ef07
Add oncall to all new PRs (#2734)
Phlip79 Dec 22, 2025
cc1b0b5
Hsdp register submesh fix lifuz mirror (#2467)
tomlifu Dec 23, 2025
1febe9f
Adding stop word support (#2685)
shanmugamr1992 Dec 23, 2025
a477766
Fix oncall assign (#2737)
Phlip79 Dec 23, 2025
f5d4c3a
Add support for non-decode CUDA graphs for Mamba models (#2474)
santhnm2 Dec 23, 2025
0a77122
Update sequence packing case when dummy PackedSeqParams are used (#2743)
mathemakitten Dec 24, 2025
876a046
feat: manual registration mode for nccl-ub option when using megatron…
youngeunkwon0405 Dec 24, 2025
ede9ae4
chore: rotate oncall schedule
github-actions[bot] Dec 24, 2025
3cf7a63
Update oncall for next few weeks (#2748)
Phlip79 Dec 24, 2025
dd7c9f4
Prep work for migrating to types from ModuleSpec (#2668)
nschank Dec 24, 2025
2b343d7
feat(MoE): Refactor cuda_graph_scope (#1920)
buptzyb Dec 30, 2025
a2d7d67
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Dec 31, 2025
11c9680
Fix merge conflict in #1920 (#2781)
tdene Dec 31, 2025
40d590d
ci: Allow disabling external contributors (#2784)
chtruong814 Dec 31, 2025
6977db9
chore: rotate oncall schedule
github-actions[bot] Dec 31, 2025
f33e009
Reflect the changes made by #1920 in RL (#2780)
tdene Dec 31, 2025
852e791
Fix 2780 (#2791)
tdene Dec 31, 2025
1909eb2
Only assign oncall to main PRs (#2755)
Phlip79 Dec 31, 2025
52bf635
Ignore bot for oncall (#2756)
Phlip79 Dec 31, 2025
0e33828
Update PR message (#2778)
Phlip79 Dec 31, 2025
ccc9ad3
Explicitly zero out padding token outputs when using quantization sca…
santhnm2 Dec 31, 2025
a427c47
Synchronize total block count across pipeline parallel ranks (#2578)
santhnm2 Dec 31, 2025
7843a80
Optimize TE CUDA Graph capturing time (#2482)
buptzyb Jan 2, 2026
1eed1d2
Do a pass of typing fixes on transformer/ (#2766)
nschank Jan 2, 2026
939f520
moe: remove unused variable scale_up (#1670)
WineChord Jan 4, 2026
e8dbcf7
build: Pin down `nvidia-nvshmem-cu13` (#2798) (#2803)
ko3n1g Jan 4, 2026
278be15
DeepSeek V3 FSDP Fix for Precision-Aware Optimizer (#2466)
tomlifu Jan 5, 2026
5ab9294
Minor Fixes on Post-Training ModelOpt Examples (#2813)
ChenhanYu Jan 5, 2026
a56a0b0
fix(moe): Support HybridEP and reduce memory overhead for 1F1B A2A ov…
lhb8125 Jan 6, 2026
4a584cb
Inference memory test (#2724)
wdykas Jan 6, 2026
44e5efd
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jan 6, 2026
10561f9
Move batch invariance mode init to initialize.py (#2832)
santhnm2 Jan 6, 2026
de56227
Move full model init to cuda stream to avoid race condition leading t…
jstjohn Jan 6, 2026
5af715b
[docs] Cleanup homepage (#2823)
Phlip79 Jan 6, 2026
8a59fb5
[docs] Update oncall doc (#2822)
Phlip79 Jan 6, 2026
c2327e7
Make default for rerun_mode=disabled not terminate with non-fatal rer…
kwyss-nvidia Jan 7, 2026
5950971
Bugfix: ensure spawned persistent checkpoint worker sets its CUDA dev…
ankurv-nvidia Jan 7, 2026
41cedd6
Implementation of a more flexible optimizer/scheduler override system…
jstjohn Jan 7, 2026
64bc482
chore: rotate oncall schedule
github-actions[bot] Jan 7, 2026
144049d
ci(fix): PyPI upload (#2843)
ko3n1g Jan 7, 2026
ed5bc5c
ci(fix): Don't fail on empty var (#2850)
ko3n1g Jan 7, 2026
c64f227
Add RL support for MOEs (#2742)
jon-barker Jan 7, 2026
ae1a2f1
ci(fix): GH release version tag (#2854)
ko3n1g Jan 7, 2026
7e5e16b
ci(hotfix): Disable flaky test
ko3n1g Jan 7, 2026
4ba9f46
Reduce the scope of the side stream around DDP initialization (#2852)
jstjohn Jan 7, 2026
49dfee2
Manually update first oncall rotation (#2855)
Phlip79 Jan 8, 2026
5fa42ec
Remove flaky iteration time functional test (#2862)
buptzyb Jan 8, 2026
0cc98c4
Nccl gloo refit for RL (#2812)
wdykas Jan 8, 2026
8d6c604
build: Bump jet-client (#2876)
ko3n1g Jan 8, 2026
8b10a64
Change oncall team name (#2861)
Phlip79 Jan 8, 2026
c8ac1fe
Dynamic Inference | Evict and re-compute context requests. (#2738)
lmcafee-nvidia Jan 8, 2026
965cfd3
Fix CUDA RNG Tracker (#2641)
buptzyb Jan 8, 2026
65dccab
[main] feat(moe): Support attention output gate for Qwen3-Next (3/4) …
yuzhongw-nvidia Jan 9, 2026
bddd0a8
[main] feat(moe): Support moe shared expert gate for Qwen3-Next (2/4)…
yuzhongw-nvidia Jan 9, 2026
0b5be24
[docs] Fix docs and add generation doc (#2882)
Phlip79 Jan 9, 2026
43b4471
Revert "Dynamic Inference | Evict and re-compute context requests. (#…
chtruong814 Jan 9, 2026
8f2f700
FP8 params support for megatron-fsdp (MXFP8/Blockwise) (#2239)
kunlunl Jan 8, 2026
ed29157
docs: fix broken images, links, and typos across documentation (#2794)
sbhavani Jan 9, 2026
424a26d
ci(fix): Release version (#2873)
ko3n1g Jan 9, 2026
371ee52
Assign mcore-oncall instead of user (#2879)
Phlip79 Jan 9, 2026
74db1ce
tests: Disable Mamba MOE model test after 43b4471 (#2886)
ko3n1g Jan 9, 2026
7fe7f48
Fix mamba moe unit test after commit reversion (#2888)
jon-barker Jan 9, 2026
ed461d6
Improve error messages in mamba moe unit test (#2889)
jon-barker Jan 9, 2026
c0b2859
Use DynamicInferenceCoordinator for text generation server (#1910)
santhnm2 Jan 9, 2026
980a271
Fix inference server to make nemogym work. (#2887)
yobibyte Jan 9, 2026
4e0eac4
[training migration] add RNG config dataclass (#2347)
maanug-nv Jan 9, 2026
7ff3a2d
[training migration] Add RerunStateMachineConfig dataclass (#2436)
maanug-nv Jan 9, 2026
1f6edf0
Add retry loop with exponential backoff in dataloader as a form of in…
deepakn94 Jan 10, 2026
718d6c2
[training migration] Add SchedulerConfig dataclass (#2400)
maanug-nv Jan 10, 2026
1a244a5
RL: Fix cu_seqlens construction for PackedSeqParams (#2883)
mathemakitten Jan 10, 2026
f00fb9a
Revert "RL: Fix cu_seqlens construction for PackedSeqParams (#2883)"
ko3n1g Jan 12, 2026
8b4e00c
[MoE] Apply grouped gemm bias before unpadding for FP8 (#2817)
cuichenx Jan 12, 2026
dabc39b
[training migration] Add ProfilingConfig dataclass (#2393)
maanug-nv Jan 12, 2026
ba3fd4f
Update Slack user group when oncall changes (#2859)
Phlip79 Jan 12, 2026
4fc7935
Remove unused FlashAttention3 args (#2898)
santhnm2 Jan 12, 2026
438f2cf
Use different token for assign logic (#2893)
Phlip79 Jan 12, 2026
7b7e687
chore: Add `--no-container-mount-home` to script (#2906)
ko3n1g Jan 12, 2026
e068944
build: Bump deps (#2911)
ko3n1g Jan 12, 2026
6704731
feat: m4 leftover changes (#2506)
yaoyu-33 Jan 12, 2026
ce2cc40
Fix RL sequence packing bin size (#2909)
tdene Jan 12, 2026
3f6ad46
Revert "Remove unused FlashAttention3 args (#2898)" (#2916)
chtruong814 Jan 12, 2026
f967176
Revert "feat: m4 leftover changes (#2506)"
ko3n1g Jan 13, 2026
8d826c1
ci: Skip broken tests after dependency bump (#2934)
chtruong814 Jan 13, 2026
b71059a
build: Downgrade flashinfer
ko3n1g Jan 13, 2026
8918762
Reapply "feat: m4 leftover changes (#2506)"
ko3n1g Jan 14, 2026
b7a9d36
Reapply "Remove unused FlashAttention3 args (#2898)" (#2916)
ko3n1g Jan 14, 2026
d6af382
Revert "ci: Skip broken tests after dependency bump (#2934)"
ko3n1g Jan 14, 2026
2490e0c
Ko3n1g/build/downgrade flashinfer (#2937)
ko3n1g Jan 14, 2026
5241224
ci: Skip unit test cleanup (#2940)
chtruong814 Jan 14, 2026
fe87250
[MAIN][NVFP4][MOE] 128 Zero Padding for Grouped Quantization kernels …
zhongbozhu Jan 14, 2026
5548cdc
Add muon and layerwise distributed optimizer (#2241)
FDecaYed Jan 14, 2026
aa3f105
Support DDP overlap for models with repeated parameters (#2837)
deepakn94 Jan 14, 2026
82049aa
build: 26.02 dependency bump main (#2923)
ko3n1g Jan 14, 2026
69ba809
ci(hotfix): Update golden values after 26.02 bump
ko3n1g Jan 14, 2026
f55ec36
RL refit pipelining support (#2878)
wdykas Jan 14, 2026
409ebf1
Revert "[dev] Add assertion for mxfp8 params without dp overlap (#227…
ko3n1g Jan 14, 2026
667ac0f
ci(hotfix): Remove duplicated test
ko3n1g Jan 14, 2026
50fda17
ci(hotfix): Disable more throughput tests
ko3n1g Jan 14, 2026
0994ce2
gpt3_moe_mcore_te_ep8_resume_torch_dist_muon
ko3n1g Jan 14, 2026
9f341dc
gpt3_moe_mcore_te_ep8_resume_torch_dist_dist_muon
ko3n1g Jan 14, 2026
1ed4d52
gpt3_moe_mcore_te_tp2_pp2_ep4_etp1_no_mtp_no_a2a_ovlp_fine_grained_of…
ko3n1g Jan 14, 2026
6e6f0a0
gpt3_moe_mcore_te_tp2_pp2_ep4_etp1_fine_grained_offloading
ko3n1g Jan 14, 2026
c292652
gpt3_mcore_te_tp2_pp2_ep4_etp1_memory_speed
ko3n1g Jan 14, 2026
3d8549a
Revert "Reapply "Remove unused FlashAttention3 args (#2898)" (#2916)"
ko3n1g Jan 14, 2026
f3a17af
Unit test for model_provider to model_builder coupling (#2925)
AAnoosheh Jan 14, 2026
11debfa
add/update golden values
ko3n1g Jan 14, 2026
2dc8873
Various fixes to in-job restarter and better time accounting of start…
hexinw-nvidia Jan 14, 2026
5247a1f
feat(moe): Support placing MTP layers into standalone stages (#2136)
BestJuly Jan 14, 2026
7430688
gpt3_moe_mcore_te_ep8_resume_torch_dist_muon
ko3n1g Jan 14, 2026
ef15983
gpt3_mcore_te_tp2_pp2_ep4_etp1_memory_speed
ko3n1g Jan 14, 2026
6ed980d
gpt3_mcore_te_tp2_pp2_ep4_etp1_resume_torch_dist_attn_cudagraph
ko3n1g Jan 14, 2026
cdca7a6
ci: Onboard GB200 (#2847)
ko3n1g Jan 14, 2026
037b5f1
Install slack-sdk using uv (#2948)
Phlip79 Jan 14, 2026
fcb4d54
Inference | Evict overflow paused requests from context. (#2926)
lmcafee-nvidia Jan 14, 2026
a01c476
Enable training cudagraphs for RL (#2452)
mathemakitten Jan 14, 2026
cb1e9a8
Fix minor README wording and capitalization (#2928)
Deepak-J0shi Jan 14, 2026
411c3d8
Revert "Various fixes to in-job restarter and better time accounting …
ko3n1g Jan 14, 2026
463b428
ci: Restore grpo tests (#2952)
ko3n1g Jan 14, 2026
a0ab5a1
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jan 15, 2026
5542273
Fix GitHub GRPO resharding functional test (#2927)
tdene Jan 15, 2026
61c5839
gb200 & hybrid_static_inference_tp1_pp1_2B_logitsmatch
ko3n1g Jan 15, 2026
4bbcd72
cp: `ci(fix): GB200 racecondition (2962)` into `main` (#2963)
ko3n1g Jan 15, 2026
c3253bd
Revert "cp: `ci(fix): GB200 racecondition (2962)` into `main` (#2963)"
ko3n1g Jan 15, 2026
b97184d
feat(moe): Fine-grained activation offloading (#1913)
lhb8125 Jan 15, 2026
4f1e720
Add out-of-SLA link (#2903)
Phlip79 Jan 15, 2026
235da16
Fix broken mamba-moe unit test (#2970)
jon-barker Jan 15, 2026
4fa27a1
ci: Fix GB200 change (#2969)
ko3n1g Jan 15, 2026
a0908fc
Update golden values for reshard test (#2971)
tdene Jan 15, 2026
ad2ab2c
chore: Update golden values (#2973)
ko3n1g Jan 15, 2026
99ea287
ci(hotfix): `_TENSORBOARD_PATH`
ko3n1g Jan 15, 2026
4da7da9
Enable phase transition iterations (#2938)
jkamalu Jan 15, 2026
710ff97
gpt3_moe_mcore_te_tp2_pp2_ep4_etp1_no_mtp_no_a2a_ovlp_fine_grained_of…
ko3n1g Jan 16, 2026
048825c
add missing import in rl_utils.py (#2915)
jon-barker Jan 15, 2026
7f85843
Pass through --trust-remote-code and add this to all Nemotron model c…
ChenhanYu Jan 16, 2026
15ea236
Cuda 13 UVM (#2957)
wdykas Jan 16, 2026
1501c0f
[Main] Partial CUDA Graph support for EP Overlap (#2184)
Wohox Jan 16, 2026
03c0727
Add sequence packing support for hybrid model (#2913)
duncanriach Jan 16, 2026
55103f9
docs(megatron-fsdp): add Megatron-FSDP user guide (#2396)
xuwchen Jan 16, 2026
382eeea
DeepSeek V3.2 support (#2440)
kunlunl Jan 16, 2026
faa6037
fully remove zarr support (#2944)
dimapihtar Jan 16, 2026
b213e10
chore: Standardize setuptools version (#2975)
ko3n1g Jan 16, 2026
5df1954
ci: Run functional tests on main (#2983)
ko3n1g Jan 16, 2026
b96cadb
ci(fix): CI_COMMIT_BRANCH on forks (#2982)
ko3n1g Jan 16, 2026
20d66d5
[main] feat(moe): Support gated delta net for Qwen3-Next (1/4) (#1989)
yuzhongw-nvidia Jan 16, 2026
92f052b
ci: Add more gb200 nightly tests (#2981)
ko3n1g Jan 16, 2026
4ebd1ad
ci(hotfix): gpt3_mcore_te_tp1_pp2_resume_torch_dist_rope_embeddings
ko3n1g Jan 16, 2026
b50da25
ci(hotfix): move test to nightly
ko3n1g Jan 16, 2026
622a06a
[main] feat(moe): Support apply wd to qk layernorm for Qwen3-Next (4/…
yuzhongw-nvidia Jan 16, 2026
b305422
Set `token_dtype_code` init value in `GPTDatasetConfig` to fix CI (#2…
asolergi-nv Jan 16, 2026
c40b6ea
Re-submit "Various fixes to in-job restarter and better time accounti…
hexinw-nvidia Jan 17, 2026
100e11e
Use slack-sdk in a different manner (#2950)
Phlip79 Jan 17, 2026
b705148
Inference | Move `assert active_request_count > 0`. (#2958)
lmcafee-nvidia Jan 16, 2026
98d8c56
Hybrid Context Parallel Feature (#2282)
parthmannan Jan 17, 2026
829e461
[main] ci(moe): Add `--apply-wd-to-qk-layernorm` flag to the gdn test…
yuzhongw-nvidia Jan 19, 2026
a191791
ci: Disable step time on `gpt3_moe_mcore_te_tp2_pp2_ep4_etp1_no_mtp_n…
ko3n1g Jan 19, 2026
b49d810
ci: Fix workflows on main (#2990)
ko3n1g Jan 19, 2026
35129e7
Make Megatron-FSDP torch.compile compatible (#2425)
shjwudp Jan 20, 2026
517dfd4
[Megatron-FSDP] Test FP8 activations + parameter sharding with Megatr…
cspades Jan 20, 2026
2f3fa0f
chore: Escape special chars (#3014)
ko3n1g Jan 20, 2026
82ea022
Improve memory logging (#2839)
deepakn94 Jan 21, 2026
bcdd405
chore: rotate oncall schedule
github-actions[bot] Jan 21, 2026
a3615d7
Add a wrapper function for FA3 _flash_attn_forward call (#2933)
santhnm2 Jan 21, 2026
a1ba844
ci(hotfix): Fix unit tests on main
ko3n1g Jan 21, 2026
e4b18f7
ci(hotfix): Tests on main
ko3n1g Jan 21, 2026
ba876ef
chore: Set umask 0002 (#3027)
ko3n1g Jan 21, 2026
28c7221
Make attn mask inversion in-place instead of allocating it again (#3019)
mathemakitten Jan 21, 2026
ba456fd
[Megatron-FSDP] Fix incorrect gradient scaling target. (#3023)
cspades Jan 21, 2026
096dbeb
Ensure that last prefill chunk is handled correctly by Mamba models (…
santhnm2 Jan 21, 2026
90e685b
Replaces ModuleSpec with Protocols for some of the inputs to SelfAtte…
nschank Jan 21, 2026
dd72aee
Update oncall schedule (#3017)
Phlip79 Jan 21, 2026
8baf014
Various CUDA graph improvements on capture time, replay time, memory …
jiemingz Jan 22, 2026
bbbedbb
chore: rotate oncall schedule
github-actions[bot] Jan 22, 2026
8c768ea
Add script for batch running CI tests across distinct nodes (#3047)
jon-barker Jan 22, 2026
c8caeb2
Revert "Various CUDA graph improvements on capture time, replay time,…
ko3n1g Jan 22, 2026
db7a8a8
Refit EP support (#2972)
wdykas Jan 22, 2026
91ebe29
Catch case of negative tokens to generate (#2985)
tdene Jan 22, 2026
27168e2
ci: Disable broken GRPO tests
ko3n1g Jan 22, 2026
f85ad03
Sync GitHub and Slack teams (#3037)
Phlip79 Jan 22, 2026
7857383
Support custom Router implementations in MoELayer (#2891)
nschank Jan 23, 2026
cfa240a
ci: Remove Github transition comment from CI (#2881)
chtruong814 Jan 23, 2026
8db8d08
ci: Override N_REPEAT (#3051)
ko3n1g Jan 23, 2026
0136876
Supporting inference when called within an asyncio loop (#2816)
shanmugamr1992 Jan 23, 2026
03e0915
Update type hints and doc strings for moe_utils.py (#2821)
JavaZeroo Jan 23, 2026
10c6f01
Remove calculation of padding token in moe routing loss (#2142)
HaochenYuan Jan 23, 2026
029f48f
Bug fix with --no-use-tokenizer-from-checkpoint-args (#3049)
jon-barker Jan 23, 2026
0683679
Revert "Bug fix with --no-use-tokenizer-from-checkpoint-args (#3049)"…
thomasdhc Jan 23, 2026
93567e8
Add health endpoint to dynamic text gen server (#3009)
santhnm2 Jan 23, 2026
3593301
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jan 24, 2026
30dea5d
ci: Skip test_precision_aware_optimizer (#3062)
thomasdhc Jan 23, 2026
485ed18
Support multimodule communication (#2031)
yaoyu-33 Jan 24, 2026
4 changes: 4 additions & 0 deletions .flake8
@@ -0,0 +1,4 @@
[flake8]
max-line-length = 100
extend-ignore = E203,E501,F401,E402,E714
per-file-ignores = __init__.py:F401
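
For context, a minimal Python sketch of what the suppressed checks flag; the file and variable names are hypothetical, chosen only to illustrate the ignored codes:

# hypothetical_example.py — each line below would normally trip one of the
# codes in extend-ignore, so flake8 stays silent under this config.
import os                  # F401: imported but unused (also ignored per-file for __init__.py)

print("setup code before imports")
import sys                 # E402: module-level import not at top of file

data = list(range(10))
chunk = data[2 : 5]        # E203: whitespace before ':' (Black formats some slices this way)
empty = not chunk is None  # E714: 'not ... is ...'; 'chunk is not None' is the usual spelling
# E501 (line too long) is ignored as well, so the max-line-length = 100 limit is
# effectively advisory — line length is typically enforced by a formatter instead.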
59 changes: 59 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,59 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/distrib_optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/quantization-and-inference

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs

.gitlab/ @NVIDIA/ci
.github/ @NVIDIA/ci
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci @pablo-garay
scripts/README_API_COMPAT.md @NVIDIA/ci @pablo-garay
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci @pablo-garay
docs/api-backwards-compatibility-check.md @NVIDIA/ci @pablo-garay
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci @pablo-garay

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
tests/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
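
For context: GitHub resolves CODEOWNERS with a last-match-wins rule, which is why the more specific entries above (e.g. megatron/core/transformer/moe/) appear after the broader megatron/core/ entry. Below is a toy Python sketch of that resolution order — not GitHub's actual implementation, with rules abbreviated from the file above; real CODEOWNERS patterns support fuller glob syntax, while only directory prefixes are modeled here:

def owners_for(path, rules):
    """Return the owners from the last rule (in file order) whose pattern matches."""
    matched = []
    for pattern, owners in rules:
        # Directory patterns such as 'megatron/core/' cover everything beneath them.
        if path.startswith(pattern):
            matched = owners
    return matched

rules = [
    ("megatron/core/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo"]),
    ("megatron/core/transformer/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo"]),
    ("megatron/core/transformer/moe/",
     ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo",
      "@NVIDIA/mixture-of-experts-adlr", "@NVIDIA/mixture-of-experts-devtech"]),
]

print(owners_for("megatron/core/transformer/moe/router.py", rules))
# -> the MoE owner teams, because the most specific (later) rule wins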
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,29 @@
---
name: Bug report
about: Create a report to help us improve the repository or project
title: ""
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall) team to bring the issue to the oncall's attention.

**Steps/Code to reproduce bug**

Please list the *minimal* steps or a code snippet that allows us to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.


**Additional context**

Add any other context about the problem here.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
blank_issues_enabled: false

23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an idea for this project
title: ""
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall) team to bring the issue to the oncall's attention.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
13 changes: 13 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,13 @@
---
name: QUESTION
about: Ask a question about Megatron-LM that is not a bug, regression or enhancement request
title: "[QUESTION]"
labels: ''
assignees: ''

---

**Your question**
Ask a clear and concise question about Megatron-LM. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall) team to bring the question to the oncall's attention.
40 changes: 40 additions & 0 deletions .github/ISSUE_TEMPLATE/regression.md
@@ -0,0 +1,40 @@
---
name: REGRESSION
about: Report a regression in speed or accuracy due to a Megatron-LM update
title: "[REGRESSION]"
labels: ''
assignees: ''

---

**Describe the regression**
A clear and concise description of what the regression is. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall) team to bring the issue to the oncall's attention.

**To Reproduce**
Steps to reproduce the behavior. The easier it is to reproduce, the faster it will get maintainer attention.

**Previous performance**
What speed or accuracy did you previously see?

**New performance**
What speed or accuracy do you see after the update?

**Stack trace/logs**
If applicable, add the stack trace or logs related to the regression.

**Environment (please complete the following information):**
- Previous Megatron-LM commit ID
- New Megatron-LM commit ID
- Previous PyTorch version
- New PyTorch version
- Previous CUDA version
- New CUDA version
- Previous NCCL version
- New NCCL version

**Proposed fix**
If you have a proposal for how to fix the issue, state it here or link to a PR.

**Additional context**
Add any other context about the problem here.