Pull requests: NVIDIA/TransformerEngine
[Pyt][Common] Enabling/Guarding sm120 support (non - attention)
#2833
opened Apr 3, 2026 by
KshitijLakhani
Draft
13 tasks
Add capture_time_hooks to make_graphed_callables for non-capturable per-callable hooks
#2831
opened Apr 3, 2026 by
buptzyb
1 of 13 tasks
[Common] Reduced padding kernel compilation time
#2827
opened Apr 2, 2026 by
Oleg-Goncharov
5 of 13 tasks
fix(CP, MLA): CP works fine with MLA in a2a cp_comm_type
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#2826
opened Apr 2, 2026 by
zhujian19891203
5 of 13 tasks
fix(CP, FA): the conditional logic in the FA version contains a vulnerability when processing the output of Flash Attn forward pass
#2825
opened Apr 2, 2026 by
zhujian19891203
5 of 13 tasks
Parallel Test Execution to decrease CI run times
#2824
opened Apr 2, 2026 by
sudhakarsingh27
Draft
[Common] Fix fused router for large top-K and expert counts
#2821
opened Apr 1, 2026 by
harryzhou2000
7 of 13 tasks
Refactor Amax Kernel ldmatrix loads, TMA/compute barriers, swizzle_idx
#2820
opened Apr 1, 2026 by
cael-ling
6 of 13 tasks
[PyTorch] [torch.compile] Split linear forward into forward and setup context.
#2811
opened Mar 30, 2026 by
pggPL
8 of 13 tasks
Streamline group Hadamard ComputeKernel loads
#2810
opened Mar 29, 2026 by
cael-ling
5 of 13 tasks
Single __syncthreads per stage in GroupHadamardAmaxTmaKernel
#2809
opened Mar 29, 2026 by
cael-ling
8 of 13 tasks
Precomputed swizzle_idx into group Hadamard ComputeKernel
#2808
opened Mar 29, 2026 by
cael-ling
8 of 13 tasks
[PyTorch][Flash Attn] Add fallback import for FA3
#2806
opened Mar 26, 2026 by
eattia-nvidia
7 of 13 tasks
[PyT] Fix FSDP2 memory leaks for FP8 weight workspaces and transpose caches
#2805
opened Mar 26, 2026 by
pstjohn
3 tasks done
[PyTorch] [CI] Capture subprocess stderr in distributed tests for better CI error re…
#2802
opened Mar 25, 2026 by
sudhakarsingh27
13 tasks
[JAX] Warmup FFIs with "initialize" stage
#2800
opened Mar 25, 2026 by
jberchtold-nvidia
1 of 13 tasks