Summary
torch.compile fails on the first forward pass of an MoE model with a TorchInductor internal assertion. Command: uv run sft @ configs/debug/moe/sft/train.toml
Error:
torch._inductor.exc.InductorError: AssertionError: failed OrderedSet([]) >= OrderedSet([u8]) (inductor >= fx)
FX node mentioned:
aten.slice.Tensor(..., 0, 0, %sym_sum)
Environment
- PyTorch: 2.10.0+cu128
- Python: 3.12.6
- CUDA (driver/runtime from nvidia-smi): 12.8
- NVIDIA driver: 570.86.15
- GPU: NVIDIA A100-SXM4-40GB (reproduced on single GPU; machine has 8x A100)
What I was doing
I was running SFT on a debug GLM 0.5b MoE causal LM with torch.compile enabled (Inductor backend).
The failure occurs on the first forward pass after model init.
Disabling torch.compile makes the same run succeed.
Behavior
- With compile enabled: crashes in Inductor during graph lowering/compilation with the assertion above
- With compile disabled: training runs successfully
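To narrow down which compile stage fails, one option is to run the same forward pass through progressively deeper torch.compile backends ("eager" exercises only Dynamo tracing, "aot_eager" adds AOTAutograd, "inductor" adds Inductor lowering and codegen). The following is a minimal sketch, not something taken from the training code: first_failing_backend, make_torch_runner, model, and example_inputs are hypothetical names, and the runner assumes the model can be called directly on a tuple of example inputs.

```python
from typing import Callable, Optional, Sequence

# Backends ordered from least to most of the compile stack exercised.
BACKENDS = ("eager", "aot_eager", "inductor")


def first_failing_backend(
    run_once: Callable[[str], None],
    backends: Sequence[str] = BACKENDS,
) -> Optional[str]:
    """Return the first backend for which run_once raises, or None if all pass."""
    for backend in backends:
        try:
            run_once(backend)
        except Exception as exc:
            print(f"{backend}: FAILED with {type(exc).__name__}: {exc}")
            return backend
        print(f"{backend}: ok")
    return None


def make_torch_runner(model, example_inputs):
    """Hypothetical glue: compile `model` with a backend and run one forward pass."""
    import torch  # imported lazily so the helper above stays torch-free

    def run_once(backend: str) -> None:
        torch._dynamo.reset()  # drop graphs cached for the previous backend
        compiled = torch.compile(model, backend=backend)
        compiled(*example_inputs)

    return run_once
```

Usage would look like first_failing_backend(make_torch_runner(model, (input_ids,))). If only "inductor" fails, that would be consistent with the crash being in Inductor itself rather than in tracing.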
Any help would be appreciated, thank you!