
Fix missing cutlass.utils.ampere_helpers on arm64 #1948

Merged
JannikSt merged 2 commits into feature/arm64-support from bugfix/cutlass-ampere-helpers on Mar 4, 2026

JannikSt commented Mar 4, 2026

Summary

Fix ModuleNotFoundError: No module named 'cutlass.utils.ampere_helpers' crash on arm64 (GB200) trainer startup.

Root Cause

  • nvidia-cutlass-dsl 4.4.1 dropped ampere_helpers.py from cutlass.utils (now only ships hopper_helpers and blackwell_helpers)
  • flash-attn 2.8.3's flash_attn.cute unconditionally imports it at module level for the Sm80 forward kernel (flash_attn.cute.flash_fwd -> import cutlass.utils.ampere_helpers)
  • This import is NOT gated by GPU architecture, so it triggers even on GB200 (sm_100)
  • The file still exists in flashinfer's vendored copy of cutlass
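The first bullet can be checked in any environment by asking the import system whether each helper module can be located. This is a minimal diagnostic sketch and not part of the PR; `module_available` is a hypothetical helper name, and the three module names are taken from the list above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system.

    find_spec imports parent packages but does not execute the leaf module.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError,
        # which is a subclass of ImportError.
        return False


if __name__ == "__main__":
    for mod in (
        "cutlass.utils.ampere_helpers",
        "cutlass.utils.hopper_helpers",
        "cutlass.utils.blackwell_helpers",
    ):
        print(f"{mod}: {'found' if module_available(mod) else 'MISSING'}")
```

Running this inside the trainer's venv should report the ampere module as missing on an affected image, assuming the package layout described above.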

Fix

Copy ampere_helpers.py from flashinfer's vendored cutlass into the nvidia-cutlass-dsl package during the Docker build (arm64 only).
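The copy step could be expressed roughly as follows. This is a sketch under stated assumptions, not the Dockerfile change from this PR: `backfill_module` is a hypothetical helper, and the source and destination directories are parameters because the actual paths inside the image are not given here.

```python
import shutil
from pathlib import Path


def backfill_module(
    src_utils: Path,
    dst_utils: Path,
    filename: str = "ampere_helpers.py",
) -> bool:
    """Copy `filename` from src_utils into dst_utils if the destination lacks it.

    Returns True if a copy was made and False if the file was already present.
    Raises FileNotFoundError if the source file does not exist.
    """
    src = src_utils / filename
    dst = dst_utils / filename
    if dst.exists():
        # Never overwrite a file the package ships itself.
        return False
    if not src.is_file():
        raise FileNotFoundError(f"source module not found: {src}")
    shutil.copy2(src, dst)
    return True
```

Skipping the copy when the destination already exists keeps the workaround harmless if a later nvidia-cutlass-dsl release re-adds the file.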

This workaround can be removed once flash-attn gates the import by architecture or nvidia-cutlass-dsl re-adds the file.
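For illustration, gating the import by architecture could look like the sketch below. It is not flash-attn's code: the mapping and the function names are hypothetical, and it assumes the compute-capability major version (8 for sm_80, 9 for Hopper, 10 for sm_100) is already known to the caller.

```python
import importlib
from types import ModuleType

# Hypothetical mapping from compute-capability major version to helper module.
_HELPERS_BY_MAJOR = {
    8: "cutlass.utils.ampere_helpers",
    9: "cutlass.utils.hopper_helpers",
    10: "cutlass.utils.blackwell_helpers",
}


def helper_module_name(major: int) -> str:
    """Return the helper module name registered for a compute-capability major."""
    try:
        return _HELPERS_BY_MAJOR[major]
    except KeyError:
        raise ValueError(f"no helper module registered for sm_{major}x") from None


def load_arch_helpers(major: int) -> ModuleType:
    """Import only the helper module needed for the given architecture."""
    return importlib.import_module(helper_module_name(major))
```

With a lazy lookup like this, a GB200 host would only ever import the blackwell helpers, so a missing ampere module would not affect startup.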


Note

Low Risk
A build-time, arm64-gated Dockerfile workaround that only affects container image contents; main risk is brittleness if upstream package paths change.

Overview
Adds an arm64-only Docker build workaround that copies ampere_helpers.py (the cutlass.utils.ampere_helpers module) from flashinfer's vendored CUTLASS into the installed nvidia-cutlass-dsl package inside the venv.

This prevents flash-attn/flash_attn.cute from crashing at import time with ModuleNotFoundError on arm64/GB200 when nvidia-cutlass-dsl 4.4.1 omits that module.

Written by Cursor Bugbot for commit e4f5326.

JannikSt added 2 commits March 3, 2026 22:43
flash-attn.cute imports cutlass.utils.ampere_helpers but
nvidia-cutlass-dsl 4.4.1 doesn't ship that module. Copy it
from flashinfer's vendored cutlass which has the file.
JannikSt merged commit 3d043e6 into feature/arm64-support on Mar 4, 2026
9 checks passed