fix CUDA architectures cmake logic #2832

Open
GaetanLepage wants to merge 1 commit into NVIDIA:main from GaetanLepage:fix-cuda-arch-cmake-logic

GaetanLepage (Contributor) commented Apr 3, 2026

Description

Currently, building TE while targeting a single Blackwell architecture (e.g. NVTE_CUDA_ARCHS=120) fails with:

-- Configuring done (2.5s)
CMake Error in CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "transformer_engine".

This happens because CUDA_ARCHITECTURES is effectively empty once the Blackwell architectures have been filtered out and moved to NVTE_GENERIC_ARCHS and NVTE_SPECIFIC_ARCHS.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Fix the CMake logic that handles CUDA compute capabilities:

  • Introduce NVTE_STANDARD_ARCHS, which holds all the pre-Blackwell archs (7.5, 8.0, ...)
  • Inject the relevant flag (list(APPEND arch_compile_options "--generate-code=arch=compute_${arch},code=sm_${arch}")) for both transformer_engine_cuda_sources and transformer_engine_cuda_arch_specific_sources source files.
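
The two changes above can be sketched as a standalone CMake script. This is a minimal illustration under stated assumptions, not the code from this PR: `requested_archs`, `make_gencode_flags`, `generic_flags` and `specific_flags` are hypothetical names, the Blackwell list (100, 101, 110, 120) is taken from the review flowchart further down, and the `a` suffix for the arch-specific variant is an assumption.

```cmake
# Run with: cmake -P arch_split_sketch.cmake
# Sketch only: models the arch bookkeeping, does not compile anything.
cmake_minimum_required(VERSION 3.18)

# Hypothetical stand-in for the user's selection (e.g. NVTE_CUDA_ARCHS).
set(requested_archs 80 90 120)

set(NVTE_STANDARD_ARCHS ${requested_archs})
set(NVTE_GENERIC_ARCHS)
set(NVTE_SPECIFIC_ARCHS)

# Move Blackwell-family entries out of the standard list.
foreach(arch IN ITEMS 100 101 110 120)
  if(arch IN_LIST NVTE_STANDARD_ARCHS)
    list(REMOVE_ITEM NVTE_STANDARD_ARCHS ${arch})
    list(APPEND NVTE_GENERIC_ARCHS ${arch})
    # Assumption: the arch-specific variant is spelled with an "a" suffix.
    list(APPEND NVTE_SPECIFIC_ARCHS ${arch}a)
  endif()
endforeach()

# Hypothetical helper: turn a list of archs into explicit nvcc flags.
function(make_gencode_flags out_var)
  set(flags)
  foreach(arch IN LISTS ARGN)
    list(APPEND flags "--generate-code=arch=compute_${arch},code=sm_${arch}")
  endforeach()
  set(${out_var} "${flags}" PARENT_SCOPE)
endfunction()

# Both source groups get the standard archs; they differ only in which
# Blackwell list is appended.
make_gencode_flags(generic_flags ${NVTE_STANDARD_ARCHS} ${NVTE_GENERIC_ARCHS})
make_gencode_flags(specific_flags ${NVTE_STANDARD_ARCHS} ${NVTE_SPECIFIC_ARCHS})

message(STATUS "generic sources:       ${generic_flags}")
message(STATUS "arch-specific sources: ${specific_flags}")
```

In a real project, flag lists like these would presumably be attached to each source group with `set_source_files_properties(... PROPERTIES COMPILE_OPTIONS ...)`, and the target would set `CUDA_ARCHITECTURES OFF` (as the review summary below describes) so that CMake does not also inject its own architecture flags. With `requested_archs` set to `120` alone, `NVTE_STANDARD_ARCHS` ends up empty but both flag lists are still non-empty, which is the case that previously failed.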

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

greptile-apps bot (Contributor) commented Apr 3, 2026

Greptile Summary

This PR fixes a CMake configuration error that occurred when building TransformerEngine targeting only Blackwell architectures (e.g. NVTE_CUDA_ARCHS=120). After all architectures were filtered into NVTE_GENERIC_ARCHS / NVTE_SPECIFIC_ARCHS, CMAKE_CUDA_ARCHITECTURES was left empty, causing CMake to abort with "CUDA_ARCHITECTURES is empty for target transformer_engine".

The fix introduces NVTE_STANDARD_ARCHS (the remaining pre-Blackwell list after Blackwell entries are extracted), feeds it into explicit per-source --generate-code compile options for both source groups, and disables CMake's automatic architecture injection via CUDA_ARCHITECTURES OFF. This is logically equivalent to the previous behavior for mixed builds and correctly handles pure-Blackwell builds.

Confidence Score: 5/5

Safe to merge — the fix is correct, logically equivalent to prior behaviour for mixed builds, and properly handles the pure-Blackwell edge case.

No P0/P1 issues found. The architectural separation (NVTE_STANDARD_ARCHS, NVTE_GENERIC_ARCHS, NVTE_SPECIFIC_ARCHS) is clean, the CUDA_ARCHITECTURES OFF guard is the standard CMake idiom for fully manual architecture control, and all source groups receive the correct flag sets.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/common/CMakeLists.txt | Introduces NVTE_STANDARD_ARCHS for pre-Blackwell architectures, adds explicit per-source compile flags for standard archs, and sets CUDA_ARCHITECTURES OFF to prevent CMake's empty-list error when only Blackwell targets are specified. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CMAKE_CUDA_ARCHITECTURES set by user or defaults] --> B{Contains arch 100/101/110/120?}
    B -- Yes --> C[Remove from CMAKE_CUDA_ARCHITECTURES\nAdd to NVTE_GENERIC_ARCHS\nAdd variant to NVTE_SPECIFIC_ARCHS]
    B -- No --> D
    C --> D[NVTE_STANDARD_ARCHS = remaining CMAKE_CUDA_ARCHITECTURES]
    D --> E[transformer_engine_cuda_sources]
    D --> F[transformer_engine_cuda_arch_specific_sources]
    E --> G[Apply NVTE_STANDARD_ARCHS flags\nApply NVTE_GENERIC_ARCHS flags]
    F --> H[Apply NVTE_STANDARD_ARCHS flags\nApply NVTE_SPECIFIC_ARCHS flags]
    G --> I[add_library transformer_engine\nCUDA_ARCHITECTURES OFF]
    H --> I

Reviews (3): Last reviewed commit: "fix CUDA architectures cmake logic"

GaetanLepage force-pushed the fix-cuda-arch-cmake-logic branch from a3cf63e to 65eb86b (April 3, 2026 16:06)
Signed-off-by: Gaetan Lepage <gaetan@glepage.com>
GaetanLepage force-pushed the fix-cuda-arch-cmake-logic branch from 65eb86b to 8f526bd (April 3, 2026 16:10)

ptrendx (Member) commented Apr 3, 2026

Hi @GaetanLepage, there is another PR, #2665, that targets this problem among other things (although I like your approach of just turning off the CMake CUDA arch handling much better than what I did in that PR). I think we could consolidate those changes.

ptrendx added the community-contribution label Apr 3, 2026
Labels

community-contribution PRs from external contributor outside the core maintainers, representing community-driven work.
