Greptile Summary
This PR fixes a CMake configuration error that occurred when building TransformerEngine targeting only Blackwell architectures (e.g. …). The fix introduces …
Confidence Score: 5/5
Safe to merge: the fix is correct, logically equivalent to prior behaviour for mixed builds, and properly handles the pure-Blackwell edge case. No P0/P1 issues found. The architectural separation (NVTE_STANDARD_ARCHS, NVTE_GENERIC_ARCHS, NVTE_SPECIFIC_ARCHS) is clean, the CUDA_ARCHITECTURES OFF guard is the standard CMake idiom for fully manual architecture control, and all source groups receive the correct flag sets. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[CMAKE_CUDA_ARCHITECTURES set by user or defaults] --> B{Contains arch 100/101/110/120?}
B -- Yes --> C[Remove from CMAKE_CUDA_ARCHITECTURES\nAdd to NVTE_GENERIC_ARCHS\nAdd variant to NVTE_SPECIFIC_ARCHS]
B -- No --> D
C --> D[NVTE_STANDARD_ARCHS = remaining CMAKE_CUDA_ARCHITECTURES]
D --> E[transformer_engine_cuda_sources]
D --> F[transformer_engine_cuda_arch_specific_sources]
E --> G[Apply NVTE_STANDARD_ARCHS flags\nApply NVTE_GENERIC_ARCHS flags]
F --> H[Apply NVTE_STANDARD_ARCHS flags\nApply NVTE_SPECIFIC_ARCHS flags]
G --> I[add_library transformer_engine\nCUDA_ARCHITECTURES OFF]
H --> I
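The flow above can be sketched as a stand-alone CMake script. This is a minimal illustration under stated assumptions, not the code from the PR: the input value, the `a` suffix used for the architecture-specific variant, and the names `standard_flags`, `generic_flags` and `specific_flags` are all hypothetical. Only the variable names `NVTE_STANDARD_ARCHS`, `NVTE_GENERIC_ARCHS`, `NVTE_SPECIFIC_ARCHS` and the `--generate-code` flag format come from the PR text.

```cmake
# Hypothetical sketch; run with: cmake -P sketch.cmake
cmake_minimum_required(VERSION 3.18)

# Assumed input: a pure-Blackwell request, the case this PR addresses.
set(CMAKE_CUDA_ARCHITECTURES 120)

# Move Blackwell architectures out of CMAKE_CUDA_ARCHITECTURES.
set(NVTE_GENERIC_ARCHS "")
set(NVTE_SPECIFIC_ARCHS "")
foreach(arch 100 101 110 120)
  list(FIND CMAKE_CUDA_ARCHITECTURES "${arch}" idx)
  if(NOT idx EQUAL -1)
    list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES "${arch}")
    list(APPEND NVTE_GENERIC_ARCHS "${arch}")
    # Assumption: the specific variant is spelled with an "a" suffix.
    list(APPEND NVTE_SPECIFIC_ARCHS "${arch}a")
  endif()
endforeach()

# Whatever remains is pre-Blackwell; here the list ends up empty.
set(NVTE_STANDARD_ARCHS ${CMAKE_CUDA_ARCHITECTURES})

# Build explicit nvcc flags for the standard architectures.
set(standard_flags "")
foreach(arch IN LISTS NVTE_STANDARD_ARCHS)
  list(APPEND standard_flags
       "--generate-code=arch=compute_${arch},code=sm_${arch}")
endforeach()

# Generic sources: standard flags plus generic Blackwell flags.
set(generic_flags ${standard_flags})
foreach(arch IN LISTS NVTE_GENERIC_ARCHS)
  list(APPEND generic_flags
       "--generate-code=arch=compute_${arch},code=sm_${arch}")
endforeach()

# Arch-specific sources: standard flags plus specific Blackwell flags.
set(specific_flags ${standard_flags})
foreach(arch IN LISTS NVTE_SPECIFIC_ARCHS)
  list(APPEND specific_flags
       "--generate-code=arch=compute_${arch},code=sm_${arch}")
endforeach()

message(STATUS "NVTE_STANDARD_ARCHS: ${NVTE_STANDARD_ARCHS}")
message(STATUS "generic flags:  ${generic_flags}")
message(STATUS "specific flags: ${specific_flags}")

# In a real project (not script mode), one possible way to apply these:
#   set_source_files_properties(${transformer_engine_cuda_sources}
#     PROPERTIES COMPILE_OPTIONS "${generic_flags}")
#   set_target_properties(transformer_engine
#     PROPERTIES CUDA_ARCHITECTURES OFF)
```

Setting `CUDA_ARCHITECTURES` to `OFF` on the target tells CMake not to emit any architecture flags itself, so an empty `NVTE_STANDARD_ARCHS` no longer leaves the target with an empty architecture list.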
Reviews (3): Last reviewed commit: "fix CUDA architectures cmake logic"
a3cf63e to 65eb86b
Signed-off-by: Gaetan Lepage <gaetan@glepage.com>
65eb86b to 8f526bd
Hi @GaetanLepage, there is another PR, #2665, that targets this problem among other things (although I like your approach of just turning off the CMake CUDA arch handling much better than what I did in that PR). I think we could consolidate those changes.
Description
Currently, building TE while targeting a single Blackwell architecture (e.g. NVTE_CUDA_ARCHS=120) fails with a CMake configuration error.
This is because CUDA_ARCHITECTURES is effectively empty once the Blackwell architectures have been filtered out and moved to NVTE_GENERIC_ARCHS and NVTE_SPECIFIC_ARCHS.
Type of change
Changes
Fix the CMake logic that handles CUDA compute capabilities:
- Introduce NVTE_STANDARD_ARCHS, which will contain all the pre-Blackwell archs (7.5, 8.0, ...)
- Add the architecture flags explicitly (list(APPEND arch_compile_options "--generate-code=arch=compute_${arch},code=sm_${arch}")) for both the transformer_engine_cuda_sources and transformer_engine_cuda_arch_specific_sources source files.
Checklist: