@vmarkovtsev

There are two changes:

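  1. Handle the case where import nvidia returns a namespace package whose __file__ is None.
  2. Add a way to force the use of headers from the nvidia wheels through the NVTE_BUILD_USE_NVIDIA_WHEELS environment variable. Without it, building against the wheel headers is practically impossible when CUDA is installed system-wide.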
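
To illustrate the first change, here is a minimal sketch of how a package root can be resolved when `__file__` is `None`. The helper name `package_root` is hypothetical and is not taken from `build_tools/utils.py`; it only mirrors the fallback to `__path__[0]` described in this PR.

```python
from pathlib import Path
from types import ModuleType


def package_root(module: ModuleType) -> Path:
    """Return the directory that holds a package's contents.

    Hypothetical helper for illustration only.
    """
    file = getattr(module, "__file__", None)
    if file is not None:
        # Regular package: __file__ points at <root>/__init__.py.
        return Path(file).parent
    # Namespace package (no __init__.py): __file__ is None,
    # so fall back to the first entry of the package search path.
    return Path(module.__path__[0])
```

Calling `package_root(nvidia)` would then give the directory whose subdirectories are scanned for `include/` folders, whether or not the installed wheels ship an `__init__.py`.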

I successfully built the package with torch using the following uv configuration:

[tool.uv.extra-build-dependencies]
"transformer-engine-torch" = [
    "ninja",
    "nvidia-cuda-crt==13.0.88",
    "nvidia-cuda-cccl==13.0.85",
    { requirement = "torch", match-runtime = true },
    { requirement = "pytorch-triton", match-runtime = true },
    { requirement = "nvidia-cusolver", match-runtime = true },
    { requirement = "nvidia-curand", match-runtime = true },
    { requirement = "nvidia-cublas", match-runtime = true },
    { requirement = "nvidia-cusparse", match-runtime = true },
    { requirement = "nvidia-cudnn-cu13", match-runtime = true },
    { requirement = "nvidia-nvtx", match-runtime = true },
    { requirement = "nvidia-cuda-nvrtc", match-runtime = true },
    { requirement = "nvidia-cuda-runtime", match-runtime = true },
]

Signed-off-by: Vadim Markovtsev <[email protected]>
@greptile-apps
Contributor

greptile-apps bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

This PR enables building TransformerEngine with headers from nvidia wheels by making two focused changes to the build system. First, it adds support for namespace packages where nvidia.__file__ is None by falling back to nvidia.__path__[0]. Second, it introduces the NVTE_BUILD_USE_NVIDIA_WHEELS environment variable to force using nvidia wheel headers even when CUDA is installed system-wide.

Key changes:

  • Handle namespace packages correctly when nvidia.__file__ is None
  • Add NVTE_BUILD_USE_NVIDIA_WHEELS environment variable to force wheel-based header usage
  • Maintain backward compatibility with existing build configurations

The changes are minimal, focused, and properly handle the edge case of namespace packages. The logic correctly prioritizes toolkit includes unless explicitly overridden.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • The changes are focused on build system improvements with proper fallback handling for namespace packages. The logic is straightforward and maintains backward compatibility. However, the PR lacks tests that verify the namespace package handling works correctly in practice.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| build_tools/utils.py | Added namespace package support for nvidia wheels and environment variable to force wheel usage; changes are focused and handle edge cases properly |

Sequence Diagram

sequenceDiagram
    participant Build as Build System
    participant Utils as build_tools/utils.py
    participant Env as Environment
    participant NV as nvidia package
    participant FS as File System

    Build->>Utils: get_cuda_include_dirs()
    Utils->>Env: Check NVTE_BUILD_USE_NVIDIA_WHEELS
    
    alt force_wheels=true OR toolkit not available
        Utils->>NV: import nvidia
        alt nvidia.__file__ is not None
            NV-->>Utils: Return __file__
            Utils->>FS: cuda_root = Path(__file__).parent
        else nvidia.__file__ is None (namespace package)
            NV-->>Utils: Return __path__[0]
            Utils->>FS: cuda_root = Path(__path__[0])
        end
        Utils->>FS: Iterate subdirs with include/
        FS-->>Utils: Return list of include dirs
    else toolkit available AND not force_wheels
        Utils->>Utils: cuda_toolkit_include_path()
        Utils-->>Build: Return toolkit include path
    end
    
    Utils-->>Build: Return CUDA include directories
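
The decision flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `build_tools/utils.py`: the names `env_flag` and `select_include_dirs` are hypothetical, and the way the `NVTE_BUILD_USE_NVIDIA_WHEELS` value is parsed here is a guess.

```python
import os
from pathlib import Path
from typing import List, Optional


def env_flag(name: str) -> bool:
    """Read an environment variable as a boolean (assumed parsing rule)."""
    value = os.environ.get(name, "0").strip().lower()
    return value in ("1", "true", "yes", "on")


def select_include_dirs(
    toolkit_include: Optional[Path],
    wheel_root: Optional[Path],
) -> List[Path]:
    """Pick CUDA include directories (hypothetical helper).

    toolkit_include: include dir of a system-wide CUDA toolkit, if any.
    wheel_root: root directory of the installed nvidia wheels, if any.
    """
    force_wheels = env_flag("NVTE_BUILD_USE_NVIDIA_WHEELS")
    if toolkit_include is not None and not force_wheels:
        # Default: the system toolkit wins when it is available.
        return [toolkit_include]
    if wheel_root is None:
        raise RuntimeError("No CUDA toolkit and no nvidia wheels found")
    # Wheels: every subdirectory that ships an include/ folder contributes.
    return sorted(
        sub / "include"
        for sub in wheel_root.iterdir()
        if (sub / "include").is_dir()
    )
```

With the variable unset and a toolkit present, the sketch returns only the toolkit path; setting `NVTE_BUILD_USE_NVIDIA_WHEELS=1` switches to the per-wheel `include/` directories even though the toolkit exists.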

Contributor

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments


@ptrendx requested a review from ksivaman on January 27, 2026 00:17
@ptrendx added the community-contribution label (PRs from external contributor outside the core maintainers, representing community-driven work) on Jan 27, 2026
@cyanguwa
Collaborator

/te-ci L0

@vmarkovtsev
Author

It looks like the red jobs are unrelated timeouts.
