[CI] Add GitHub workflow for building and releasing fat wheels by tongke6 · Pull Request #91 · inclusionAI/cuLA

tongke6 · 2026-06-12T03:35:38Z

📌 Description

Replace the monolithic cula.cudac extension with per-arch extensions (cula._cudac_sm90, cula._cudac_sm100) so that SM90 and SM100/SM103 kernels are compiled independently with their own -gencode flags. This enables building fat-binary wheels containing all architectures without needing the target GPU present at build time.

Key changes:

- Split `pybind.cu` into per-file `PYBIND11_MODULE` definitions
- Add `cula/cudac.py` proxy module for backwards-compatible imports
- Add `CULA_BUILD_ALL_ARCHS=1` env var to enable all SM targets
- Add `--fat` flag to `build_wheel.sh` for CI fat-binary builds
- Pin dependency versions and use `no-local-version` scheme for reproducible wheel filenames
- Use setuptools_scm for dynamic `__version__`
- Document pre-built wheel installation in README
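
To illustrate how an environment variable such as `CULA_BUILD_ALL_ARCHS=1` could switch a build between "only the detected GPU" and "every architecture", here is a minimal sketch. It is not the PR's `setup.py`: the helper names, the `detected_major` parameter, and the `90a`/`100a`/`103a` arch suffixes are assumptions made for illustration.

```python
import os

# Hypothetical mapping from extension name to the SM targets it carries.
# The extension names come from the PR description; the arch suffixes are assumed.
ARCHS_BY_EXTENSION = {
    "cula._cudac_sm90": ["90a"],
    "cula._cudac_sm100": ["100a", "103a"],
}


def gencode_flags(archs):
    """Build one nvcc -gencode flag per SM target."""
    return [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]


def extensions_to_build(env=None, detected_major=None):
    """Return {extension name: gencode flags} for the current build.

    With CULA_BUILD_ALL_ARCHS=1 every extension is built, so no GPU is needed.
    Otherwise only the extension matching the detected GPU major version is built.
    """
    env = os.environ if env is None else env
    if env.get("CULA_BUILD_ALL_ARCHS") == "1":
        names = list(ARCHS_BY_EXTENSION)
    elif detected_major == 9:
        names = ["cula._cudac_sm90"]
    elif detected_major == 10:
        names = ["cula._cudac_sm100"]
    else:
        names = []
    return {name: gencode_flags(ARCHS_BY_EXTENSION[name]) for name in names}
```

Because each extension gets its own flag list, the SM90 sources never see the SM100 flags and vice versa, which is the property the description relies on for fat-binary wheels.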

🔍 Related Issues

Fix #83

🚀 Pull Request Checklist

Thank you for contributing to cuLA! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing.

⚡ Performance

Reviewer Notes

gemini-code-assist

Code Review

This pull request restructures the build system and CUDA extension packaging for cuLA to support separate per-architecture extensions (_cudac_sm100 and _cudac_sm90) and fat-binary builds. It introduces a lazy-loading proxy module (cula.cudac) to dynamically expose the compiled extension functions, and updates the versioning to use setuptools_scm. Feedback on these changes highlights a thread-safety vulnerability in the lazy-loading proxy that could cause race conditions during concurrent imports, and advises against strict version pinning of runtime dependencies in pyproject.toml to prevent dependency conflicts for downstream users.

Copilot

Pull request overview

This PR refactors cuLA’s CUDA packaging so architecture-specific kernels are built as separate extensions (SM90 vs SM100/SM103) and adds CI automation to build and publish “fat” (multi-arch) wheels via GitHub Releases, while keeping `import cula.cudac` working via a Python proxy module.

Changes:

- Split the monolithic CUDA extension into per-architecture `CUDAExtension`s and move `PYBIND11_MODULE` bindings into per-arch `.cu` entrypoints.
- Add `CULA_BUILD_ALL_ARCHS=1` support and a `--fat` option in the wheel build script for CI fat-binary builds.
- Introduce a GitHub Actions workflow to build cu129/cu130 wheels for x86_64/aarch64 and attach them to GitHub Releases; update versioning + README install docs accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

File	Description
`.github/workflows/build-release.yml`	New CI workflow to build and upload CUDA-versioned wheel artifacts and draft a GitHub Release.
`setup.py`	Builds per-arch CUDA extensions (`cula._cudac_sm90`, `cula._cudac_sm100`) and adds `CULA_BUILD_ALL_ARCHS` behavior.
`csrc/api/pybind.cu`	Removes the monolithic binding module entrypoint.
`csrc/api/kda_sm90.cu`	Adds SM90-specific `PYBIND11_MODULE` bindings.
`csrc/api/kda_sm100.cu`	Adds SM100/SM103-specific `PYBIND11_MODULE` bindings.
`cula/cudac.py`	New proxy module to preserve `import cula.cudac` API across split extensions.
`cula/__init__.py`	Switches to setuptools_scm-generated runtime version when available.
`scripts/build_wheel.sh`	Adds `--fat` flag to set `CULA_BUILD_ALL_ARCHS=1` during wheel builds.
`README.md`	Documents installing pre-built wheels from GitHub Releases.
`pyproject.toml`	Uses `setuptools_scm` `no-local-version` scheme and keeps writing `cula/_version.py`.
`tests/conftest.py`	Minor formatting adjustment in test collection marker logic.

Add double-checked locking to _CudacProxy._load() to prevent race conditions in multi-threaded environments. Raise a descriptive ImportError when no CUDA extensions can be loaded instead of silently producing AttributeError later.

The blanket `except ImportError: pass` swallowed the actual failure reason, making it impossible to diagnose missing shared libraries or build issues. Collect each extension's ImportError and include them in the raised message.
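
The two commit messages above describe double-checked locking plus collected import errors. The sketch below shows that combination in isolation. It is not the PR's `_CudacProxy`: the class name `LazyExtensionProxy` and its attributes are hypothetical, and only `ImportError` is caught, as in these commits.

```python
import importlib
import threading


class LazyExtensionProxy:
    """Hypothetical proxy that imports backend modules on first attribute access."""

    def __init__(self, module_names):
        self._module_names = tuple(module_names)
        self._lock = threading.Lock()
        self._loaded = False
        self._attrs = {}

    def _load(self):
        if self._loaded:  # first check: cheap, no lock taken
            return
        with self._lock:
            if self._loaded:  # second check: another thread may have finished
                return
            errors, attrs = {}, {}
            for name in self._module_names:
                try:
                    module = importlib.import_module(name)
                except ImportError as exc:
                    errors[name] = exc  # keep the reason instead of swallowing it
                    continue
                attrs.update(
                    (k, v) for k, v in vars(module).items() if not k.startswith("_")
                )
            if len(errors) == len(self._module_names):
                details = "; ".join(f"{n}: {e}" for n, e in errors.items())
                raise ImportError(f"no CUDA extension could be loaded ({details})")
            self._attrs = attrs
            self._loaded = True  # publish only after the table is complete

    def __getattr__(self, item):
        # Only reached for names not found through normal attribute lookup.
        self._load()
        try:
            return self._attrs[item]
        except KeyError:
            raise AttributeError(item) from None
```

Setting `_loaded` last means a thread that passes the unlocked check always sees a fully populated attribute table.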

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

tongke6 · 2026-06-17T15:19:41Z

/gemini review again

gemini-code-assist

Code Review

This pull request refactors the build system and CUDA extension loading mechanism of cuLA by splitting the monolithic cula.cudac extension into two separate per-architecture extensions (cula._cudac_sm100 and cula._cudac_sm90) and introducing a lazy-loading proxy module. It also adds support for building fat binary wheels. The review feedback points out that the CULA_SM100_ENABLED and CULA_SM103_ENABLED preprocessor macros were omitted in the new setup.py, which could lead to compilation failures. Additionally, it is recommended to simplify the proxy implementation using standard PEP 562 module-level functions and to broaden exception handling during dynamic imports to catch RuntimeError and OSError.
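
For readers unfamiliar with the PEP 562 suggestion, the sketch below shows a module-level `__getattr__` that resolves names lazily and catches the broader exception set mentioned in the review. In a real package these functions would simply sit at the top level of the proxy module; here they are attached to a throwaway `types.ModuleType` so the example is self-contained. `build_proxy_module` and its arguments are hypothetical names, not code from the PR.

```python
import importlib
import types


def build_proxy_module(module_name, backend_names):
    """Create a module whose attributes are resolved lazily via PEP 562."""
    proxy = types.ModuleType(module_name)
    exported = {}
    state = {"loaded": False}

    def _load():
        for backend in backend_names:
            try:
                module = importlib.import_module(backend)
            except (ImportError, RuntimeError, OSError):
                continue  # a real proxy would record or report this failure
            exported.update(
                (k, v) for k, v in vars(module).items() if not k.startswith("_")
            )
        state["loaded"] = True

    def __getattr__(name):
        # Called by the module type only when normal lookup fails (PEP 562).
        if not state["loaded"]:
            _load()
        try:
            return exported[name]
        except KeyError:
            raise AttributeError(
                f"module {module_name!r} has no attribute {name!r}"
            ) from None

    def __dir__():
        if not state["loaded"]:
            _load()
        return sorted(exported)

    proxy.__getattr__ = __getattr__
    proxy.__dir__ = __dir__
    return proxy
```

Compared with a proxy class, this keeps the public object an ordinary module, so `import` statements and `hasattr` checks behave as callers expect.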

icavan

LGTM, @zheyang0825 PTAL

Catch (ImportError, AttributeError, OSError) when scanning per-arch extensions: pybind11 modules commonly surface missing-symbol / ABI / libcudart failures as AttributeError or OSError rather than ImportError, so the prior narrow catch silently dropped one extension's failure when another succeeded, leaving its kernels missing without diagnostic. Emit a UserWarning naming each failing extension on partial failure (all-fail still raises ImportError), preserving the c955d47 intent of surfacing per-extension errors. Also document the load-once-per-process semantics in the module docstring.
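
The behaviour described in this commit message (broad catch, warning on partial failure, `ImportError` when everything fails) can be sketched as a small standalone function. The name `load_extensions` and the message wording are assumptions, not the PR's code.

```python
import importlib
import warnings

# Exception types named in the commit message above.
_LOAD_ERRORS = (ImportError, AttributeError, OSError)


def load_extensions(names):
    """Import every extension in `names`, reporting each failure by name."""
    loaded, failed = {}, {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except _LOAD_ERRORS as exc:
            failed[name] = exc

    if not loaded:
        # Total failure: raise, with every underlying reason included.
        details = "; ".join(f"{n}: {e!r}" for n, e in failed.items())
        raise ImportError(f"no extension could be loaded ({details})")

    for name, exc in failed.items():
        # Partial failure: keep going, but say which extension is missing.
        warnings.warn(
            f"extension {name} failed to load: {exc!r}",
            UserWarning,
            stacklevel=2,
        )
    return loaded
```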

Select the per-architecture CUDA extension from the active device compute capability instead of scanning every built extension. SM100/SM103 now load the SM100 extension, while SM90 loads the SM90 extension. This avoids exposing kernels from mismatched GPU architectures and reports clearer errors when the matching extension is missing or unsupported.
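
A minimal sketch of the capability-based selection described above, assuming the compute capability is available as a `(major, minor)` tuple. The function name and error text are hypothetical; the extension names and the SM90 / SM100 / SM103 mapping come from the commit message.

```python
def extension_for_capability(major, minor):
    """Map a CUDA compute capability to the matching per-arch extension name."""
    capability = (major, minor)
    if capability == (9, 0):
        return "cula._cudac_sm90"
    if capability in {(10, 0), (10, 3)}:
        # SM100 and SM103 share one extension.
        return "cula._cudac_sm100"
    raise ImportError(
        f"no cuLA CUDA extension matches compute capability {major}.{minor}"
    )
```

With PyTorch installed, the tuple could come from `torch.cuda.get_device_capability()`, for example `extension_for_capability(*torch.cuda.get_device_capability())`; whether the PR obtains it that way is not stated here.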

zheyang0825

LGTM

gemini-code-assist Bot reviewed Jun 12, 2026

Comment thread cula/cudac.py Outdated

Comment thread pyproject.toml

tongke6 added 2 commits June 12, 2026 11:38

fix ruff lint errors

7616737

revert version requirements changes

b148c52

tongke6 requested a review from Copilot June 12, 2026 03:53

Copilot started reviewing on behalf of tongke6 June 12, 2026 03:54

Copilot AI reviewed Jun 12, 2026

Comment thread .github/workflows/build-release.yml Outdated

Comment thread cula/cudac.py Outdated

Comment thread cula/cudac.py Outdated

Comment thread setup.py Outdated

Comment thread README.md Outdated

tongke6 and others added 2 commits June 12, 2026 12:02

Surface per-extension import errors in cudac proxy

c955d47


Copilot started work on behalf of tongke6 June 17, 2026 14:45

Fix build-release matrix with DRY expression mapping

6ccb8fa

Copilot finished work on behalf of tongke6 June 17, 2026 14:48

tongke6 and others added 2 commits June 17, 2026 22:51

Apply suggestions from code review

f700810

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

e0a1e21

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

tongke6 commented Jun 17, 2026

Comment thread README.md

Copilot started work on behalf of tongke6 June 17, 2026 15:14

tongke6 marked this pull request as ready for review June 17, 2026 15:14

tongke6 requested review from icavan and zheyang0825 June 17, 2026 15:14

Add README example for building fat wheels

ead4f7e

Copilot finished work on behalf of tongke6 June 17, 2026 15:15

gemini-code-assist Bot reviewed Jun 17, 2026

Comment thread setup.py

Comment thread cula/cudac.py

icavan reviewed Jun 22, 2026

Comment thread setup.py

icavan reviewed Jun 22, 2026

Comment thread cula/cudac.py Outdated

icavan approved these changes Jun 22, 2026

tongke6 added 4 commits June 22, 2026 19:13

Build release wheels against manylinux_2_28

ceab48b

fix python 3.12 GLIBC compat problems on ubi8

430531b

install gcc13

a27443c

zheyang0825 approved these changes Jun 23, 2026

tongke6 merged commit 7b1e127 into main Jun 23, 2026

tongke6 merged 14 commits into main from tk/gh-workflow

5 participants
