Add rocm_smi64 to library preloads for PyTorch wheels #4440

Merged
ethanwee1 merged 4 commits into main from users/ethanwee1/add-rocm-smi-preload
Apr 10, 2026

@ethanwee1
Contributor

@ethanwee1 ethanwee1 commented Apr 9, 2026

Triggered https://github.com/ROCm/TheRock/actions/runs/24241363210, but the build failed with the setuptools 404 issue (see https://github.com/ROCm/TheRock/actions/runs/24247692319). Now that the new build wheels are in, triggered https://github.com/ROCm/TheRock/actions/runs/24249015826 to see if we can point there. Update: the build succeeded.
The original error snippet:
https://github.com/ROCm/TheRock/actions/runs/24242582747/job/70780812586

Run python ./external-builds/pytorch/run_pytorch_smoke_tests.py -- \

Error:  Failed to retrieve GPU info: ERROR:librocm_smi64.so.1: cannot open shared object file: No such file or directory

Error: Process completed with exit code 1.

PyTorch upstream cherry-picked pytorch#175648 which links
libtorch_hip.so against librocm_smi64.so. Without preloading
this library, `import torch` fails with:

  ImportError: librocm_smi64.so.1: cannot open shared object file

Register rocm_smi64 as a LibraryEntry in _dist_info.py and add it
to LINUX_LIBRARY_PRELOADS so _rocm_init.initialize() preloads it
before torch._C is imported.
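
For readers unfamiliar with the preload approach, here is a minimal sketch of the general technique, assuming a Linux-style dynamic loader: loading a shared library by full path early in the process so that a later import depending on the same soname can find it already loaded. The helper name `preload_shared_libraries` and its arguments are hypothetical and are not taken from `_rocm_init.py` or `_dist_info.py`; the actual TheRock code may work differently.

```python
import ctypes
from pathlib import Path


def preload_shared_libraries(lib_dir, sonames):
    """Try to load each soname from lib_dir before dependent modules import.

    Hypothetical helper for illustration only; not the TheRock implementation.
    Returns a (loaded, missing) pair of soname lists.
    """
    loaded, missing = [], []
    for soname in sonames:
        candidate = Path(lib_dir) / soname
        try:
            # RTLD_GLOBAL makes the library's symbols visible to libraries
            # that are loaded later in the same process.
            ctypes.CDLL(str(candidate), mode=ctypes.RTLD_GLOBAL)
        except OSError:
            # Same failure class as "cannot open shared object file".
            missing.append(soname)
        else:
            loaded.append(soname)
    return loaded, missing
```

A caller would run this with the directory holding the ROCm libraries (an assumption here, since the real location depends on how the wheels are laid out) and a list such as `["librocm_smi64.so.1"]` before `import torch`, then report anything returned in `missing`.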
@ScottTodd
Member

Validation: https://github.com/ROCm/TheRock/actions/runs/24212931118

      Collecting setuptools>=70.2.0
        ERROR: HTTP error 404 while getting https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/setuptools-82.0.1-py3-none-any.whl (from https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/setuptools/)
      ERROR: Could not install requirement setuptools>=70.2.0 from https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/setuptools-82.0.1-py3-none-any.whl because of HTTP error 404 Client Error: Not Found for url: https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/setuptools-82.0.1-py3-none-any.whl for URL https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/setuptools-82.0.1-py3-none-any.whl (from https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/setuptools/)

may need #4412 to fix that

The dirname filter in fetch_object_names used square brackets `[...]`
instead of parentheses `(...)`, creating a single-element list that is
always truthy in Python (`[False]` is a non-empty list). This caused
.whl files from subdirectories (e.g. test-update-deps/) to leak into
the parent prefix index, generating download links that resolve to
non-existent paths and return 404.

Signed-off-by: Wang, Yanyao <[email protected]>
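
The bracket-versus-parenthesis bug described in the commit message above can be reproduced in a few lines. This is a sketch under stated assumptions: the function names, the key layout, and the use of `posixpath.dirname` are hypothetical and are not copied from `fetch_object_names`; only the truthiness behavior of `[False]` is taken from the commit message.

```python
import posixpath


def filter_wheels_buggy(keys, prefix):
    # Bug: the square brackets build a one-element list such as [False].
    # A non-empty list is always truthy, so the dirname check never filters.
    return [
        k for k in keys
        if k.endswith(".whl") and [posixpath.dirname(k) == prefix]
    ]


def filter_wheels_fixed(keys, prefix):
    # Fix: parentheses only group the comparison, so its boolean result
    # is what the filter actually tests.
    return [
        k for k in keys
        if k.endswith(".whl") and (posixpath.dirname(k) == prefix)
    ]


# Hypothetical object keys: one wheel directly under the prefix,
# one in a subdirectory that should not appear in the parent index.
keys = [
    "v2/gfx94X-dcgpu/a.whl",
    "v2/gfx94X-dcgpu/test-update-deps/b.whl",
]
prefix = "v2/gfx94X-dcgpu"

buggy = filter_wheels_buggy(keys, prefix)
fixed = filter_wheels_fixed(keys, prefix)
```

With these inputs, `buggy` still contains the wheel from `test-update-deps/`, while `fixed` keeps only the wheel that sits directly under the prefix.
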
@ethanwee1 ethanwee1 marked this pull request as ready for review April 10, 2026 16:02
@ethanwee1 ethanwee1 requested a review from ScottTodd April 10, 2026 16:02
Member

@ScottTodd ScottTodd left a comment

LGTM but please revert the unrelated changes before merge

@ethanwee1 ethanwee1 merged commit 1bd6d08 into main Apr 10, 2026
31 checks passed
@ethanwee1 ethanwee1 deleted the users/ethanwee1/add-rocm-smi-preload branch April 10, 2026 17:34
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Apr 10, 2026