Add scripts+workflow to build and upload tarballs from artifacts #4448

Merged
ScottTodd merged 4 commits into ROCm:main from ScottTodd:multi-arch-build-tarballs
Apr 13, 2026

@ScottTodd ScottTodd commented Apr 9, 2026

Motivation

We'd like to produce tarballs as part of multi-arch release pipelines. For context, see:

This will also enable building JAX packages as part of CI pipelines, see:

Technical Details

This downloads artifacts from a workflow run (the current workflow run when included as part of CI/CD workflows, or a prior workflow run for testing or repackaging), packages them into tarballs, and then uploads the tarballs to an artifacts bucket (e.g. therock-dev-artifacts). Release workflows (to be added) can then choose to copy these tarballs to a tarballs bucket (e.g. therock-dev-tarball).

Important

The workflow is not yet integrated into any workflows via workflow_call. It is only run manually via workflow_dispatch.

Tarball files use substantial storage (2GB+ per tarball), so I'd like to only include this for release builds and opt-in for PRs that want to build JAX -- at least until KPACK_SPLIT_ARTIFACTS is flipped and we can produce a single "multiarch" tarball instead of separate tarballs per family.

Behavior with and without KPACK_SPLIT_ARTIFACTS

In this initial implementation:

| Condition | Behavior |
| --- | --- |
| KPACK_SPLIT_ARTIFACTS disabled | Creates a single tarball per GPU family |
| KPACK_SPLIT_ARTIFACTS enabled | Creates a single tarball per GPU target and a "multiarch" tarball with all GPU targets |

We may later want to also produce tarballs without including test artifacts, produce larger groups independent of the current families like "all Radeon GPU targets", etc. All of that is just changes to the filtering and repackaging.
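As a rough illustration of the grouping described above, here is a minimal sketch of how tarball contents could be planned for each mode. The function name `plan_tarballs` and the `targets_by_family` input shape are hypothetical and are not taken from build_tarballs.py; only the two behaviors in the table come from this PR.

```python
from typing import Dict, List


def plan_tarballs(
    targets_by_family: Dict[str, List[str]], split_artifacts: bool
) -> Dict[str, List[str]]:
    """Map each tarball label to the GPU targets it would contain.

    Hypothetical helper: mirrors the table above, not the real script.
    """
    if not split_artifacts:
        # KPACK_SPLIT_ARTIFACTS disabled: one tarball per GPU family.
        return {family: list(targets) for family, targets in targets_by_family.items()}

    # KPACK_SPLIT_ARTIFACTS enabled: one tarball per GPU target,
    # plus a "multiarch" tarball that contains every target.
    all_targets = sorted({t for targets in targets_by_family.values() for t in targets})
    plan = {target: [target] for target in all_targets}
    plan["multiarch"] = all_targets
    return plan
```

Later groupings such as "all Radeon GPU targets" would then only be additional entries in the returned mapping.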

Downloading and extracting

This implementation runs a loop around:

# --stage=all: artifacts from all stages (foundation, math-libs, etc.), all components (lib, doc, test, etc.)
# --amdgpu-families: filter to a single family
# --flatten: extract and flatten into "dist" directory in one command
# --download-cache-dir: reuse generic artifacts downloaded by prior calls
python build_tools/artifact_manager.py fetch \
    --stage=all \
    --amdgpu-families=${families_str} \
    --output-dir=${output_dir} \
    --flatten \
    --download-cache-dir=${download_cache_dir}

This has the advantage of being easy to reproduce outside of the script and reusing cached downloaded artifacts for local debugging and CI efficiency. We also considered fetching and not flattening, then using artifacts.py::ArtifactCatalog to repackage as build_python_packages.py does (using py_packaging.py), but this is simpler.
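The loop could look roughly like the sketch below. It uses only the flags quoted in the command above; the function names, the per-family directory layout, and the injectable `run` parameter are assumptions made for illustration, and the real script may pass additional arguments.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def build_fetch_command(family: str, output_dir: Path, download_cache_dir: Path) -> List[str]:
    """Build the artifact_manager.py fetch command for one family."""
    return [
        "python",
        "build_tools/artifact_manager.py",
        "fetch",
        "--stage=all",
        f"--amdgpu-families={family}",
        f"--output-dir={output_dir}",
        "--flatten",
        f"--download-cache-dir={download_cache_dir}",
    ]


def fetch_all(
    families: List[str], work_dir: Path, run: Callable = subprocess.run
) -> Dict[str, Path]:
    """Fetch each family into its own directory, sharing one download cache."""
    # Hypothetical layout: <work_dir>/<family> plus a shared cache directory.
    cache_dir = work_dir / "download-cache"
    output_dirs = {}
    for family in families:
        output_dir = work_dir / family
        run(build_fetch_command(family, output_dir, cache_dir), check=True)
        output_dirs[family] = output_dir
    return output_dirs
```

Passing the same cache directory to every call is what lets later iterations reuse the generic artifacts downloaded by earlier ones.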

Compression

This implementation produces .tar.gz, matching existing tarball releases. Compression would be faster and more efficient using .tar.zst. I ran some benchmarks on my Windows dev machine:

Expand for benchmark results

| Method | Time (s) | Size (MB) | Ratio | Notes |
| --- | --- | --- | --- | --- |
| tar-cfz | 21.0 | 419.4 | 29.5% | current default |
| gz-1 | 12.2 | 449.8 | 31.6% | |
| gz-3 | 15.2 | 440.5 | 31.0% | |
| gz-6 | 26.4 | 420.9 | 29.6% | |
| gz-9 | 67.9 | 420.2 | 29.5% | |
| zst-1 | 3.3 | 420.2 | 29.5% | matches gz-6 ratio, 6x faster |
| zst-3 | 4.4 | 360.5 | 25.3% | sweet spot |
| zst-6 | 8.0 | 343.9 | 24.2% | |
| zst-9 | 10.0 | 317.9 | 22.3% | |
| zst-19 | 197.9 | 199.4 | 14.0% | |
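For anyone who wants to repeat a comparison like this on their own data, the following is a minimal sketch for the gzip rows only, using Python's standard `tarfile` module. It is not the script used for the numbers above; the function names are hypothetical, and zstd is left out because it is not available in the standard library on all Python versions. The ratio is computed as compressed size divided by uncompressed size, which assumes a non-empty source directory.

```python
import tarfile
import time
from pathlib import Path


def make_tar_gz(source_dir: Path, archive_path: Path, level: int = 6) -> float:
    """Pack source_dir into a .tar.gz at the given gzip level; return seconds."""
    start = time.perf_counter()
    with tarfile.open(archive_path, "w:gz", compresslevel=level) as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return time.perf_counter() - start


def dir_size(source_dir: Path) -> int:
    """Total size in bytes of all regular files under source_dir."""
    return sum(p.stat().st_size for p in source_dir.rglob("*") if p.is_file())


def compare_levels(source_dir: Path, scratch_dir: Path, levels=(1, 6, 9)):
    """Return (level, seconds, compressed_bytes, ratio) for each gzip level."""
    original = dir_size(source_dir)  # assumed to be greater than zero
    results = []
    for level in levels:
        archive = scratch_dir / f"gz-{level}.tar.gz"
        seconds = make_tar_gz(source_dir, archive, level)
        size = archive.stat().st_size
        results.append((level, seconds, size, size / original))
    return results
```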

I wrapped compression in a ProcessPoolExecutor, since parallel compression makes efficient use of CPU cores. Sample benchmarks show the speedup (so the machine is not oversubscribed):

Expand for benchmark results

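| Workers | Wall (s) | Avg/job (s) | Speedup | Efficiency |
| --- | --- | --- | --- | --- |
| 1 | 244.2 | 24.4 | 1.0x | 103% |
| 2 | 128.3 | 25.6 | 2.0x | 98% |
| 4 | 79.4 | 26.6 | 3.2x | 79% |
| 6 | 54.4 | 27.2 | 4.6x | 77% |
| 8 | 54.0 | 27.6 | 4.7x | 58% |
| 10 | 28.8 | 28.6 | 8.8x | 88% |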
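A minimal sketch of what wrapping compression in a ProcessPoolExecutor can look like is shown below. The names `compress_dir` and `compress_all`, the `jobs` mapping, and the `executor_cls` parameter are hypothetical and chosen for illustration; the actual implementation in this PR may be structured differently.

```python
import tarfile
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
from typing import Dict, List, Optional


def compress_dir(source_dir: str, archive_path: str) -> str:
    """Compress one directory into a .tar.gz and return the archive path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=Path(source_dir).name)
    return archive_path


def compress_all(
    jobs: Dict[str, str],
    max_workers: Optional[int] = None,
    executor_cls=ProcessPoolExecutor,
) -> List[str]:
    """Run one compression job per (source_dir, archive_path) pair in parallel."""
    finished = []
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(compress_dir, src, dst) for src, dst in jobs.items()]
        for future in as_completed(futures):
            # result() re-raises any exception from the worker.
            finished.append(future.result())
    return sorted(finished)
```

Processes rather than threads are the natural default here because gzip compression is CPU-bound. On platforms that spawn worker processes (Windows, and macOS by default), the calling code needs to sit under an `if __name__ == "__main__":` guard.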

Test Plan

  • New unit tests for some logic
  • Tested locally with artifacts from prior workflow runs, with and without KPACK_SPLIT_ARTIFACTS: artifacts were downloaded, packaged into the expected tarballs, and "uploaded" to a staging directory
  • Triggered the new workflow on my fork and checked that the workflow succeeds (except for the upload, which fails due to missing credentials)

Test Result

Without KPACK_SPLIT_ARTIFACTS: https://github.com/ScottTodd/TheRock/actions/runs/24205988455/job/70661826987

Building tarballs for 2 families: gfx1151, gfx110X-all
  Platform: linux
  Version: 7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3
  Output: /home/runner/work/TheRock/TheRock/tarballs
...
Done. Tarballs in /home/runner/work/TheRock/TheRock/tarballs:
  therock-dist-linux-gfx110X-all-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2711.5 MB)
  therock-dist-linux-gfx1151-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2820.1 MB)
...
[INFO] Uploading to s3://therock-ci-artifacts-external/ScottTodd-TheRock/24205988455-linux/tarballs

With KPACK_SPLIT_ARTIFACTS: https://github.com/ScottTodd/TheRock/actions/runs/24217435275/job/70701188683

Building tarballs for 2 families: gfx1151, gfx1100
  Platform: linux
  Version: 7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3
  Output: /home/runner/work/TheRock/TheRock/tarballs
...
Done. Tarballs in /home/runner/work/TheRock/TheRock/tarballs:
  therock-dist-linux-gfx1100-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2891.5 MB)
  therock-dist-linux-gfx1151-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2907.4 MB)
  therock-dist-linux-multiarch-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (3085.2 MB)
...
[INFO] Uploading to s3://therock-ci-artifacts-external/ScottTodd-TheRock/24217435275-linux/tarballs
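The file names and upload locations in these logs follow a visible pattern, which the sketch below reproduces. Both helper names are hypothetical, and the layout is inferred only from the fork runs shown above; runs inside ROCm/TheRock may use a different prefix.

```python
def tarball_filename(platform: str, label: str, version: str) -> str:
    """File name pattern seen in the logs: therock-dist-<platform>-<label>-<version>.tar.gz"""
    return f"therock-dist-{platform}-{label}-{version}.tar.gz"


def upload_prefix(bucket: str, github_repository: str, run_id: str, platform: str) -> str:
    """Upload location pattern seen in the fork runs above (an inference, not a spec)."""
    # "ScottTodd/TheRock" appears as "ScottTodd-TheRock" in the logged S3 path.
    repo_dir = github_repository.replace("/", "-")
    return f"s3://{bucket}/{repo_dir}/{run_id}-{platform}/tarballs"
```

Taking the platform as an explicit argument, rather than detecting it from the host, keeps the path correct when tarballs for one platform are built on a runner for another.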

Submission Checklist

@ScottTodd ScottTodd requested a review from erman-gurses April 10, 2026 01:26
@erman-gurses erman-gurses left a comment

Looks good overall, added one concern - will do one more pass tomorrow.

ScottTodd and others added 4 commits April 10, 2026 10:19
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The upload path includes the platform ({run_id}-{platform}/tarballs/),
so the script needs to know the target platform rather than
auto-detecting from the current system. This matters when building
Windows tarballs on a Linux runner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ScottTodd ScottTodd force-pushed the multi-arch-build-tarballs branch from 54e29a9 to a763196 Compare April 10, 2026 17:28
@ScottTodd ScottTodd marked this pull request as ready for review April 10, 2026 17:43
@ScottTodd ScottTodd requested a review from erman-gurses April 10, 2026 17:43
Comment on lines +90 to +91
f"--amdgpu-families={families_str}",
"--expand-family-to-targets",
Member Author

@marbre this from #4449 is working as expected now: it expands gfx110X-all to gfx1100, gfx1101, gfx1102, gfx1103:

https://github.com/ScottTodd/TheRock/actions/runs/24255576558/job/70826158778

  python build_tools/build_tarballs.py \
    --run-id="24187929660" \
    --run-github-repo="ROCm/TheRock" \
    --dist-amdgpu-families="gfx110X-all;gfx1151" \

  ++ Downloading prim_test_gfx1100.tar.zst
  ++ Downloading prim_test_gfx1101.tar.zst
  ++ Downloading prim_test_gfx1102.tar.zst
  ++ Downloading prim_test_gfx1103.tar.zst
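The expansion seen in this log can be pictured with the small sketch below. The mapping table is a hypothetical excerpt containing only the one family named in this comment, and treating an unlisted name such as gfx1151 as its own single target is an assumption; the real lookup behind --expand-family-to-targets lives in the project's build tools and is not reproduced here.

```python
from typing import Dict, List

# Hypothetical excerpt: only the expansion reported in this comment.
FAMILY_TO_TARGETS: Dict[str, List[str]] = {
    "gfx110X-all": ["gfx1100", "gfx1101", "gfx1102", "gfx1103"],
}


def expand_families(families_str: str, separator: str = ";") -> List[str]:
    """Expand a ';'-separated family list into a de-duplicated target list."""
    targets: List[str] = []
    for name in families_str.split(separator):
        name = name.strip()
        if not name:
            continue
        # Assumption: names without a table entry are already single targets.
        for target in FAMILY_TO_TARGETS.get(name, [name]):
            if target not in targets:
                targets.append(target)
    return targets
```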

Member

It seems uploading fails due to missing credentials. In the settings I see:

  github_repository: ScottTodd/TheRock
  is_pr_from_fork: False
  bucket: therock-ci-artifacts-external

Shouldn't is_pr_from_fork be True?

Member Author

That's all working as intended. The run was workflow_dispatch in my fork, as I can't test .github/workflows/multi_arch_build_tarballs.yml from here in ROCm/TheRock until the workflow is included on a default branch here.

  • The repository is my fork
  • It's not a pull request
  • The artifacts-external bucket is used for any workflow run outside of ROCm/TheRock (push/pull_request/workflow_dispatch/etc.)

My fork does not have any credentials to upload or access to self-hosted runners, so the upload is expected to fail there.
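The distinction in this reply can be sketched as follows. This is an illustration of the reasoning only, not the project's actual configuration code: the function name, its parameters, and the rule that a fork pull request also maps to the external bucket are assumptions. What the thread confirms is that a workflow_dispatch run in a fork is not a pull request and that any run outside ROCm/TheRock uses the artifacts-external bucket.

```python
from typing import Dict, Optional

UPSTREAM_REPOSITORY = "ROCm/TheRock"


def upload_settings(
    github_repository: str,
    event_name: str,
    head_repository: Optional[str] = None,
) -> Dict[str, object]:
    """Hypothetical summary of the settings discussed in this thread."""
    # Only pull_request events can be "from a fork"; a workflow_dispatch
    # run inside a fork is simply a run in a different repository.
    is_pr_from_fork = (
        event_name == "pull_request"
        and head_repository is not None
        and head_repository != github_repository
    )
    # Confirmed above: runs outside ROCm/TheRock use the external bucket.
    # Assumed here: fork pull requests do as well.
    uses_external_bucket = github_repository != UPSTREAM_REPOSITORY or is_pr_from_fork
    return {
        "github_repository": github_repository,
        "is_pr_from_fork": is_pr_from_fork,
        "uses_external_bucket": uses_external_bucket,
    }
```

For the run in question, `upload_settings("ScottTodd/TheRock", "workflow_dispatch")` reports `is_pr_from_fork` as False while still selecting the external bucket, which matches the logged settings.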

@erman-gurses erman-gurses left a comment

LGTM!

@ScottTodd ScottTodd merged commit 5c8ed1b into ROCm:main Apr 13, 2026
123 of 129 checks passed
@ScottTodd ScottTodd deleted the multi-arch-build-tarballs branch April 13, 2026 16:54
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Apr 13, 2026