[deps] split ci_docgpu CPU/GPU depsets#62596
[deps] split ci_docgpu CPU/GPU depsets#62596ans9868 wants to merge 4 commits intoray-project:masterfrom
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
2e76459 to
8eff66b
Compare
|
Someone fixed this it seems like in master. Going to close this here. |
|
Ah I got confused; there isn't a solution to this on the master branch. |
Fixes the torch-spline-conv conflict introduced in PR ray-project#62485 by splitting the single ci_docgpu_depset into separate CPU and GPU variants: - ci_docgpu_cpu_depset: CPU-only with --index https://download.pytorch.org/whl/cpu - ci_docgpu_gpu_depset: GPU-only with --index https://download.pytorch.org/whl/cu128 Update Docker build files to reference the GPU lock only (docgpu_gpu_depset_py.lock). This follows the proven raydepsets pattern used by ci_ml_build_depset (CPU) and ci_ml_gpubuild_depset (GPU). Note: Lock file regeneration is blocked by a pre-existing etils version conflict (separate issue). Lock files will be committed once that is resolved. Closes ray-project#62595 Signed-off-by: Adel Nour <[email protected]>
4d69759 to
57dbd9a
Compare
aslonnie
left a comment
There was a problem hiding this comment.
@elliot-barn could you help review this?
Signed-off-by: Nour999 <[email protected]>
b2db55a to
ca1881f
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit ca1881f. Configure here.
| - py310 | ||
| - py312 | ||
| pre_hooks: | ||
| - ci/raydepsets/pre_hooks/remove-compiled-headers.sh 3.13 |
There was a problem hiding this comment.
CPU depset generates lock files nothing consumes
Low Severity
The ci_docgpu_cpu_depset_${PYTHON_SHORT} depset generates docgpu_cpu_depset_py${PYTHON_VERSION}.lock files, but no Dockerfile or wanda config references them — only the GPU variant is consumed by docgpu.build.Dockerfile and docgpu.build.wanda.yaml. This differs from the ci_ml pattern being followed, where both CPU and GPU lock files are consumed via BUILD_VARIANT. The CPU depset will cost CI time to compile in the follow-up lock generation without being used.
Reviewed by Cursor Bugbot for commit ca1881f. Configure here.
Adds missing CPU-only packages (jax, torchmetrics, torchtext, etils, etc.) to GPU depset via new docgpu_gpu_additions.txt file. This avoids merging full dl-cpu with dl-gpu in one compile, preventing torch-spline-conv conflict while ensuring GPU image has all needed packages. Updates ci_docgpu.depsets.yaml GPU variant to reference both: - python/requirements/ml/py313/dl-gpu-requirements.txt (GPU PyTorch/PyG) - python/requirements/ml/py313/docgpu_gpu_additions.txt (missing CPU-only packages) INCOMPLETE: GPU lock files (docgpu_gpu_depset_py3.10.lock, py3.12.lock) not yet generated. Docker build will fail until locks are committed in follow-up PR. Lock generation blocked by pre-existing etils version conflict (separate issue). Signed-off-by: Adel Nour <[email protected]>
docgpu Split Fix — Work in ProgressProblemPR #62485 merged CPU and GPU requirements into one depset, causing torch-spline-conv conflict (CPU plain version vs GPU cu128 variant cannot resolve in single compiler run). Initial ApproachSplit into two depsets with separate indices. But GPU depset using only Current SolutionAdded Statusjax versions now correct (jax==0.4.28 + jaxlib==0.4.28+cuda backend match). But several unknowns remain:
P.S. Lock file generation is separate issue. Will handle once etils resolved. Depending on my schedule I think I could fix this in 3-7 days. Feedback on this approach is welcome. This is trickier than I initially expected. more information about the full bug in the issue here: #62595 |
Generate four lock files completing the ci_docgpu depset split:
- docgpu_cpu_depset_py3.{10,12}.lock: CPU PyTorch wheels
- docgpu_gpu_depset_py3.{10,12}.lock: GPU cu128 wheels
Remove old undivided docgpu_depset_py3.{10,12}.lock.
Also improve docgpu_gpu_additions.txt: add python_version < '3.13'
guard to torchtext (no cp313 wheel exists for 0.18.0) and clarify
comments.
Validated:
- torch-spline-conv: +pt27cpu in CPU locks, +pt27cu128 in GPU locks
- etils: 1.5.2 on py3.10, 1.14.0 on py3.12
- jaxlib: 0.4.28+cuda12.cudnn89 in both GPU locks
Fixes ray-project#62595
Signed-off-by: Adel Nour <[email protected]>


Summary
Fixes the torch-spline-conv conflict in docgpu depset by splitting CPU and GPU variants into separate depsets with their respective PyTorch wheel indices.
What Changed
ci/raydepsets/configs/ci_docgpu.depsets.yaml: Split singleci_docgpu_depsetinto two:ci_docgpu_cpu_depset_${PYTHON_SHORT}: CPU-only with--index https://download.pytorch.org/whl/cpuci_docgpu_gpu_depset_${PYTHON_SHORT}: GPU-only with--index https://download.pytorch.org/whl/cu128ci/docker/docgpu.build.wanda.yaml(line 5): Updated lock reference to GPU variant (docgpu_gpu_depset_py$PYTHON.lock)ci/docker/docgpu.build.Dockerfile(line 7): Updated lock reference to GPU variant (docgpu_gpu_depset_py$PYTHON.lock)Why
PR #62485 introduced a single depset combining both CPU and GPU requirements, creating an unsolvable torch-spline-conv conflict:
Splitting into separate depsets with explicit indices (following the
ci_mlpattern) resolves this.Note
This PR includes the configuration and Docker file changes. Lock files will
be regenerated and committed in a separate follow-up PR because:
Lock file generation requires running
bazel run //ci/raydepsets:raydepsets -- build,which compiles all dependencies and exposes a pre-existing etils version conflict
(
etils==1.5.2in dl-cpu-requirements.txt vsetils==1.14.0in the constraint file).The architectural fix (config split) is complete and correct regardless of
lock file state. It can merge immediately while the etils conflict is resolved
separately.
This keeps the PR focused: architectural changes now, lock regeneration later
once etils is fixed.
Related
Closes #62595
Related to #62485
Found via: buildkite/precheck/dependencies/build: raydepsets: compile all dependencies [g8_s1] #60522