Skip to content

Elliot barn/build cp314 macos wheel#62622

Draft
elliot-barn wants to merge 3 commits intoreleases/2.55.0from
elliot-barn/build-cp314-macos-wheel
Draft

Elliot barn/build cp314 macos wheel#62622
elliot-barn wants to merge 3 commits intoreleases/2.55.0from
elliot-barn/build-cp314-macos-wheel

Conversation

@elliot-barn
Copy link
Copy Markdown
Contributor

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

The 2.55.0 release branch was missing Python 3.14 in the macOS wheel
build script. This temp branch builds only the cp314 wheel so it can
be copied to the release S3 path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@elliot-barn elliot-barn changed the base branch from master to releases/2.55.0 April 14, 2026 23:28
TRAVIS_COMMIT will be set to the release SHA (58af3fc) so the wheel
has the correct ray.__commit__, but BUILDKITE_COMMIT will be this
temp branch's SHA which would fail the validation check.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an experimental centralized capacity queue request router for Ray Serve, featuring a new CapacityQueue actor and a CapacityQueueRouter. It also refactors Serve's autoscaling delay logic to utilize wall-clock timestamps for improved consistency and implements exponential backoff for ACTOR_UNAVAILABLE task errors. Additionally, the Ray version is updated to 2.55.0 across the project. Feedback on the experimental router implementation identifies several critical bugs, including missing methods and incorrect attribute access that would lead to runtime errors, as well as performance concerns regarding actor discovery and queue management.

I am having trouble creating individual review comments. Click here to see my feedback.

python/ray/serve/experimental/capacity_queue_router.py (62)

high

To perform $O(1)$ lookups of RunningReplica objects in _choose_replica_for_request, you should maintain a mapping from ReplicaID to RunningReplica in addition to the UID index.

        self._uid_to_replica_id: Dict[str, ReplicaID] = {}
        self._replica_map: Dict[ReplicaID, RunningReplica] = {}

python/ray/serve/experimental/capacity_queue_router.py (116)

high

The base class super().update_replicas(replicas) stores a list of RunningReplica objects in self._replicas. RunningReplica does not have a unique_id attribute directly; it is nested under replica_id. Also, you should populate the _replica_map suggested earlier.

        self._uid_to_replica_id = {r.replica_id.unique_id: r.replica_id for r in self._replicas}
        self._replica_map = {r.replica_id: r for r in self._replicas}

python/ray/serve/experimental/capacity_queue_router.py (169)

high

The _backoff method is called here and in other places but is not defined in this class or its base classes. This will result in an AttributeError at runtime. You should implement this method, likely using asyncio.sleep and the backoff parameters from self._config.

python/ray/serve/experimental/capacity_queue_router.py (212)

high

In the base RequestRouter class, self._replicas is a list, which does not have a .get() method. This will raise an AttributeError. Use the _replica_map suggested in update_replicas for $O(1)$ lookup.

                replica = self._replica_map.get(rid) if rid is not None else None

python/ray/serve/experimental/capacity_queue.py (247)

medium

Reconstructing the entire _waiters deque on every timeout or cancellation is an $O(N)$ operation that can block the actor's event loop under high load. Since _fulfill_waiters already skips completed futures at the head of the queue, you can rely on it for lazy cleanup. If you need to keep the deque size in check, consider a more efficient periodic cleanup or only filtering when the number of completed futures exceeds a threshold.

python/ray/serve/experimental/capacity_queue_router.py (100)

medium

Using ray.util.list_named_actors(all_namespaces=True) is an expensive operation that should be avoided in the request path. Since Serve deployment actor names are deterministic, you can construct the full name using self._deployment_id and self._capacity_queue_actor_name and use ray.get_actor directly. This would be much more efficient and reliable.

- Export MACOSX_DEPLOYMENT_TARGET=12.0 so the wheel is tagged
  macosx_12_0_arm64 to match the other release wheels.
- Skip test_macos_wheels since the test script expects cp310-cp312
  wheels but we only built cp314.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

1 participant