Elliot barn/build cp314 macos wheel#62622
Elliot barn/build cp314 macos wheel#62622elliot-barn wants to merge 3 commits intoreleases/2.55.0from
Conversation
The 2.55.0 release branch was missing Python 3.14 in the macOS wheel build script. This temp branch builds only the cp314 wheel so it can be copied to the release S3 path. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
TRAVIS_COMMIT will be set to the release SHA (58af3fc) so the wheel has the correct ray.__commit__, but BUILDKITE_COMMIT will be this temp branch's SHA which would fail the validation check. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental centralized capacity queue request router for Ray Serve, featuring a new CapacityQueue actor and a CapacityQueueRouter. It also refactors Serve's autoscaling delay logic to utilize wall-clock timestamps for improved consistency and implements exponential backoff for ACTOR_UNAVAILABLE task errors. Additionally, the Ray version is updated to 2.55.0 across the project. Feedback on the experimental router implementation identifies several critical bugs, including missing methods and incorrect attribute access that would lead to runtime errors, as well as performance concerns regarding actor discovery and queue management.
I am having trouble creating individual review comments. Click here to see my feedback.
python/ray/serve/experimental/capacity_queue_router.py (62)
To perform RunningReplica objects in _choose_replica_for_request, you should maintain a mapping from ReplicaID to RunningReplica in addition to the UID index.
self._uid_to_replica_id: Dict[str, ReplicaID] = {}
self._replica_map: Dict[ReplicaID, RunningReplica] = {}
python/ray/serve/experimental/capacity_queue_router.py (116)
The base class super().update_replicas(replicas) stores a list of RunningReplica objects in self._replicas. RunningReplica does not have a unique_id attribute directly; it is nested under replica_id. Also, you should populate the _replica_map suggested earlier.
self._uid_to_replica_id = {r.replica_id.unique_id: r.replica_id for r in self._replicas}
self._replica_map = {r.replica_id: r for r in self._replicas}
python/ray/serve/experimental/capacity_queue_router.py (169)
The _backoff method is called here and in other places but is not defined in this class or its base classes. This will result in an AttributeError at runtime. You should implement this method, likely using asyncio.sleep and the backoff parameters from self._config.
python/ray/serve/experimental/capacity_queue_router.py (212)
In the base RequestRouter class, self._replicas is a list, which does not have a .get() method. This will raise an AttributeError. Use the _replica_map suggested in update_replicas for
replica = self._replica_map.get(rid) if rid is not None else None
python/ray/serve/experimental/capacity_queue.py (247)
Reconstructing the entire _waiters deque on every timeout or cancellation is an _fulfill_waiters already skips completed futures at the head of the queue, you can rely on it for lazy cleanup. If you need to keep the deque size in check, consider a more efficient periodic cleanup or only filtering when the number of completed futures exceeds a threshold.
python/ray/serve/experimental/capacity_queue_router.py (100)
Using ray.util.list_named_actors(all_namespaces=True) is an expensive operation that should be avoided in the request path. Since Serve deployment actor names are deterministic, you can construct the full name using self._deployment_id and self._capacity_queue_actor_name and use ray.get_actor directly. This would be much more efficient and reliable.
- Export MACOSX_DEPLOYMENT_TARGET=12.0 so the wheel is tagged macosx_12_0_arm64 to match the other release wheels. - Skip test_macos_wheels since the test script expects cp310-cp312 wheels but we only built cp314. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Description
Related issues
Additional information