feat: add opt-in scheduling gates for queue capacity control#5033
feat: add opt-in scheduling gates for queue capacity control#5033devzizu wants to merge 22 commits intovolcano-sh:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @devzizu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where Volcano's scheduler could cause unnecessary cluster autoscaler scale-ups by marking pods as 'Unschedulable' even when they were only waiting for queue capacity. By leveraging Kubernetes scheduling gates, it introduces an opt-in mechanism to delay the 'Unschedulable' condition until a pod is truly blocked by a lack of cluster resources, while also ensuring proper queue capacity accounting for pods that have passed initial admission but are awaiting node allocation. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
1f0980d to
7d15238
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-designed feature to add opt-in scheduling gates for queue capacity control, improving compatibility with cluster autoscalers. While the overall design is comprehensive and includes a valuable E2E test, two significant security-related issues were identified: a data race in the asynchronous gate removal workers that could lead to a scheduler panic (Denial of Service), and a race condition in the JSON Patch logic for removing scheduling gates that could result in the unauthorized removal of security-related gates (Broken Access Control). Both of these critical security issues must be addressed before merging. Additionally, there are minor suggestions for a grammatical correction in the design document and replacing context.TODO() with context.Background() in background operations for improved context management.
4077068 to
6e3165d
Compare
0f96f9a to
75d95dc
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements opt-in scheduling gates for queue capacity control to prevent cluster autoscalers from triggering unnecessary scale-ups when pods are waiting for queue admission rather than cluster capacity. The solution addresses issue #4710 by using Kubernetes scheduling gates to delay the Unschedulable condition until pods pass queue capacity checks.
Changes:
- Added opt-in annotation
scheduling.volcano.sh/queue-allocation-gate: "true"for pods to participate in gate-based queue admission control - Implemented asynchronous gate removal workers in the allocate action to remove gates when pods pass capacity checks
- Enhanced capacity plugin with reserved task tracking to prevent race conditions where ungated-but-unscheduled pods could allow queue over-allocation
- Added comprehensive E2E tests and design documentation
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| staging/src/volcano.sh/apis/pkg/apis/scheduling/v1beta1/labels.go | Added QueueAllocationGateKey constant for the opt-in annotation |
| pkg/scheduler/api/helpers.go | Added helper functions to check for Volcano scheduling gates and annotations |
| pkg/webhooks/admission/pods/mutate/mutate_pod.go | Added webhook logic to inject scheduling gates for opted-in pods |
| pkg/scheduler/actions/allocate/allocate.go | Implemented async worker infrastructure and gate removal logic in allocate action |
| pkg/scheduler/plugins/capacity/capacity.go | Added reserved task cache to account for ungated pods in capacity checks |
| pkg/scheduler/framework/statement.go | Added Name() and Task() methods to operation struct for cleanup callbacks |
| pkg/scheduler/framework/session.go | Added cleanupReservationsFns map to session |
| pkg/scheduler/framework/session_plugins.go | Added AddCleanupReservationsFn and CleanupReservations methods |
| pkg/scheduler/api/types.go | Added CleanupReservationsFn function type |
| pkg/scheduler/api/job_info.go | Modified GetSchGatedPodResources to exclude Volcano-only gated pods |
| pkg/scheduler/cache/util.go | Added RemoveVolcanoSchGate utility function for gate removal |
| pkg/scheduler/cache/cache.go | Added synchronous gate removal before bind as safety guarantee |
| test/e2e/schedulingaction/allocate.go | Added comprehensive E2E test for capacity reservation behavior |
| test/e2e/util/pod.go | Added helper functions for checking gate status in tests |
| test/e2e/util/job.go | Added Annotations field to TaskSpec for test configuration |
| pkg/scheduler/util/test_utils.go | Updated test utilities to handle SubGroupPolicy without NetworkTopology |
| pkg/scheduler/actions/allocate/allocate_test.go | Added unit test for SubGroupPolicy without NetworkTopology |
| docs/design/scheduling-gates-queue-admission.md | Added comprehensive design document explaining the feature |
| docs/user-guide/version-compatibility-archive.md | Added version compatibility archive document |
| README.md | Updated Kubernetes compatibility table |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d0bbf95 to
e50491c
Compare
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| }) | ||
|
|
||
| ginkgo.It("Unschedulable pod with removed gate reserves queue capacity and blocks other pods", func() { |
There was a problem hiding this comment.
This is a really good test! it's so cool that it covers the webhook injection, the gate removal, the capacity reservation and reservation cleanup even in one go.
What we should add maybe as a test case too is that we don't remove other scheduling gates accidentally, like a pod with multiple scheduling gates one is the volcano one with annotation, it shouldn't schedule.
There was a problem hiding this comment.
Added a new test case, can you please see if it fits the purpose? One part I'm not entirely happy with is the check that both gates remain and the pod stays Pending. Right now I'm using gomega.Consistently over 10 seconds (polling every second) to assert that the pod still has both gates and is Pending. It works, but it feels a bit arbitrary (fixed window and interval) and doesn't give a clear, deterministic guarantee that the scheduler has run and decided to skip the task.
hajnalmt
left a comment
There was a problem hiding this comment.
/ok-to-test
This is quite well written. I don't see a reason not to test it. Let's see.
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
…nsistency Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
commit f4db07d Merge: 1000067 01823d1 Author: Volcano Bot <[email protected]> Date: Fri Feb 13 11:26:57 2026 +0800 Merge pull request volcano-sh#5039 from volcano-sh/copilot/update-kubernetes-compatibility-map Reorganize Kubernetes compatibility matrix: show 5 latest versions with newest first commit 01823d1 Author: Jesse Stutler <[email protected]> Date: Fri Feb 13 10:20:07 2026 +0800 Add version compatibility archive and update README with reference link commit 1000067 Merge: 6ba0e29 56c6901 Author: Volcano Bot <[email protected]> Date: Thu Feb 12 16:49:57 2026 +0800 Merge pull request volcano-sh#5038 from JesseStutler/fix_4871 Fixed issue where jobs with subgroups but not hard networkTopology.mode could not be scheduled. commit f22bea4 Author: copilot-swe-agent[bot] <[email protected]> Date: Thu Feb 12 08:42:26 2026 +0000 Reorganize Kubernetes compatibility table: show only 5 latest versions with newest on top and left Co-authored-by: JesseStutler <[email protected]> commit 56c6901 Author: JesseStutler <[email protected]> Date: Thu Feb 12 16:05:35 2026 +0800 Fixed issue where jobs with subgroups but not hard networkTopology.mode could not be scheduled. Signed-off-by: zhengchenyu <[email protected]> Signed-off-by: JesseStutler <[email protected]> commit 9125f3e Author: copilot-swe-agent[bot] <[email protected]> Date: Thu Feb 12 06:15:55 2026 +0000 Add Kubernetes compatibility for Volcano v1.13 and v1.14 Co-authored-by: JesseStutler <[email protected]> commit f4e0f66 Author: copilot-swe-agent[bot] <[email protected]> Date: Thu Feb 12 06:14:39 2026 +0000 Initial plan Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
… values Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
Signed-off-by: devzizu <[email protected]>
b2c7960 to
c23023f
Compare
Signed-off-by: devzizu <[email protected]>
4c3064c to
0e40cee
Compare
Signed-off-by: devzizu <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds opt-in scheduling gates for queue capacity control to prevent cluster autoscalers from triggering unnecessary scale-ups when pods are waiting for queue admission rather than cluster capacity.
When a pod opts in via the
scheduling.volcano.sh/queue-allocation-gate: "true"annotation, Volcano adds a scheduling gate that delays theUnschedulablecondition until the queue has capacity.The design is further described in #4727.
Which issue(s) this PR fixes:
Fixes #4710
Special notes for your reviewer:
Does this PR introduce a user-facing change?