
e2e(SchedulerSharding): add setup and base e2e tests for Volcano Scheduler Sharding #4992

Open

agrawalcodes wants to merge 3 commits into volcano-sh:master from agrawalcodes:e2e-schedulersharding

Conversation


@agrawalcodes commented Jan 25, 2026

Summary

Add E2E test infrastructure and base tests for Volcano Scheduler with sharding support.

Base setup from parent PR: #4990

This is part of the LFX mentorship project; additional test cases covering the allocate, preempt, reclaim, and backfill actions under different sharding modes will be added during the mentorship.

Changes

  • test/e2e/schedulersharding/: Add Volcano Scheduler Sharding E2E test suite (4 test cases)
  • .github/workflows/e2e_scheduler_sharding.yaml: Add CI workflow

Test Cases

  • Volcano NodeShard should exist with nodes assigned
  • Both NodeShards should be accessible
  • Volcano jobs should be scheduled to cluster nodes
  • Shard state should remain consistent during job execution
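
For a sense of shape, here is a minimal Ginkgo sketch of the first case. The e2eutil.ListNodeShards helper and the VolcanoShardName constant come from the review snippets quoted later in this thread; InitTestContext/CleanupTestContext and the Status.Nodes field are assumptions for illustration, not necessarily the PR's API.

// Illustrative sketch only; see hedges in the paragraph above.
package schedulersharding

import (
    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"

    e2eutil "volcano.sh/volcano/test/e2e/util"
)

var _ = Describe("Volcano Scheduler Sharding", func() {
    It("Volcano NodeShard should exist with nodes assigned", func() {
        ctx := e2eutil.InitTestContext(e2eutil.Options{})
        defer e2eutil.CleanupTestContext(ctx)

        shards, err := e2eutil.ListNodeShards(ctx)
        Expect(err).NotTo(HaveOccurred())

        for i := range shards.Items {
            if shards.Items[i].Name == VolcanoShardName {
                // Field name assumed; the real assertion depends on the NodeShard API.
                Expect(shards.Items[i].Status.Nodes).NotTo(BeEmpty())
                return
            }
        }
        Fail("volcano NodeShard not found")
    })
})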

How to Run

make e2e-test-schedulersharding

Test Plan

  • make e2e-test-schedulersharding passes locally
  • CI workflow runs successfully

Related Issues

Ref: Part of #4883

Part of LFX mentorship project: E2E Test Suite for Volcano Agent Scheduling

Signed-off-by: Rajesh Agrawal <rajeshagrawal38000@gmail.com>
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign k82cn for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello @agrawalcodes, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational end-to-end testing framework for the Volcano Scheduler's sharding capabilities. It integrates new test suites into the build system, configures the Volcano Helm chart to enable the sharding controller, and provides initial test cases to validate the core functionality of NodeShard management and basic job scheduling in a sharded cluster. This work is a crucial step in ensuring the reliability and correctness of the Volcano scheduler's distributed scheduling features.

Highlights

  • New E2E Test Suites: Introduced dedicated end-to-end test suites for both the Volcano Scheduler Sharding and the ShardingController, enabling comprehensive testing of sharding functionalities.
  • Build System Integration: Added new make targets (e2e-test-shardingcontroller and e2e-test-schedulersharding) and updated the E2E runner script (hack/run-e2e-kind.sh) to support the execution of these new test types.
  • Volcano Helm Chart Configuration: Enhanced the Volcano Helm chart to allow explicit control over enabled controllers via a new controller_controllers parameter, which is crucial for enabling the sharding controller during E2E tests.
  • Initial Sharding E2E Tests: Implemented base test cases to verify NodeShard creation, accessibility, basic Volcano job scheduling within sharded environments, and the consistency of shard state during job execution.
  • E2E Utility Functions: Added a new utility file (test/e2e/util/shard.go) containing helper functions for creating, managing, and asserting NodeShard resources within E2E tests.
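
As a rough sketch of what one such helper might look like: the ctx.Vcclient.ShardV1alpha1().NodeShards() accessor appears in the review snippets below, while the function signature, the TestContext type, and the API package import path are assumptions.

// Hypothetical shape for a helper in test/e2e/util/shard.go.
import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    shardv1alpha1 "volcano.sh/apis/pkg/apis/shard/v1alpha1" // import path assumed
)

// ListNodeShards lists all NodeShard resources via the Volcano clientset
// exposed on the e2e test context (as in the review snippets below).
func ListNodeShards(ctx *TestContext) (*shardv1alpha1.NodeShardList, error) {
    return ctx.Vcclient.ShardV1alpha1().NodeShards().List(
        context.Background(),
        metav1.ListOptions{},
    )
}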


Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/e2e_scheduler_sharding.yaml
    • .github/workflows/e2e_sharding_controller.yaml

@volcano-sh-bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) on Jan 25, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request does a great job of adding the E2E test infrastructure and initial test cases for Volcano Scheduler Sharding. The addition of helper functions for NodeShard in test/e2e/util/shard.go is a good step towards reusable test code.

My review has identified a few areas for improvement:

  • There is significant code duplication in hack/run-e2e-kind.sh which should be refactored to improve maintainability.
  • The helper function waitForNodeShardsCreated is duplicated across two new test suites and should be moved to a shared utility file.
  • There is extensive use of context.TODO() in the new test files. While this is for tests, it's a good practice to use a more specific context like context.Background() or pass a context down.

Overall, this is a solid contribution that lays a good foundation for more comprehensive sharding tests. Addressing the feedback will enhance the quality and maintainability of the new test code.

Comment on lines +194 to +291
"SHARDINGCONTROLLER"|"SCHEDULERSHARDING")
echo "Install volcano chart with crd version $crd_version and sharding controller enabled"
cat <<EOF | helm install ${CLUSTER_NAME} installer/helm/chart/volcano \
--namespace ${NAMESPACE} \
--kubeconfig ${KUBECONFIG} \
--values - \
--wait
basic:
image_pull_policy: IfNotPresent
image_tag_version: ${TAG}
scheduler_config_file: config/volcano-scheduler-ci.conf
crd_version: ${crd_version}

custom:
scheduler_log_level: 5
controller_log_level: 5
controller_controllers: "*"
admission_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
controller_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
scheduler_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
default_ns:
node-role.kubernetes.io/control-plane: ""
scheduler_feature_gates: ${FEATURE_GATES}
enabled_admissions: "/pods/mutate,/queues/mutate,/podgroups/mutate,/jobs/mutate,/jobs/validate,/jobflows/validate,/pods/validate,/queues/validate,/podgroups/validate,/hypernodes/validate,/cronjobs/validate"
ignored_provisioners: ${IGNORED_PROVISIONERS:-""}
EOF
;;
"AGENTSCHEDULER")
echo "Install volcano chart with crd version $crd_version, sharding controller and agent scheduler enabled"
cat <<EOF | helm install ${CLUSTER_NAME} installer/helm/chart/volcano \
--namespace ${NAMESPACE} \
--kubeconfig ${KUBECONFIG} \
--values - \
--wait
basic:
image_pull_policy: IfNotPresent
image_tag_version: ${TAG}
scheduler_config_file: config/volcano-scheduler-ci.conf
crd_version: ${crd_version}

custom:
scheduler_log_level: 5
controller_log_level: 5
controller_controllers: "*"
agent_scheduler_enable: true
admission_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
controller_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
scheduler_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
agent_scheduler_tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
default_ns:
node-role.kubernetes.io/control-plane: ""
scheduler_feature_gates: ${FEATURE_GATES}
enabled_admissions: "/pods/mutate,/queues/mutate,/podgroups/mutate,/jobs/mutate,/jobs/validate,/jobflows/validate,/pods/validate,/queues/validate,/podgroups/validate,/hypernodes/validate,/cronjobs/validate"
ignored_provisioners: ${IGNORED_PROVISIONERS:-""}
EOF
;;


high

There's a lot of duplicated code for the helm install command across different E2E_TYPE cases. This makes the script difficult to read and maintain. Consider refactoring this by defining a common set of helm values and overriding specific ones within each case, then having a single helm install call at the end. Alternatively, a bash function could be used to encapsulate the helm install logic.
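
A sketch of what that could look like in the script's own bash, with the per-case values passed into one shared function. The function and variable names here are illustrative, not from the PR, and the shared tolerations and remaining common values are elided for brevity.

# Illustrative refactor: one function owns the helm install invocation and
# the common values; each E2E_TYPE case only supplies its extra "custom:"
# entries, pre-indented two spaces to nest correctly in the heredoc YAML.
helm_install_volcano() {
  local extra_custom="$1"
  cat <<EOF | helm install ${CLUSTER_NAME} installer/helm/chart/volcano \
    --namespace ${NAMESPACE} --kubeconfig ${KUBECONFIG} --values - --wait
basic:
  image_pull_policy: IfNotPresent
  image_tag_version: ${TAG}
  scheduler_config_file: config/volcano-scheduler-ci.conf
  crd_version: ${crd_version}

custom:
  scheduler_log_level: 5
  controller_log_level: 5
${extra_custom}
EOF
}

case ${E2E_TYPE} in
  "SHARDINGCONTROLLER"|"SCHEDULERSHARDING")
    helm_install_volcano '  controller_controllers: "*"'
    ;;
  "AGENTSCHEDULER")
    helm_install_volcano '  controller_controllers: "*"
  agent_scheduler_enable: true'
    ;;
esac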

Comment on lines +68 to +97
waitForNodeShardsCreated := func() {
    By("Waiting for ShardingController to create NodeShards")
    err := wait.PollUntilContextTimeout(context.TODO(), pollInterval, shardCreationTimeout, true,
        func(c context.Context) (bool, error) {
            shards, err := e2eutil.ListNodeShards(ctx)
            if err != nil {
                GinkgoWriter.Printf("Error listing NodeShards: %v\n", err)
                return false, nil
            }
            // Check if both expected shards exist
            foundVolcano := false
            foundAgent := false
            for _, shard := range shards.Items {
                if shard.Name == VolcanoShardName {
                    foundVolcano = true
                }
                if shard.Name == AgentSchedulerShardName {
                    foundAgent = true
                }
            }
            if foundVolcano && foundAgent {
                GinkgoWriter.Printf("Found NodeShards: volcano=%v, agent-scheduler=%v\n", foundVolcano, foundAgent)
                return true, nil
            }
            GinkgoWriter.Printf("Waiting for NodeShards... volcano=%v, agent-scheduler=%v, total=%d\n",
                foundVolcano, foundAgent, len(shards.Items))
            return false, nil
        })
    Expect(err).NotTo(HaveOccurred(), "NodeShards should be created by ShardingController")
}


medium

This helper function waitForNodeShardsCreated is also defined in test/e2e/shardingcontroller/sharding_controller_test.go. To avoid code duplication and improve reusability, consider moving this function to a shared utility package, such as the new test/e2e/util/shard.go.
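
One possible shape for the shared helper, with the expected shard names parameterized so both suites can use it. The signature and package placement are suggestions, not code from the PR; the polling structure mirrors the snippet above.

// Suggested shared helper for test/e2e/util/shard.go (illustrative):
// poll until every named NodeShard exists, tolerating transient list errors.
func WaitForNodeShards(ctx *TestContext, interval, timeout time.Duration, names ...string) error {
    return wait.PollUntilContextTimeout(context.Background(), interval, timeout, true,
        func(c context.Context) (bool, error) {
            shards, err := ListNodeShards(ctx)
            if err != nil {
                return false, nil // transient error: keep polling
            }
            found := make(map[string]bool, len(shards.Items))
            for _, s := range shards.Items {
                found[s.Name] = true
            }
            for _, name := range names {
                if !found[name] {
                    return false, nil
                }
            }
            return true, nil
        })
}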

// waitForNodeShardsCreated waits for NodeShards to be created by the ShardingController
waitForNodeShardsCreated := func() {
    By("Waiting for ShardingController to create NodeShards")
    err := wait.PollUntilContextTimeout(context.TODO(), pollInterval, shardCreationTimeout, true,


medium

Using context.TODO() is generally discouraged as it's a placeholder for an unknown or undecided context. It's better to use context.Background() for top-level contexts. This applies to other places in this file where context.TODO() is used (e.g., line 163).

Suggested change
- err := wait.PollUntilContextTimeout(context.TODO(), pollInterval, shardCreationTimeout, true,
+ err := wait.PollUntilContextTimeout(context.Background(), pollInterval, shardCreationTimeout, true,
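
Going one step further than context.Background(), Ginkgo v2 can inject a per-spec context, so the poll is also cancelled if the spec times out. A sketch, with an illustrative It body built from this PR's helper calls:

// Sketch: let Ginkgo v2 supply the context instead of context.TODO().
// SpecContext implements context.Context and is cancelled on spec timeout.
It("creates both NodeShards", func(specCtx SpecContext) {
    err := wait.PollUntilContextTimeout(specCtx, pollInterval, shardCreationTimeout, true,
        func(c context.Context) (bool, error) {
            shards, err := e2eutil.ListNodeShards(ctx) // ctx is the e2e test context, not a context.Context
            if err != nil {
                return false, nil
            }
            return len(shards.Items) > 0, nil
        })
    Expect(err).NotTo(HaveOccurred())
})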

Comment on lines +68 to +98
// waitForNodeShardsCreated waits for NodeShards to be created by the ShardingController
waitForNodeShardsCreated := func() {
    By("Waiting for ShardingController to create NodeShards")
    err := wait.PollUntilContextTimeout(context.TODO(), pollInterval, shardCreationTimeout, true,
        func(c context.Context) (bool, error) {
            shards, err := e2eutil.ListNodeShards(ctx)
            if err != nil {
                GinkgoWriter.Printf("Error listing NodeShards: %v\n", err)
                return false, nil
            }
            // Check if both expected shards exist
            foundVolcano := false
            foundAgent := false
            for _, shard := range shards.Items {
                if shard.Name == VolcanoShardName {
                    foundVolcano = true
                }
                if shard.Name == AgentSchedulerShard {
                    foundAgent = true
                }
            }
            if foundVolcano && foundAgent {
                GinkgoWriter.Printf("Found NodeShards: volcano=%v, agent-scheduler=%v\n", foundVolcano, foundAgent)
                return true, nil
            }
            GinkgoWriter.Printf("Waiting for NodeShards... volcano=%v, agent-scheduler=%v, total=%d\n",
                foundVolcano, foundAgent, len(shards.Items))
            return false, nil
        })
    Expect(err).NotTo(HaveOccurred(), "NodeShards should be created by ShardingController")
}


medium

This helper function waitForNodeShardsCreated is also defined in test/e2e/schedulersharding/volcano_scheduler_sharding_test.go. To avoid code duplication and improve reusability, consider moving this function to a shared utility package, such as the new test/e2e/util/shard.go.

// waitForNodeShardsCreated waits for NodeShards to be created by the ShardingController
waitForNodeShardsCreated := func() {
    By("Waiting for ShardingController to create NodeShards")
    err := wait.PollUntilContextTimeout(context.TODO(), pollInterval, shardCreationTimeout, true,


medium

Using context.TODO() is generally discouraged. Please consider using context.Background() instead for better practice, even in test code. This applies to other places in this file where context.TODO() is used (e.g., line 127).

Suggested change
- err := wait.PollUntilContextTimeout(context.TODO(), pollInterval, shardCreationTimeout, true,
+ err := wait.PollUntilContextTimeout(context.Background(), pollInterval, shardCreationTimeout, true,

Comment on lines +49 to +53
nodeShard, err := ctx.Vcclient.ShardV1alpha1().NodeShards().Create(
    context.TODO(),
    nodeShard,
    metav1.CreateOptions{},
)


medium

This function uses context.TODO(). It's a best practice to avoid context.TODO() in utility functions. Please consider replacing it with context.Background(). Ideally, these utility functions would accept a context.Context as an argument to give callers control over cancellation and deadlines. This comment applies to all other functions in this file that use context.TODO().

Suggested change
- nodeShard, err := ctx.Vcclient.ShardV1alpha1().NodeShards().Create(
-     context.TODO(),
-     nodeShard,
-     metav1.CreateOptions{},
- )
+ nodeShard, err := ctx.Vcclient.ShardV1alpha1().NodeShards().Create(
+     context.Background(),
+     nodeShard,
+     metav1.CreateOptions{},
+ )
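
Taking the reviewer's last point one step further, the helper itself could accept the caller's context directly. A minimal sketch, where the function name, parameter order, and types are assumptions:

// Illustrative: CreateNodeShard takes the caller's context.Context rather
// than creating one internally with context.TODO().
func CreateNodeShard(c context.Context, ctx *TestContext, shard *shardv1alpha1.NodeShard) (*shardv1alpha1.NodeShard, error) {
    return ctx.Vcclient.ShardV1alpha1().NodeShards().Create(c, shard, metav1.CreateOptions{})
}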

@agrawalcodes
Author

agrawalcodes commented Jan 25, 2026

@qi-min @JesseStutler, this is one of the e2e base suites for the Sharding Scheduler; it extends #4990.

@JesseStutler
Member

It's a great contribution, thanks. Will take a look later.

@agrawalcodes
Author

agrawalcodes commented Jan 26, 2026

It's a great contribution, thanks. Will take a look later.

Cool @JesseStutler. I'll adjust the base and approach per your review; this should give us a solid foundation for the Agent Scheduling e2e setup. The Gemini review comments seem fairly generic, so I'll resolve them after your review.

The CI ran successfully in GitHub Actions!
