Cleanup unnecessary error return value #5043
hzxuzhonghu wants to merge 8 commits into volcano-sh:master from
Conversation
Signed-off-by: Zhonghu Xu <[email protected]>
Signed-off-by: Zhonghu Xu <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: (none). The full list of commands accepted by this bot can be found here. Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Summary of Changes (Gemini Code Assist): This pull request refactors error handling within the scheduler's core components, specifically in how tasks and pods are managed within the cache and API. By removing error return values from several key functions, the change aims to simplify the codebase, reduce the need for intricate rollback mechanisms, and streamline the overall control flow. Potential issues that previously resulted in errors are now handled by logging warnings, making the operations more robust and less prone to complex error propagation, which directly addresses the complexity of rolling back statement operations.
Activity
Can review commit by commit.
Code Review
This pull request aims to simplify the codebase by removing unnecessary error return values from various functions, particularly those related to cache operations. While this is a good goal for reducing complexity, the current implementation introduces critical issues in pkg/scheduler/api/node_info.go. Specifically, AddTask and UpdateTask can now fail silently, leading to an inconsistent cache state. For example, a task might be queued for binding even if it couldn't be added to a node's resource accounting, or an update operation could leave a task removed but not re-added. My review includes detailed comments on these critical issues with suggestions on how to address them by ensuring that callers are aware of operation failures.
pkg/scheduler/api/node_info.go
Outdated
This function was changed to no longer return an error, but it has early return statements that can cause silent failures. For example:
- If the task already exists on the node (lines 436-439).
- If the node has insufficient resources for a `Binding` task (lines 457-459).
In these cases, AddTask returns without adding the task, but callers like AddBindTask in pkg/scheduler/cache/cache.go and pkg/agentscheduler/cache/cache.go are unaware of the failure. They proceed as if the task was successfully added, leading to an inconsistent cache state and incorrect scheduling behavior (e.g., a task is sent for binding but not accounted for on the node).
To fix this, AddTask should signal success or failure to its callers, for example by returning a boolean. Callers must then handle the failure case, for instance by rolling back state changes or not proceeding with further actions like binding.
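For illustration, a minimal sketch of that suggestion, reusing only identifiers visible elsewhere in this PR; the boolean signature itself is hypothetical and not what this PR implements:

// Hypothetical: AddTask reports whether the task was actually accounted for.
if added := node.AddTask(task); !added {
	// Callers such as AddBindTask must not queue the task for binding if the
	// node never accounted for it; surface the failure instead.
	return fmt.Errorf("failed to add task <%s/%s> to node <%s>",
		task.Namespace, task.Name, node.Name)
}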
pkg/scheduler/api/node_info.go
Outdated
This simplification of UpdateTask is problematic because AddTask can now fail silently (as noted in another comment). If AddTask fails after RemoveTask has succeeded, the task will be removed but not added back, leaving NodeInfo in an inconsistent state. This can break callers like Evict in pkg/scheduler/cache/cache.go, which rely on UpdateTask to correctly transition a task's state on the node.
UpdateTask should be atomic. If AddTask can fail, UpdateTask needs to handle that failure, perhaps by checking if the addition is possible before removing the old task, or by rolling back the removal if the addition fails.
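A hedged sketch of the atomic-UpdateTask idea; the feasibility check is a hypothetical helper, and a real implementation would reuse the node's existing resource accounting:

// Hypothetical ordering: confirm the re-add can succeed before removing.
func (ni *NodeInfo) UpdateTask(ti *TaskInfo) error {
	if !ni.canAdmit(ti) { // hypothetical feasibility check, not an existing method
		return fmt.Errorf("node <%s> cannot re-admit task <%s/%s>",
			ni.Name, ti.Namespace, ti.Name)
	}
	ni.RemoveTask(ti)
	ni.AddTask(ti)
	return nil
}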
Pull request overview
This pull request removes error return values from several core scheduler operations, including AddTask, RemoveTask, UpdateTask, UpdateTaskStatus, and DeleteTaskInfo. The stated goal is to simplify rollback complexity in statement operations following PR #5010, which fixed preemption eviction rollback issues.
Changes:
- Modified core API methods (NodeInfo, JobInfo) to use warning logs instead of returning errors
- Updated Statement and Session operations to call these methods without error handling
- Removed the validateStatusUpdate function
- Updated cache operations to ignore errors from underlying API calls
- Modified tests to expect success instead of errors in failure scenarios
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/scheduler/api/node_info.go | Changed AddTask, RemoveTask, UpdateTask from error-returning to void functions with warning logs |
| pkg/scheduler/api/job_info.go | Changed UpdateTaskStatus and DeleteTaskInfo to void functions |
| pkg/scheduler/api/types.go | Removed validateStatusUpdate function |
| pkg/scheduler/framework/statement.go | Updated Evict, Pipeline, Allocate to call updated API methods without error handling |
| pkg/scheduler/framework/session.go | Updated Pipeline, Allocate, Evict to call updated API methods without error handling |
| pkg/scheduler/cache/cache.go | Removed error handling and rollback logic from Evict and AddBindTask |
| pkg/scheduler/cache/event_handlers.go | Removed error handling from addTask and deleteTask |
| pkg/scheduler/cache/event_handlers_test.go | Changed test expectations from errors to success |
| pkg/scheduler/cache/cache_test.go | Changed test to expect bind success even with insufficient resources |
| pkg/agentscheduler/cache/cache.go | Removed error handling from AddBindTask |
| pkg/agentscheduler/cache/event_handlers.go | Changed deleteTask and deletePod to void functions |
| pkg/scheduler/actions/reclaim/reclaim.go | Removed error handling from Evict call |
| pkg/scheduler/actions/preempt/preempt.go | Removed error handling from Evict and prepareCandidate |
| pkg/scheduler/plugins/util/k8s/snapshot_test.go | Removed error handling from AddTask in tests |
| pkg/scheduler/api/node_info_test.go | Removed error checking from AddTask test |
Comments suppressed due to low confidence (6)
pkg/scheduler/actions/preempt/preempt.go:490
- Critical issue: When stmt.Pipeline fails (line 481), the code attempts to rollback with UnPipeline (line 484), but it does NOT return an error or discard the statement. According to PR #5010, which this PR is related to, when Pipeline fails after victims have been evicted (via prepareCandidate on line 479), the entire statement should be discarded to prevent victim evictions from being committed.
Currently, the function returns (true, nil) on line 490 regardless of whether Pipeline succeeded or failed. This means:
- Victims are evicted via prepareCandidate (line 479)
- If Pipeline fails, UnPipeline attempts to rollback the preemptor
- But the statement containing the victim evictions is NOT discarded
- When the statement is later committed, victims are evicted even though preemption failed
This breaks the atomicity assumption described in PR #5010 and can cause victim tasks to be evicted without the preemptor being scheduled.
prepareCandidate(bestCandidate, preemptor.Pod, stmt)
if err := stmt.Pipeline(preemptor, bestCandidate.Name(), true); err != nil {
klog.Errorf("Failed to pipeline Task <%s/%s> on Node <%s>",
preemptor.Namespace, preemptor.Name, bestCandidate.Name())
if rollbackErr := stmt.UnPipeline(preemptor); rollbackErr != nil {
klog.Errorf("Failed to unpipeline Task %v on %v in Session %v for %v.",
preemptor.UID, bestCandidate.Name(), ssn.UID, rollbackErr)
}
}
return true, nil
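For comparison, a sketch of how the failure could be propagated so the caller can discard the statement, keeping everything else from the snippet above unchanged; the (bool, error) return shape is taken from the surrounding function as described:

prepareCandidate(bestCandidate, preemptor.Pod, stmt)
if err := stmt.Pipeline(preemptor, bestCandidate.Name(), true); err != nil {
	klog.Errorf("Failed to pipeline Task <%s/%s> on Node <%s>",
		preemptor.Namespace, preemptor.Name, bestCandidate.Name())
	if rollbackErr := stmt.UnPipeline(preemptor); rollbackErr != nil {
		klog.Errorf("Failed to unpipeline Task %v on %v in Session %v for %v.",
			preemptor.UID, bestCandidate.Name(), ssn.UID, rollbackErr)
	}
	// Propagate the failure so the caller can discard the statement, and with
	// it the victim evictions, instead of committing them.
	return false, err
}
return true, nil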
pkg/scheduler/cache/cache_test.go:212
- This test's expectations have been changed but may not accurately reflect the actual system behavior after the refactoring. The test expects:
- Bind to succeed even with oversubscription (line 198)
- Task status to be Binding (line 205-206)
- Node to remain unchanged (line 210-212)
However, when AddTask is called on a node with insufficient resources for a Binding task, it logs a warning and returns early without adding the task (see node_info.go lines 457-459). This creates an inconsistent state where the task status is Binding in the job, but the task is not actually added to the node. This inconsistency is not checked by the test.
The test should verify that either:
a) The bind operation properly detects and handles the inconsistency (e.g., by calling resyncTask), or
b) The bind is rejected entirely to maintain consistency
The current test expectations may mask a real bug where partial state updates can occur.
if err != nil {
t.Errorf("expected bind to succeed even when node is oversubscribed: %v", err)
}
_, taskAfterBind, err := cache.findJobAndTask(task)
if err != nil {
t.Errorf("expected to find task after failed bind")
}
if taskAfterBind.Status != api.Binding {
t.Errorf("expected task status to be Binding after bind attempt, got %v", taskAfterBind.Status)
}
nodeAfterBind := cache.Nodes["n1"].Clone()
if !reflect.DeepEqual(nodeBeforeBind, nodeAfterBind) {
t.Errorf("expected node to remain the same after failed bind")
}
pkg/scheduler/framework/statement.go:164
- When Pipeline is called and the job is found, UpdateTaskStatus is called unconditionally (line 144), but if the node is not found (line 155-164), the task status has already been updated to Pipelined even though the node operations cannot be performed.
Similarly, when the node is found, node.AddTask is called (line 156) which may silently fail and return early without actually adding the task (e.g., if the task already exists on the node). In this case:
- The job thinks the task is Pipelined
- The task.NodeName is set to hostname (line 152)
- But the task may not actually be in the node's task list if AddTask failed silently
This creates an inconsistent state where the session's view doesn't match reality. The error is collected in errInfos and returned, but the partial state changes (job status update) are not reverted.
job, found := s.ssn.Jobs[task.Job]
if found {
job.UpdateTaskStatus(task, api.Pipelined)
} else {
err := fmt.Errorf("Failed to find Job <%s> in Session <%s> index when pipeline.",
task.Job, s.ssn.UID)
klog.Errorf("%v", err)
errInfos = append(errInfos, err)
}
task.NodeName = hostname
task.EvictionOccurred = evictionOccurred
if node, found := s.ssn.Nodes[hostname]; found {
node.AddTask(task)
klog.V(3).Infof("After pipelined Task <%v/%v> to Node <%v>: idle <%v>, used <%v>, releasing <%v>",
task.Namespace, task.Name, node.Name, node.Idle, node.Used, node.Releasing)
} else {
err := fmt.Errorf("Failed to find Node <%s> in Session <%s> index when pipeline.",
hostname, s.ssn.UID)
klog.Errorf("%v", err)
errInfos = append(errInfos, err)
}
pkg/scheduler/framework/session.go:811
- Same issue as in Session.Pipeline: when the job is found, UpdateTaskStatus is called unconditionally (line 794) and both task.Pod.Spec.NodeName (line 789) and task.NodeName (line 801) are set. If the node is not found (lines 803-811), an error is returned AFTER these state changes. If node.AddTask silently fails (line 804), no error is returned.
This creates inconsistent state where the job and task think allocation succeeded, but the node doesn't actually have the task. The problem is compounded by the dispatch logic (lines 822-830) which may be called based on incorrect state assumptions.
job, found := ssn.Jobs[task.Job]
if found {
job.UpdateTaskStatus(task, api.Allocated)
} else {
klog.Errorf("Failed to find Job <%s> in Session <%s> index when binding.",
task.Job, ssn.UID)
return fmt.Errorf("failed to find job %s", task.Job)
}
task.NodeName = hostname
if node, found := ssn.Nodes[hostname]; found {
node.AddTask(task)
klog.V(3).Infof("After allocated Task <%v/%v> to Node <%v>: idle <%v>, used <%v>, releasing <%v>",
task.Namespace, task.Name, node.Name, node.Idle, node.Used, node.Releasing)
} else {
klog.Errorf("Failed to find Node <%s> in Session <%s> index when binding.",
hostname, ssn.UID)
return fmt.Errorf("failed to find node %s", hostname)
}
pkg/scheduler/framework/statement.go:258
- Same issue as in Pipeline: when Allocate is called and the job is found, UpdateTaskStatus is called unconditionally (line 240), but if the node is not found (lines 249-258) or node.AddTask silently fails (line 250), the task status has already been updated to Allocated even though the allocation didn't fully succeed.
Specifically:
- The job thinks the task is Allocated
- task.Pod.Spec.NodeName is set (line 235)
- task.NodeName is set (line 248)
- But if node.AddTask fails silently (e.g., task already exists or insufficient resources for Binding status), the node doesn't actually have the task
This creates an inconsistent state. While errors are collected and returned, the partial state changes are not reverted, violating atomicity.
job, found := s.ssn.Jobs[task.Job]
if found {
job.UpdateTaskStatus(task, api.Allocated)
} else {
err := fmt.Errorf("Failed to find Job <%s> in Session <%s> index when allocating.",
task.Job, s.ssn.UID)
klog.Errorf("%v", err)
errInfos = append(errInfos, err)
}
task.NodeName = hostname
if node, found := s.ssn.Nodes[hostname]; found {
node.AddTask(task)
klog.V(3).Infof("After allocated Task <%v/%v> to Node <%v>: idle <%v>, used <%v>, releasing <%v>",
task.Namespace, task.Name, node.Name, node.Idle, node.Used, node.Releasing)
} else {
err := fmt.Errorf("Failed to find Node <%s> in Session <%s> index when allocating.",
hostname, s.ssn.UID)
klog.Errorf("%v", err)
errInfos = append(errInfos, err)
}
pkg/scheduler/framework/session.go:773
- In Session.Pipeline, when the job is found, UpdateTaskStatus is called unconditionally (line 756) and task.NodeName is set (line 763), but if the node is not found (lines 765-773), an error is returned AFTER these state changes have already been made. Similarly, if node.AddTask silently fails (line 766), no error is returned at all.
This creates inconsistent state:
- If the node is not found: job status is Pipelined, task.NodeName is set, but an error is returned and the node doesn't have the task
- If node.AddTask fails silently: job status is Pipelined, task.NodeName is set, the function returns success (nil), but the node doesn't actually have the task
The inconsistency is more severe here than in Statement.Pipeline because there's no mechanism to rollback these changes when errors occur.
job, found := ssn.Jobs[task.Job]
if found {
job.UpdateTaskStatus(task, api.Pipelined)
} else {
klog.Errorf("Failed to find Job <%s> in Session <%s> index when pipeline.",
task.Job, ssn.UID)
return fmt.Errorf("failed to find job %s when pipeline", task.Job)
}
task.NodeName = hostname
if node, found := ssn.Nodes[hostname]; found {
node.AddTask(task)
klog.V(3).Infof("After pipelined Task <%v/%v> to Node <%v>: idle <%v>, used <%v>, releasing <%v>",
task.Namespace, task.Name, node.Name, node.Idle, node.Used, node.Releasing)
} else {
klog.Errorf("Failed to find Node <%s> in Session <%s> index when pipeline.",
hostname, ssn.UID)
return fmt.Errorf("failed to find node %s", hostname)
}
Hey @hzxuzhonghu, thanks for being open to reverting that! I appreciate you taking another look. I'm seeing similar concerns in a few other places in this PR where removing error returns might cause issues. The main thing I'm worried about is how this affects the atomicity fix in #5010: if these operations can fail without signalling it, the preemption flow loses the ability to react. Would you be open to discussing this here in the PR? Maybe we can find a way to simplify error handling without losing the ability to detect failures and maintain consistency? The specific concern is: when preemption tries to evict victims and then pipeline the preemptor, if pipeline fails we need to know it failed so we can discard those evictions. Without error returns, we have no way to detect the failure and make that decision. Happy to explain more about what I'm seeing if it helps!
Signed-off-by: Zhonghu Xu <[email protected]>
Signed-off-by: Zhonghu Xu <[email protected]>
Signed-off-by: Zhonghu Xu <[email protected]>
if err := ni.AddTask(ti); err != nil {
	// This should never happen if task removal was successful,
	// because only possible error during task addition is when task is still on a node.
	klog.Fatalf("Failed to add Task <%s,%s> to Node <%s> during task update",
I think we need to revisit this place; fatal should not be used here.
We should just return err here instead of klog.Fatalf().
Yeah, but this would add an error return value, which would then require undo/error handling in many places.
As the comment says, this can never happen at all.
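For reference, the alternative being debated here would look roughly like this, a sketch that assumes UpdateTask keeps (or regains) an error return so the failure can be propagated instead of terminating the process:

if err := ni.AddTask(ti); err != nil {
	// Surface the (supposedly impossible) failure to the caller instead of
	// calling klog.Fatalf, at the cost of reintroducing an error return.
	return fmt.Errorf("failed to add Task <%s/%s> back to Node <%s> during task update: %v",
		ti.Namespace, ti.Name, ni.Name, err)
}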
Signed-off-by: Zhonghu Xu <[email protected]>
Please take a closer look; all I have done here is remove redundant errors that can never be returned.
Signed-off-by: Zhonghu Xu <[email protected]>
Removing these error returns breaks what #5010 is fixing. The fix depends on detecting pipeline failures:

err := tempStmt.Pipeline(preemptor, node)
if err != nil {
	tempStmt.Discard() // throw away evictions
} else {
	stmt.Merge(tempStmt) // commit evictions
}

Without the error signal, we can't tell if pipeline succeeded or failed. We'd be guessing whether to commit evictions. When we guess wrong, victims get evicted even though the preemptor never scheduled; that's the bug. I know you're trying to simplify things, but we need that error signal for the transactional pattern to work.
hajnalmt left a comment
I want to add my comment, which I elaborated on here:
I feel that this change is really risky; instead of deleting the error handling, we should move in the direction of returning more errors and handling them properly everywhere.
// UpdateTaskStatus is used to update task's status in a job.
// If error occurs both task and job are guaranteed to be in the original state.
func (ji *JobInfo) UpdateTaskStatus(task *TaskInfo, status TaskStatus) error {
	if err := validateStatusUpdate(task.Status, status); err != nil {
validateStatusUpdate is currently empty, but I think it is needed functionality; we just haven't implemented it yet because nobody was willing to implement the state restrictions for it.
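A purely hypothetical sketch of what a non-empty validateStatusUpdate could look like; the concrete transition rules are undecided, so the single rule below is only an example:

func validateStatusUpdate(oldStatus, newStatus TaskStatus) error {
	// Example rule only: the real state restrictions have not been defined.
	if oldStatus == Binding && newStatus == Pipelined {
		return fmt.Errorf("invalid task status transition %v -> %v", oldStatus, newStatus)
	}
	return nil
}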
Not sure; I always prefer cleaning up unused stuff to keep the code more readable.
if err := ni.AddTask(ti); err != nil {
	// This should never happen if task removal was successful,
	// because only possible error during task addition is when task is still on a node.
	klog.Fatalf("Failed to add Task <%s,%s> to Node <%s> during task update",
We should just return err here instead of klog.Fatalf().
I do not actually change any behavior, and this does not conflict with the direction we should take for the fix. It just removes unused return values and keeps the error handling simpler.
@Aman-Cool Please do not comment without looking; node_info.go and job_info.go are the main changes, which remove errors that are never returned. How could that influence commit or discard?
hzxuzhonghu left a comment
The cleanup is very simple; you only need to take a look at job_info.go and node_info.go.
if !found {
	klog.Warningf("failed to find task <%v/%v> on host <%v>",
		ti.Namespace, ti.Name, ni.Name)
	return nil
I should note for reviewers: this is not really an error, just a warning, e.g. when the task does not belong to this node at all.
// UpdateTaskStatus is used to update task's status in a job.
// If error occurs both task and job are guaranteed to be in the original state.
func (ji *JobInfo) UpdateTaskStatus(task *TaskInfo, status TaskStatus) error {
	if err := validateStatusUpdate(task.Status, status); err != nil {
	}

// DeleteTaskInfo is used to delete a task from a job
func (ji *JobInfo) DeleteTaskInfo(ti *TaskInfo) error {
There is never an error at all.
Signed-off-by: Zhonghu Xu <[email protected]>
@hzxuzhonghu I did look at node_info.go and job_info.go. The concern isn't whether these specific functions return errors today; it's that removing error returns from operations like these takes away the failure signal. The Merge/Discard pattern needs error signals to know whether operations succeeded or failed. Without them, we can't maintain atomicity in the preemption flow.
This can help reduce the complexity of rolling back in statement operations,
triggered by #5010
cc @hajnalmt