fix: prevent nil pointer dereference in application failover controller #7169

Closed
goyalpalak18 wants to merge 1 commit into karmada-io:master from goyalpalak18:fix-nil-pointer-tolerationseconds

Conversation

@goyalpalak18
Contributor

Description

I fixed a critical nil pointer dereference panic in the applicationfailover controller and validation webhook.

The TolerationSeconds field is a pointer (*int32), and while it usually has a default, I found that legacy objects or specific upgrade paths can leave it nil. Previously, the code dereferenced this pointer blindly, causing the karmada-controller-manager to crash-loop.


Root Cause

The detectFailure and syncBinding functions assumed TolerationSeconds was always populated. I identified that direct dereferencing (*tolerationSeconds) without a check was the direct cause of the runtime panic.
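The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the controller code: `decisionConditions`, `unsafeToleration`, and `panics` are hypothetical stand-ins that model only a struct with a `*int32` field and a function that dereferences it without a check.

```go
package main

import "fmt"

// decisionConditions is a hypothetical stand-in for the real API type;
// only the pointer field discussed in this PR is modeled.
type decisionConditions struct {
	TolerationSeconds *int32
}

// unsafeToleration dereferences the pointer without checking it,
// which is the pattern the PR description identifies as the root cause.
func unsafeToleration(dc decisionConditions) int32 {
	return *dc.TolerationSeconds
}

// panics reports whether calling f triggers a runtime panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var legacy decisionConditions // TolerationSeconds is left nil
	fmt.Println(panics(func() { unsafeToleration(legacy) })) // prints "true"
}
```

In a long-running controller nothing recovers the panic by default, so the process would exit and restart, which is consistent with the crash-loop reported here.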


Proposed Changes

  1. Defensive Controller Logic:
    I modified rb_application_failover_controller.go and crb_application_failover_controller.go. I added a nil check that falls back to the default 300s if the field is missing, preventing the panic.

  2. Safe Validation:
    I updated pkg/util/validation/validation.go to include a nil-check guard before validating the value.

  3. Tests:
    I added unit tests to rb_application_failover_controller_test.go, crb_application_failover_controller_test.go, and validation_test.go to confirm that nil values are handled gracefully.
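The fallback described in item 1 could look roughly like the sketch below. It is an illustration under assumptions, not the diff from this PR: `tolerationOrDefault` is a hypothetical helper name, and the fallback is passed in as a parameter so the sketch does not presume where the 300-second default lives in the codebase.

```go
package main

import (
	"fmt"
	"time"
)

// tolerationOrDefault (hypothetical helper) converts the optional
// TolerationSeconds value into a duration. When the pointer is nil it
// uses fallbackSeconds instead of dereferencing.
func tolerationOrDefault(seconds *int32, fallbackSeconds int32) time.Duration {
	if seconds == nil {
		return time.Duration(fallbackSeconds) * time.Second
	}
	return time.Duration(*seconds) * time.Second
}

func main() {
	explicit := int32(120)
	fmt.Println(tolerationOrDefault(nil, 300))       // prints "5m0s"
	fmt.Println(tolerationOrDefault(&explicit, 300)) // prints "2m0s"
}
```

Centralizing the check in one helper keeps the two controllers from carrying separate copies of the same guard.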

@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign whitewindmills for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 3, 2026
@gemini-code-assist

Summary of Changes

Hello @goyalpalak18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical nil pointer dereference issue within the application failover controllers and validation webhook. The TolerationSeconds field, being a pointer (*int32), could be nil in certain scenarios (e.g., legacy objects or specific upgrade paths), leading to karmada-controller-manager crash-loops. The changes introduce robust nil checks in the controller logic and validation, defaulting to 300 seconds when TolerationSeconds is absent, thereby significantly improving the stability and resilience of the system by preventing runtime panics.

Highlights

  • Defensive Controller Logic: Implemented defensive nil checks for TolerationSeconds in both ClusterResourceBinding and ResourceBinding application failover controllers. If TolerationSeconds is found to be nil, it now defaults to 300 seconds, preventing nil pointer dereferences.
  • Safe Validation: Enhanced the validation webhook (pkg/util/validation/validation.go) to safely handle nil values for TolerationSeconds by adding a nil-check guard before performing validation checks.
  • Comprehensive Unit Tests: Introduced new unit tests in crb_application_failover_controller_test.go, rb_application_failover_controller_test.go, and validation_test.go to ensure the graceful handling of nil TolerationSeconds across controller logic and validation.
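As a rough illustration of the "Safe Validation" highlight, the sketch below guards the dereference behind a nil check. It makes several assumptions: `validateTolerationSeconds` is a hypothetical function, it returns a plain `error` rather than whatever error type the real `ValidateApplicationFailover` uses, and the non-negative rule is assumed for the example because the summary does not state which checks are applied.

```go
package main

import (
	"errors"
	"fmt"
)

// validateTolerationSeconds (hypothetical) treats a nil pointer as
// "not set" and skips validation; only a present value is checked.
// The non-negative rule is an assumption made for this sketch.
func validateTolerationSeconds(seconds *int32) error {
	if seconds == nil {
		return nil // nothing to validate, and nothing to dereference
	}
	if *seconds < 0 {
		return errors.New("tolerationSeconds must be greater than or equal to 0")
	}
	return nil
}

func main() {
	negative := int32(-1)
	fmt.Println(validateTolerationSeconds(nil))       // prints "<nil>"
	fmt.Println(validateTolerationSeconds(&negative)) // prints the error message
}
```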

Changelog

  • pkg/controllers/applicationfailover/crb_application_failover_controller.go
    • Added a nil check for binding.Spec.Failover.Application.DecisionConditions.TolerationSeconds in the syncBinding function.
    • If TolerationSeconds is nil, it is now explicitly set to a default value of 300 seconds to prevent nil pointer dereference.
  • pkg/controllers/applicationfailover/crb_application_failover_controller_test.go
    • Added a new test case TestCRBApplicationFailoverController_syncBinding_NilTolerationSeconds.
    • This test verifies that syncBinding correctly handles a nil TolerationSeconds without panicking and uses the default value.
  • pkg/controllers/applicationfailover/rb_application_failover_controller.go
    • Implemented a nil check for binding.Spec.Failover.Application.DecisionConditions.TolerationSeconds in the syncBinding function.
    • Assigns a default value of 300 seconds to TolerationSeconds if it is found to be nil.
  • pkg/controllers/applicationfailover/rb_application_failover_controller_test.go
    • Added a new test case TestRBApplicationFailoverController_syncBinding_NilTolerationSeconds.
    • This test confirms syncBinding's robust handling of nil TolerationSeconds with the default value, ensuring no panics.
  • pkg/util/validation/validation.go
    • Modified the ValidateApplicationFailover function to include a nil check for applicationFailoverBehavior.DecisionConditions.TolerationSeconds.
    • The validation logic now only attempts to dereference and validate the value if TolerationSeconds is not nil.
  • pkg/util/validation/validation_test.go
    • Added a new test case to TestValidateApplicationFailover.
    • This test ensures that a nil TolerationSeconds field does not result in a validation error, aligning with the new defensive logic.

Activity

  • The pull request was created by goyalpalak18 to address a critical nil pointer dereference in the application failover controller and validation webhook.
  • The author provided a detailed description of the root cause, explaining that TolerationSeconds being a pointer could be nil in legacy objects or specific upgrade paths, leading to karmada-controller-manager crash-loops.
  • The proposed changes include implementing defensive nil checks in controller logic, enhancing validation, and adding new unit tests to cover these scenarios.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical nil pointer dereference in the applicationfailover controller by adding necessary nil checks for TolerationSeconds. The changes are well-implemented across both ClusterResourceBinding and ResourceBinding controllers, and the validation logic. The inclusion of unit tests to cover the nil case is excellent and ensures the fix is robust. My only suggestion is to define the default toleration value as a constant to improve maintainability and avoid magic numbers.
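The suggestion about avoiding a magic number might be sketched as follows. The names `defaultTolerationSeconds`, `rbTolerationSeconds`, and `crbTolerationSeconds` are hypothetical and chosen only for this example; the value 300 is taken from the PR description.

```go
package main

import "fmt"

// defaultTolerationSeconds (hypothetical name) is the single place where
// the fallback value is defined, so both controllers stay in agreement.
const defaultTolerationSeconds int32 = 300

// rbTolerationSeconds and crbTolerationSeconds stand in for the two call
// sites (ResourceBinding and ClusterResourceBinding controllers); both
// refer to the constant instead of repeating the literal 300.
func rbTolerationSeconds(seconds *int32) int32 {
	if seconds == nil {
		return defaultTolerationSeconds
	}
	return *seconds
}

func crbTolerationSeconds(seconds *int32) int32 {
	if seconds == nil {
		return defaultTolerationSeconds
	}
	return *seconds
}

func main() {
	custom := int32(60)
	fmt.Println(rbTolerationSeconds(nil), crbTolerationSeconds(&custom)) // prints "300 60"
}
```

With a named constant, a later change to the default touches one line rather than every call site.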

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.54%. Comparing base (47c8e1c) to head (0ab4519).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7169      +/-   ##
==========================================
- Coverage   46.56%   46.54%   -0.02%     
==========================================
  Files         700      700              
  Lines       48139    48146       +7     
==========================================
- Hits        22414    22408       -6     
- Misses      24040    24050      +10     
- Partials     1685     1688       +3     
Flag Coverage Δ
unittests 46.54% <100.00%> (-0.02%) ⬇️

@RainbowMango
Member

Hi @goyalpalak18 is this the same issue as #7127?

@goyalpalak18
Contributor Author

Hey @RainbowMango, yes it is.

I opened this one as a simplified replacement for #7127. Based on the feedback there, I dropped the webhook changes and just kept this defensive nil-check for the controller.

I meant to close #7127 earlier—I'll close it now so we can focus on this lighter fix.

@XiShanYongYe-Chang
Member

Hi @goyalpalak18, first of all, thanks for your contribution.

As I commented earlier, I think this protection is a bit overdone, because logically speaking, when this function is called, this value will not be nil.

@XiShanYongYe-Chang
Member

Hi @goyalpalak18 do you have any other comments or do you agree to close this PR?

@goyalpalak18
Contributor Author

No other comments from my side. Let's close it. Thanks!

Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants