performance(detector): reduce lock contention in waitingObkects matching logic by zhzhuang-zju · Pull Request #7175 · karmada-io/karmada

zhzhuang-zju · 2026-02-06T03:12:14Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
waitingObjects is designed to track resource templates that haven't been propagated due to a lack of a matching (Cluster)PropagationPolicy. The current implementation has the following issues:

Coarse-grained Locking: Any read/write operation locks the entire waitingObjects, leading to severe lock contention in large-scale clusters.
Inefficient Matching: Each policy match requires a full traversal of all waiting objects, resulting in $O(N)$ complexity and high CPU usage.

This PR introduces the following optimizations:

Two-level Indexing: Implements a GVK -> Namespace -> Object structure. This significantly reduces invalid traversals during matching.
Fine-grained Locking: Uses separate locks for different GVKs to minimize cross-resource lock contention.
Optimized Lookup: Employs sync.Map to index from GVK to namespacedObjects. Since GVKs are relatively stable, this achieves near lock-free access for GVK lookups.

Which issue(s) this PR fixes:

Part of #7139

Special notes for your reviewer:

Test Environment:
The environment contains approximately 60k-70k workloads distributed across different namespaces, with 7k-8k per single namespace. Workloads are dispatched to member clusters via policies.
In addition, there are thousands of dependent resources such as PVCs and 20k-30k ConfigMaps. These do not have separately bound policies but are dispatched to member clusters along with the workloads through the dependent propagation feature.
Now, restart karmada-controller-manager.

For the analysis of this figure, you can refer to: #7139 (comment)

After optimization:

As seen from the images:

The reconciliation of existing resources was reduced from 11 minutes to 5 minutes;
The duration of a single matching operation dropped from 100ms to 0.9ms.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Enhanced the concurrency and scalability of the `Detector` by refactoring `waitingObjects` to minimize lock contention during policy matching against waiting resources, leading to more responsive processing in high-load scenarios.

Signed-off-by: zhzhuang-zju <m17799853869@163.com>

karmada-bot · 2026-02-06T03:12:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chaunceyjiang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist · 2026-02-06T03:12:31Z

Summary of Changes

Hello @zhzhuang-zju, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a performance optimization within the ResourceDetector component. The primary goal is to reduce lock contention in the logic that matches waitingObjects with propagation policies. By transitioning from a single global lock to a more distributed and granular locking mechanism using sync.Map and a new namespacedObject structure, the changes aim to improve the detector's efficiency and scalability, especially under high load conditions where many objects are awaiting policy propagation.

Highlights

Refactored Waiting Objects Data Structure: The waitingObjects field in ResourceDetector has been changed from a map[keys.ClusterWideKey]struct{} protected by a single sync.RWMutex to a sync.Map.
Introduced Hierarchical Locking: A new namespacedObject struct was introduced, which itself contains a sync.RWMutex and a nested map structure (map[string]map[keys.ClusterWideKey]struct{}). This allows for more granular locking, reducing contention by locking only specific GVKs and namespaces rather than the entire waitingObjects map.
Optimized Matching Logic: The isWaiting, AddWaiting, RemoveWaiting, and GetMatching methods have been updated to utilize the new sync.Map and namespacedObject structure, improving concurrency and performance.
Changed Return Type for GetMatching: The GetMatching function now returns a sets.Set[keys.ClusterWideKey] instead of a []keys.ClusterWideKey, which can offer better performance for set operations and uniqueness guarantees.

Changelog

pkg/detector/detector.go
- Added import for k8s.io/apimachinery/pkg/util/sets.
- Replaced the waitingObjects map and its associated waitingLock with a sync.Map.
- Defined a new namespacedObject struct to manage objects by namespace within each GVK entry in the sync.Map, including its own sync.RWMutex.
- Refactored isWaiting, AddWaiting, RemoveWaiting, and GetMatching functions to use the new sync.Map and namespacedObject for concurrent access.
- Updated GetMatching to return sets.Set[keys.ClusterWideKey] and introduced a helper match function for processing resource selectors.
- Modified HandlePropagationPolicyCreationOrUpdate and HandleClusterPropagationPolicyCreationOrUpdate to adapt to the sets.Set return type from GetMatching.

Activity

The pull request is currently marked as 'Work In Progress' (WIP).
No specific review comments or discussions have been recorded yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request refactors the waitingObjects data structure to use a sync.Map with more granular locking, aiming to reduce lock contention. However, the current implementation introduces a cross-namespace resource interference vulnerability and a Denial of Service risk. Specifically, the GetMatching function incorrectly handles empty namespaces in resource selectors, leading to unintended matching across namespaces for namespaced policies. Additionally, holding a read lock during potentially slow API server lookups can block other controller operations, contributing to the Denial of Service risk.

pkg/detector/detector.go

codecov-commenter · 2026-02-06T03:30:18Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 0% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.05%. Comparing base (4403fa4) to head (22cde24).

Files with missing lines	Patch %	Lines
pkg/detector/detector.go	0.00%	65 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7175      +/-   ##
==========================================
- Coverage   42.08%   42.05%   -0.04%     
==========================================
  Files         871      871              
  Lines       53301    53341      +40     
==========================================
- Hits        22432    22431       -1     
- Misses      29184    29224      +40     
- Partials     1685     1686       +1

Flag	Coverage Δ
unittests	`42.05% <0.00%> (-0.04%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

zhzhuang-zju · 2026-02-09T07:36:09Z

/retitle performance(detector): reduce lock contention in waitingObkects matching logic

performance: reduce lock contention in waitingObkects matching logic

22cde24

Signed-off-by: zhzhuang-zju <m17799853869@163.com>

karmada-bot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Feb 6, 2026

karmada-bot requested a review from Poor12 February 6, 2026 03:12

karmada-bot requested a review from XiShanYongYe-Chang February 6, 2026 03:12

karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 6, 2026

gemini-code-assist bot reviewed Feb 6, 2026

View reviewed changes

pkg/detector/detector.go Show resolved Hide resolved

pkg/detector/detector.go Show resolved Hide resolved

karmada-bot changed the title ~~WIP: performance(detector): reduce lock contention in waitingObkects matching logic~~ performance(detector): reduce lock contention in waitingObkects matching logic Feb 9, 2026

karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

performance(detector): reduce lock contention in waitingObkects matching logic#7175

performance(detector): reduce lock contention in waitingObkects matching logic#7175
zhzhuang-zju wants to merge 1 commit intokarmada-io:masterfrom
zhzhuang-zju:waitinglock-segment

zhzhuang-zju commented Feb 6, 2026 •

edited

Loading

Uh oh!

karmada-bot commented Feb 6, 2026

Uh oh!

gemini-code-assist bot commented Feb 6, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Feb 6, 2026

Uh oh!

zhzhuang-zju commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

zhzhuang-zju commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karmada-bot commented Feb 6, 2026

Uh oh!

gemini-code-assist bot commented Feb 6, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Feb 6, 2026

Codecov Report

Uh oh!

zhzhuang-zju commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zhzhuang-zju commented Feb 6, 2026 •

edited

Loading