
fix: prevent crash when KnnSearch and Tune run concurrently #1764

Open
Copilot wants to merge 3 commits into main from copilot/fix-crash-knnsearch-tune

Conversation

Contributor

Copilot AI commented Mar 27, 2026

  • Analyze merge conflicts in src/algorithm/hgraph.cpp
  • Resolve conflict 1: case 4 reorder logic — keep our new_use_reorder local variable approach (defers mutation until global lock)
  • Resolve conflict 2: global_mutex_ lock scope — keep our lock, incorporate main's param->use_reorder update inside it
  • Build and verify compilation (algorithm target builds successfully)
  • Commit proper merge commit with two parents


The crash occurred because Tune() modified basic_flatten_codes_,
high_precise_codes_, and use_reorder_ under add_mutex_ while search
methods (KnnSearch, SearchWithRequest) held a shared lock on
global_mutex_. These are different mutexes, so the writes were not
synchronized with the readers, and a search could dereference codes
that Tune() had already replaced.

Changes:
- In Tune(), defer member variable mutations using local variables,
  build new flatten codes first, then atomically swap all shared state
  under exclusive global_mutex_ lock
- In RangeSearch(), add missing shared_lock(global_mutex_) for
  consistency with KnnSearch and SearchWithRequest

Agent-Logs-Url: https://github.com/antgroup/vsag/sessions/ff0a358d-0694-4cb2-a40e-daf1830bd6f4

Co-authored-by: wxyucs <12595343+wxyucs@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix crash when KnnSearch and Tune concurrently" to "fix: prevent crash when KnnSearch and Tune run concurrently" on Mar 27, 2026
Copilot AI requested a review from wxyucs March 27, 2026 07:04
@wxyucs wxyucs marked this pull request as ready for review March 27, 2026 08:16
@wxyucs wxyucs requested a review from LHT129 as a code owner March 27, 2026 08:16
Copilot AI review requested due to automatic review settings March 27, 2026 08:16
Contributor

Copilot AI left a comment


Pull request overview

Fixes a concurrency crash in VSAG’s HGraph index when Tune() runs concurrently with search operations by ensuring shared search state is swapped under the same mutex that protects readers.

Changes:

  • Update HGraph::Tune() to rebuild flatten codes off-lock and then atomically swap basic_flatten_codes_, high_precise_codes_, and use_reorder_ under an exclusive global_mutex_.
  • Add the missing shared_lock(global_mutex_) in HGraph::RangeSearch() to synchronize with Tune().

Comment on lines +620 to +633
// Acquire exclusive global lock to atomically swap flatten codes,
// preventing concurrent searches from accessing partially updated state.
{
    std::scoped_lock<std::shared_mutex> wlock(this->global_mutex_);
    basic_flatten_codes_ = new_basic;
    if (drop_precise_codes) {
        high_precise_codes_.reset();
    } else {
        high_precise_codes_ = new_precise;
    }
    use_reorder_ = new_use_reorder;

    // set status
    if (disable_future_tuning) {
        this->index_feature_list_->SetFeature(IndexFeature::SUPPORT_TUNE, false);
        this->raw_vector_.reset();
        has_raw_vector_ = false;
        create_new_raw_vector_ = false;
        check_and_init_raw_vector(param->raw_vector_param, common_param, false);
        init_resize_bit_and_reorder();

Copilot AI Mar 27, 2026


Tune now swaps basic_flatten_codes_/high_precise_codes_ under global_mutex_, but other public read paths (e.g., CalcDistanceById/CalDistanceById) access these members without taking global_mutex_. That means concurrent Tune() can still race with those APIs and potentially trigger the same use-after-free. Consider taking a std::shared_lock on global_mutex_ in those distance-by-id methods (and any other readers of these pointers), or otherwise ensuring the swapped state is read atomically everywhere.

Comment on lines +616 to +629
auto new_basic = tune_and_rebuild(is_tune_base_code, basic_flatten_codes_, new_basic_code);
auto new_precise =
    tune_and_rebuild(is_tune_precise_code, high_precise_codes_, new_precise_code);

check_and_init_raw_vector(param->raw_vector_param, common_param, false);
init_resize_bit_and_reorder();
// Acquire exclusive global lock to atomically swap flatten codes,
// preventing concurrent searches from accessing partially updated state.
{
    std::scoped_lock<std::shared_mutex> wlock(this->global_mutex_);
    basic_flatten_codes_ = new_basic;
    if (drop_precise_codes) {
        high_precise_codes_.reset();
    } else {
        high_precise_codes_ = new_precise;
    }

Copilot AI Mar 27, 2026


This PR fixes a specific concurrent crash scenario (Tune() vs searches) but doesn't add a regression test that runs Tune() concurrently with KnnSearch()/RangeSearch() to ensure the race stays fixed. There are already concurrent tests for read/write in tests/test_hgraph.cpp, so adding a focused Tune-vs-search stress/regression test would help prevent future reintroductions.

Merged upstream's need_enable_reorder pattern into our concurrent-safe
Tune() implementation. The key resolution keeps our global_mutex_ lock
to atomically swap flatten codes while incorporating main's deferred
use_reorder_ update (param->use_reorder = new_use_reorder) inside
the lock scope.

Co-authored-by: wxyucs <12595343+wxyucs@users.noreply.github.com>

Labels

kind/bug (Something isn't working), size/M, version/1.0
