Skip to content

fix(asyncio-cluster): await stale node teardown before replacing cluster nodes#283

Open
mkmkme wants to merge 3 commits intomainfrom
mkmkme/unclosed-clusternode-object
Open

fix(asyncio-cluster): await stale node teardown before replacing cluster nodes#283
mkmkme wants to merge 3 commits intomainfrom
mkmkme/unclosed-clusternode-object

Conversation

@mkmkme
Copy link
Copy Markdown
Collaborator

@mkmkme mkmkme commented Mar 21, 2026

Avoid fire-and-forget disconnects when refreshing the async cluster topology.
This keeps removed ClusterNode instances alive until teardown finishes and fixes the unclosed node warning regression during reinitialization.

Fixes #185

Pull Request check-list

  • Do tests and lints pass with this change?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?

Description of change

@mkmkme mkmkme added the bugfix label Mar 21, 2026
@mkmkme mkmkme marked this pull request as ready for review March 21, 2026 13:39
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 98.94737% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.45%. Comparing base (5c9a506) to head (80e7f03).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
valkey/asyncio/cluster.py 95.45% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #283      +/-   ##
==========================================
+ Coverage   76.41%   76.45%   +0.03%     
==========================================
  Files         129      129              
  Lines       34065    34158      +93     
==========================================
+ Hits        26031    26114      +83     
- Misses       8034     8044      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses an asyncio cluster reinitialization regression by ensuring stale ClusterNode instances are properly disconnected (and awaited) before being dropped/replaced, preventing Unclosed ClusterNode object warnings during topology refresh.

Changes:

  • Remove unnecessary asyncio.create_task(...) wrapping around asyncio.gather(...) calls to avoid fire-and-forget behavior.
  • Track stale nodes slated for teardown in NodesManager and await their disconnects during initialize() and aclose().
  • Add a regression test asserting initialize() waits for removed-node disconnect before completing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
valkey/asyncio/cluster.py Introduces pending-node teardown tracking and awaits disconnects during topology refresh and close; simplifies gather usage.
tests/test_asyncio/test_cluster.py Adds a new test case and helper connection to verify initialization blocks until stale node teardown finishes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Mar 28, 2026

@copilot review

Copy link
Copy Markdown

Copilot AI commented Mar 28, 2026

@copilot review

All three review suggestions have been addressed in commit 2ccc88a:

  1. Retry on disconnect failure: _close_pending_node_disconnects now uses return_exceptions=True, keeps failed nodes in _pending_node_disconnects for retry on the next call or during aclose(), and only removes entries for successful disconnects.

  2. No exception propagation: Disconnect errors emit a RuntimeWarning instead of aborting initialize() or topology refresh. A new test (test_initialize_warns_and_retries_stale_disconnect_errors) covers this path end-to-end, including the retry via aclose().

  3. Deterministic test: The test now awaits disconnect_started_event (set at the start of disconnect()) instead of asyncio.sleep(0), ensuring initialize_task is genuinely blocked on the disconnect await before the assertion runs.

mkmkme added 3 commits March 28, 2026 08:40
…y valkey.asyncio.cluster.ClusterNode.__del__()

Signed-off-by: Mikhail Koviazin <[email protected]>
Signed-off-by: Mikhail Koviazin <[email protected]>
@mkmkme mkmkme force-pushed the mkmkme/unclosed-clusternode-object branch from 2ccc88a to 80e7f03 Compare March 28, 2026 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unclosed ClusterNode object error triggered by valkey.asyncio.cluster.ClusterNode.__del__()

4 participants