-
-
Notifications
You must be signed in to change notification settings - Fork 981
Switch to the next host on connection error #2330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When having a connection error, like a PRECONDITION_FAILED because of missing replica of a stream queue in RabbitMQ, then we can safely cycle through the list of hosts given in the connection string instead of keep retrying on the same one. Signed-off-by: Arnaud Morin <[email protected]>
auvipy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, can you add test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR updates Kombu’s connection retry behavior so that when an operation wrapped by Connection.ensure() encounters a recoverable connection error, the connection will rotate to the next configured host (failover URL) instead of repeatedly retrying the same host.
Changes:
- Switch to the next failover host when
ensure()catches a recoverable connection error during the wrapped operation.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except conn_errors as exc: | ||
| self.maybe_switch_next() # select next host | ||
| if got_connection and not has_modern_errors: | ||
| # transport can not distinguish between |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe_switch_next() is called before the early-exit raise paths (got_connection and not has_modern_errors and max_retries exceeded). That means callers can see the Connection mutated to the next host even though the operation ultimately fails and raises, which is a surprising side effect and can make diagnostics inconsistent (errback/logging will refer to the previous exception but self now points at a different host). Consider only switching hosts when you’re actually going to retry (e.g., move the switch after the raise checks and right before _ensure_connection).
| self._debug('ensure retry policy error: %r', | ||
| exc, exc_info=1) | ||
| except conn_errors as exc: | ||
| self.maybe_switch_next() # select next host |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change alters retry/failover behavior in ensure() by switching to the next host when fun() raises a recoverable connection error, but there doesn’t appear to be a unit test covering that ensure() actually advances the host list in this scenario (as opposed to only _ensure_connection doing it). Adding a test that sets up a multi-host connection, forces a recoverable_connection_errors exception from fun(), and asserts the connection params switched would help prevent regressions.
When having a connection error, like a PRECONDITION_FAILED because of missing replica of a stream queue in RabbitMQ, then we can safely cycle through the list of hosts given in the connection string instead of keep retrying on the same one.