Skip to content

Conversation

@arnaudmorin
Copy link

When having a connection error, like a PRECONDITION_FAILED because of missing replica of a stream queue in RabbitMQ, then we can safely cycle through the list of hosts given in the connection string instead of keep retrying on the same one.

When having a connection error, like a PRECONDITION_FAILED because of
missing replica of a stream queue in RabbitMQ, then we can safely cycle
through the list of hosts given in the connection string instead of keep
retrying on the same one.

Signed-off-by: Arnaud Morin <[email protected]>
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, can you add test?

@auvipy auvipy added this to the 5.7.0 milestone Jul 19, 2025
@auvipy auvipy requested a review from Copilot February 12, 2026 09:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Kombu’s connection retry behavior so that when an operation wrapped by Connection.ensure() encounters a recoverable connection error, the connection will rotate to the next configured host (failover URL) instead of repeatedly retrying the same host.

Changes:

  • Switch to the next failover host when ensure() catches a recoverable connection error during the wrapped operation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 562 to 565
except conn_errors as exc:
self.maybe_switch_next() # select next host
if got_connection and not has_modern_errors:
# transport can not distinguish between
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe_switch_next() is called before the early-exit raise paths (got_connection and not has_modern_errors and max_retries exceeded). That means callers can see the Connection mutated to the next host even though the operation ultimately fails and raises, which is a surprising side effect and can make diagnostics inconsistent (errback/logging will refer to the previous exception but self now points at a different host). Consider only switching hosts when you’re actually going to retry (e.g., move the switch after the raise checks and right before _ensure_connection).

Copilot uses AI. Check for mistakes.
self._debug('ensure retry policy error: %r',
exc, exc_info=1)
except conn_errors as exc:
self.maybe_switch_next() # select next host
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change alters retry/failover behavior in ensure() by switching to the next host when fun() raises a recoverable connection error, but there doesn’t appear to be a unit test covering that ensure() actually advances the host list in this scenario (as opposed to only _ensure_connection doing it). Adding a test that sets up a multi-host connection, forces a recoverable_connection_errors exception from fun(), and asserts the connection params switched would help prevent regressions.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants