Skip to content

Conversation

@torcolvin
Copy link
Collaborator

CBG-5056 close blip connection on panic

If there is a panic sending changes, the couchbase lite pull replications effectively hang, since couchbase lite will wait for all expected revids and stop sending new messages. It is not expected to ever panic, but having an open pull replication seems not ideal. In some places, we already close the connection, but this was not done for all replication go routines:


implicitly closes the connection since this gets caught by the recover in go-blip

Closing this connection with BlipSyncContext.Close will result in the connection being closed with 1000 status code. In order to close with a specific error code modifying go-blip would be required couchbase/go-blip#78

Couchbase Lite has untested code from the perspective of Sync Gateway that handles 4001 specially https://github.com/couchbase/couchbase-lite-core/blob/610fb32033c69da2aa8590374432a7a62de44502/Replicator/Replicator.cc#L519 but I am not sure how it behaves, but it also seems to reconnect on a 1000.

I tested this with and Android client introducing a panic like 1673d81 and the behavior of closing the connection is:

  • If continuous replication, the connection will reconnect automatically.
  • If a non continuous replication, the connection will not reconnect automatically.

Defensively added recover and close to all replication goroutines I could find.

There isn't test code that exercises this because I could not determine how to instrument a panic into the code. I'm open to suggestions.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Copilot AI review requested due to automatic review settings January 23, 2026 17:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds defensive panic recovery to BLIP replication goroutines to prevent hanging pull replications when unexpected panics occur during changes processing.

Changes:

  • Added panic recovery handlers that close the BLIP connection across multiple replication code paths
  • Enhanced warning messages to indicate that replication is being canceled
  • Ensures consistent handling of panics in both standard and ISGR (Inter-Sync Gateway Replication) paths

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
db/blip_sync_context.go Added connection close on panic in handleChangesResponse and improved warning message in sendRevisionWithProperties
db/blip_handler.go Added panic recovery with connection close in handleSubChanges goroutine
db/active_replicator_push.go Added panic recovery with connection close in ISGR _startSendingChanges goroutine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants