CBG-5056 close blip connection on panic #8023
Open
+18
−2
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
CBG-5056 close blip connection on panic
If there is a panic sending changes, the couchbase lite pull replications effectively hang, since couchbase lite will wait for all expected revids and stop sending new messages. It is not expected to ever panic, but having an open pull replication seems not ideal. In some places, we already close the connection, but this was not done for all replication go routines:
sync_gateway/db/blip_sync_context.go
Line 485 in 9a22b95
sync_gateway/db/blip_sync_context.go
Line 199 in 9a22b95
recoverin go-blipClosing this connection with
BlipSyncContext.Closewill result in the connection being closed with 1000 status code. In order to close with a specific error code modifying go-blip would be required couchbase/go-blip#78Couchbase Lite has untested code from the perspective of Sync Gateway that handles 4001 specially https://github.com/couchbase/couchbase-lite-core/blob/610fb32033c69da2aa8590374432a7a62de44502/Replicator/Replicator.cc#L519 but I am not sure how it behaves, but it also seems to reconnect on a 1000.
I tested this with and Android client introducing a panic like 1673d81 and the behavior of closing the connection is:
Defensively added recover and close to all replication goroutines I could find.
There isn't test code that exercises this because I could not determine how to instrument a panic into the code. I'm open to suggestions.
Pre-review checklist
fmt.Print,log.Print, ...)base.UD(docID),base.MD(dbName))docs/api