fix(stream-management): handle queue desync after page reload#1119
Open
mremond wants to merge 2 commits intoxmppjs:mainfrom
Open
fix(stream-management): handle queue desync after page reload#1119mremond wants to merge 2 commits intoxmppjs:mainfrom
mremond wants to merge 2 commits intoxmppjs:mainfrom
Conversation
When a page/app reloads, the outbound queue (outbound_q) is not persisted but the server maintains its acknowledgment counter. On session resume: 1. Server sends <resumed h="N"/> with the real counter value 2. Client tries to remove N items from an empty queue 3. shift() returns undefined, causing a crash on item.stanza This fix adds a guard that: - Detects when queue is empty but server reports higher h value - Resyncs the outbound counter to match server's value - Breaks out of the loop to prevent the crash Note: In-flight stanzas (sent but unacked at reload time) are still lost since the queue is not persisted. A proper fix would require serializing the queue to storage, which is beyond the scope of this patch.
Add two tests for the queue desync fix: 1. Empty queue with server h > 0 (page reload scenario) - Verifies no crash occurs when queue is empty - Verifies outbound counter resyncs to server's value 2. Partial queue with server h higher than queue length - Verifies existing items are properly acked - Verifies counter resyncs after queue is exhausted Also moves tests to test/ directory to match jest testMatch pattern.
Contributor
|
How are you resuming (to get resumed with any value) if not persisting the session data? |
Contributor
Author
|
Well, you always need to have the id to reattach. The point that in the case of failure that is triggered there, you have already lost the message AND are force to reconnect from scratch. |
Contributor
|
So you're storing the SM id but none of the other SM data? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When a browser page or app reloads, the outbound queue (
outbound_q) is not persisted but the server maintains its acknowledgment counter. On session resume:<resumed h="N"/>with the real counter valueoutbound=0and emptyoutbound_q(fresh instance)ackQueue(N)tries to remove N items from empty queueshift()returnsundefined, causing crash onitem.stanzaSolution
Add a guard in
ackQueue()that:hvalueoutboundcounter to match server's valueTrade-offs
I am aware the proper fix would be to persist the outgoing queue to storage to avoid any potential message loss. However, this minimal fix:
In-flight stanzas (sent but unacked at reload time) would still be lost, but this is an edge case that would require significant additional complexity to handle properly (serializing XML elements to storage, handling duplicates if server did receive them, etc.).