Skip to content

fix(stream-management): handle queue desync after page reload#1119

Open
mremond wants to merge 2 commits intoxmppjs:mainfrom
mremond:fix/stream-management-queue-desync
Open

fix(stream-management): handle queue desync after page reload#1119
mremond wants to merge 2 commits intoxmppjs:mainfrom
mremond:fix/stream-management-queue-desync

Conversation

@mremond
Copy link
Contributor

@mremond mremond commented Feb 7, 2026

Problem

When a browser page or app reloads, the outbound queue (outbound_q) is not persisted but the server maintains its acknowledgment counter. On session resume:

  1. Server sends <resumed h="N"/> with the real counter value
  2. Client has outbound=0 and empty outbound_q (fresh instance)
  3. ackQueue(N) tries to remove N items from empty queue
  4. shift() returns undefined, causing crash on item.stanza

Solution

Add a guard in ackQueue() that:

  • Detects when queue is empty but server reports higher h value
  • Resyncs the outbound counter to match server's value
  • Breaks out of the loop to prevent the crash

Trade-offs

I am aware the proper fix would be to persist the outgoing queue to storage to avoid any potential message loss. However, this minimal fix:

  • Works without message loss in the vast majority of cases (the window between send and ack is typically milliseconds)
  • Avoids having to micro-manage the session store with queue serialization
  • Keeps the fix simple and non-invasive
  • Properly resyncs counters to prevent future desyncs

In-flight stanzas (sent but unacked at reload time) would still be lost, but this is an edge case that would require significant additional complexity to handle properly (serializing XML elements to storage, handling duplicates if server did receive them, etc.).

When a page/app reloads, the outbound queue (outbound_q) is not persisted
but the server maintains its acknowledgment counter. On session resume:

1. Server sends <resumed h="N"/> with the real counter value
2. Client tries to remove N items from an empty queue
3. shift() returns undefined, causing a crash on item.stanza

This fix adds a guard that:
- Detects when queue is empty but server reports higher h value
- Resyncs the outbound counter to match server's value
- Breaks out of the loop to prevent the crash

Note: In-flight stanzas (sent but unacked at reload time) are still lost
since the queue is not persisted. A proper fix would require serializing
the queue to storage, which is beyond the scope of this patch.
Add two tests for the queue desync fix:

1. Empty queue with server h > 0 (page reload scenario)
   - Verifies no crash occurs when queue is empty
   - Verifies outbound counter resyncs to server's value

2. Partial queue with server h higher than queue length
   - Verifies existing items are properly acked
   - Verifies counter resyncs after queue is exhausted

Also moves tests to test/ directory to match jest testMatch pattern.
@singpolyma
Copy link
Contributor

How are you resuming (to get resumed with any value) if not persisting the session data?

@mremond
Copy link
Contributor Author

mremond commented Mar 12, 2026

Well, you always need to have the id to reattach. The point that in the case of failure that is triggered there, you have already lost the message AND are force to reconnect from scratch.

@singpolyma
Copy link
Contributor

So you're storing the SM id but none of the other SM data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants