Fix race conditions in gateway payload sending causing NotAuthenticated errors#139
Fix race conditions in gateway payload sending causing NotAuthenticated errors#139
Conversation
Co-authored-by: Quahu <24848652+Quahu@users.noreply.github.com>
…nditions Co-authored-by: Quahu <24848652+Quahu@users.noreply.github.com>
… handshake Co-authored-by: Quahu <24848652+Quahu@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This pull request addresses race conditions in Discord gateway payload sending that cause NotAuthenticated errors when authenticated payloads (UpdatePresence, UpdateVoiceState, RequestMembers) are sent during handshake states (Identifying/Resuming). The fix implements state-aware payload validation and a double-check pattern to ensure payloads are only sent when the shard is in the appropriate state.
Changes:
- Added three helper methods to classify payload operations:
RequiresAuthentication(),IsHandshakePayload(), andCanSendPayload() - Implemented double-check pattern in
SendAsync()to prevent authenticated payloads from being sent during handshake states - Modified exception handling to prevent handshake payloads from retrying on recoverable WebSocket errors (which would deadlock)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Payloads that require authentication must wait for the Ready state before sending. | ||
| // This prevents sending them during reconnection or before initial authentication completes. | ||
| if (RequiresAuthentication(payload.Op)) | ||
| { | ||
| await WaitForReadyAsync(cancellationToken).ConfigureAwait(false); | ||
| } |
There was a problem hiding this comment.
The WaitForReadyAsync call here may hang indefinitely when resuming a session because the RESUMED event handler (line 345-351) sets the state to Ready but does not complete the _readyTcs. This means any authenticated payloads sent during a resume will wait forever.
The RESUMED handler should also complete the _readyTcs similar to how the READY handler does at line 330-331.
| { | ||
| // For non-handshake payloads (like UpdatePresence), wait for the gateway to reconnect and become ready again. | ||
| // Handshake payloads (Identify, Resume) should not retry here as they could deadlock waiting for Ready, | ||
| // since they are responsible for making the shard Ready in the first place. |
There was a problem hiding this comment.
The comment states "Handshake payloads (Identify, Resume) should not retry here as they could deadlock waiting for Ready" which is accurate. However, it would be helpful to also mention what happens instead: the exception propagates to the caller (IdentifyAsync or ResumeAsync) where the broader connection management logic handles reconnection.
| // since they are responsible for making the shard Ready in the first place. | |
| // since they are responsible for making the shard Ready in the first place. Instead, the exception is | |
| // allowed to propagate to the caller (e.g. IdentifyAsync or ResumeAsync), where the broader connection | |
| // management logic handles reconnection and initiating a new handshake. |
Discord gateway closes connections with
NotAuthenticatedwhen authenticated payloads (UpdatePresence, UpdateVoiceState, RequestMembers) are sent during Identify/Resume handshake states, causing loss of RESUMED event data.Changes
Added state-aware payload validation
CanSendPayload()method validates payload type against current shard stateImplemented double-check pattern for authenticated payloads
Fixed deadlock potential
WebSocketClosedException(would deadlock waiting for Ready state they're responsible for creating)Race Conditions Addressed
The double-check reduces the race window to nanoseconds (between final state check and network I/O), making it virtually impossible to send authenticated payloads during handshake states.