fix(client): suppress onerror when SSE disconnect will be handled by reconnect#1827
fix(client): suppress onerror when SSE disconnect will be handled by reconnect#1827felixweinberger wants to merge 5 commits intomainfrom
Conversation
…reconnect Previously onerror fired unconditionally on every stream disconnect, before the reconnect decision. This produced 'SSE stream disconnected: TypeError: terminated' noise every few minutes on long-lived connections even though the transport recovered transparently. Moved onerror into the else branch of the reconnect check so it only fires when the stream will not recover. Fixes #1211
🦋 Changeset detectedLatest commit: bb9eef8 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@modelcontextprotocol/client
@modelcontextprotocol/server
@modelcontextprotocol/express
@modelcontextprotocol/fastify
@modelcontextprotocol/hono
@modelcontextprotocol/node
commit: |
…Retries - Thread lastError through _scheduleReconnection so the original disconnect error is surfaced (as message suffix and Error.cause) when maxRetries is exceeded. With maxRetries=0 the previous behavior dropped the diagnostic entirely. - Align SSE-disconnected onerror message with adjacent pattern to avoid the doubled 'Error:' prefix from string-coercing an Error instance. - Add test covering the maxRetries=0 + lastError surfacing path.
| const reason = lastError === undefined ? undefined : lastError instanceof Error ? lastError.message : String(lastError); | ||
| this.onerror?.( | ||
| new Error( | ||
| `Maximum reconnection attempts (${maxRetries}) exceeded` + (reason ? `: ${reason}` : '.'), | ||
| lastError === undefined ? undefined : { cause: lastError } | ||
| ) | ||
| ); |
There was a problem hiding this comment.
🟡 When lastError is an Error with an empty message (e.g. new Error("")), the triggering error is silently omitted from the "Maximum reconnection attempts exceeded" message due to a falsy check on reason. Change reason ? to reason !== undefined ? so that empty-string reasons still produce a colon suffix that signals a cause exists.
Extended reasoning...
What the bug is and how it manifests
In _scheduleReconnection, the PR introduces logic to surface the triggering error in the max-retries message at lines 338-344. When lastError is new Error(""), reason evaluates to "" — an empty string, which is falsy in JavaScript. The ternary (reason ? ': ' + reason : '.') then falls through to the period branch, producing the exact same message as the no-error case: "Maximum reconnection attempts (N) exceeded." The error.cause property is still set correctly, but callers who only inspect error.message (the common pattern) receive no indication that a triggering error was present.
The specific code path that triggers it
The relevant lines are 338-344 in packages/client/src/client/streamableHttp.ts. The reason variable is computed as: lastError === undefined ? undefined : lastError instanceof Error ? lastError.message : String(lastError). For new Error(""), this yields "". The ternary check (reason ? ...) treats this identically to undefined. The catch block in _handleSseStream passes error directly to _scheduleReconnection as the third argument, so any network throw producing an empty-message Error would trigger the silent drop.
Why existing code doesn't prevent it
The truthiness check reason ? is designed to detect "no reason" (i.e., lastError === undefined), but conflates two distinct states: (1) reason === undefined meaning no error was provided, and (2) reason === "" meaning an error was provided but has an empty message. Both are falsy, so both produce the same output. A strict undefined check (reason !== undefined) would correctly distinguish them.
What the impact would be
Users who only read error.message — the common pattern in logging, error display, and monitoring — would see "Maximum reconnection attempts (N) exceeded." for an empty-message error, identical to the case where no error occurred at all. They lose all indication that a root-cause error exists and must inspect error.cause to find it. In practice, real network errors (AbortError, TypeError: terminated, etc.) all have non-empty messages, so the practical impact is near-zero. The cause is preserved for programmatic inspection.
Step-by-step proof
- lastError = new Error("") is passed to _scheduleReconnection.
- reason = lastError instanceof Error ? lastError.message : String(lastError) evaluates to "".
- (reason ? ': ' + reason : '.') — "" is falsy — evaluates to ".".
- Final message: "Maximum reconnection attempts (0) exceeded."
- Meanwhile, lastError === undefined is false, so { cause: lastError } is passed and error.cause is correctly set.
- The message is indistinguishable from the no-error case; cause exists but is invisible to message-only consumers.
How to fix it
Replace the truthiness check with an explicit undefined check: (reason !== undefined ? ': ' + reason : '.'). This produces "Maximum reconnection attempts (0) exceeded: " for empty-message errors — the trailing colon at minimum signals to readers that a cause was present — while still appending "." only when no error was provided at all.
There was a problem hiding this comment.
Empty-message Errors don't occur on the network paths feeding this, and cause is preserved for programmatic access. The suggested output (exceeded: with trailing colon-space) reads as a formatting bug.
There was a problem hiding this comment.
Both prior 🔴 issues have been addressed: the triggering error is now threaded through to the max-retries message, and the AbortError guard on the else branch prevents onerror from firing after intentional close(). The remaining open items are minor (a stale JSDoc @param and an empty-string edge case where error.cause is still correctly set). LGTM.
Extended reasoning...
Overview
This PR fixes a long-standing UX problem in the StreamableHTTP client transport: onerror fired unconditionally on every SSE stream disconnect, producing noise every few minutes on long-lived connections even when the transport recovered transparently via reconnection. The fix moves the onerror call into an else branch that only fires when reconnection will not happen.
The diff touches three files: the changeset entry, the transport implementation (streamableHttp.ts), and its test suite. The core change is small and surgical — restructuring the catch block in _handleSseStream and adding a lastError parameter to _scheduleReconnection so the root-cause error can be surfaced when max retries is exceeded.
Prior review feedback — current status
My April 1 reviews raised two bugs (🔴 original error silently dropped at maxRetries=0; 🟡 doubled Error: prefix). Commit 7ce0a21 addressed both: the lastError parameter threads the root-cause through, and the else branch now uses the error instanceof Error ? error.message : String(error) pattern.
My April 2 reviews raised two more (🔴 close() firing onerror after onclose; 🟡 empty-string reason falsy check). Commit bb9eef8 added the !this._abortController?.signal.aborted guard to the else branch, fixing the 🔴. The 🟡 (reason ? vs reason !== undefined ?) remains but is inconsequential in practice — real network errors always have non-empty messages, and error.cause is correctly set regardless.
Security risks
No security-sensitive code is touched. This is entirely error-notification plumbing with no auth, crypto, or permission changes.
Level of scrutiny
This is transport-layer error-handling code that affects developer experience (spurious error log noise) rather than correctness or security. The fix is structurally sound — conditioning onerror on a boolean already computed for the reconnect decision — and is validated by an updated test that asserts the previous buggy behavior no longer occurs. Appropriate for a patch release.
Other factors
All 346 client tests pass. The new test for maxRetries=0 explicitly covers the triggering-error surface path. The stale @param lastEventId JSDoc nit from this review pass is accurate but has zero runtime impact.
| * @param lastEventId The ID of the last received event for resumability | ||
| * @param attemptCount Current reconnection attempt count for this specific stream | ||
| * @param lastError The error that triggered this reconnection attempt, surfaced if max retries is exceeded | ||
| */ | ||
| private _scheduleReconnection(options: StartSSEOptions, attemptCount = 0): void { | ||
| private _scheduleReconnection(options: StartSSEOptions, attemptCount = 0, lastError?: unknown): void { |
There was a problem hiding this comment.
🟡 The JSDoc for _scheduleReconnection contains a stale @param lastEventId that no longer matches any parameter — the actual first parameter is options: StartSSEOptions. This is a pre-existing inaccuracy, but this PR directly modified the same JSDoc block by adding @param lastError without removing or correcting the stale entry. IDE hover-text and future contributors will see incorrect parameter documentation for this method.
Extended reasoning...
What the bug is and how it manifests
The JSDoc comment block for _scheduleReconnection contains @param lastEventId The ID of the last received event for resumability, but the actual first parameter of the method is options: StartSSEOptions. There is no lastEventId parameter in the current signature. IDEs will surface this stale tag as hover-text whenever a developer inspects a call site, and any generated documentation tooling that processes private methods will also show the incorrect name.
The specific code path that triggers it
The method signature after this PR is:
private _scheduleReconnection(options: StartSSEOptions, attemptCount = 0, lastError?: unknown): voidThe JSDoc still documents three params: lastEventId (wrong), attemptCount (correct), and the newly added lastError (correct). The first tag is a fossil from an earlier version of the method when the first argument may have been a lastEventId string.
Why existing code does not prevent it
TypeScript does not enforce that JSDoc @param names match the actual parameter names for private methods (and even for public ones, only some strict-doc linters catch this). There is nothing in the build or CI pipeline that would flag the mismatch.
What the impact would be
Developers reading the code or hovering over a call to _scheduleReconnection in an IDE will see lastEventId listed as the first parameter, which no longer exists. This is misleading and increases cognitive overhead when tracing the reconnection logic — exactly the part of the codebase this PR is modifying. The impact is purely documentation; runtime behavior is unaffected.
How to fix it
Remove the stale @param lastEventId ... line and replace it with the correct tag:
@param options SSE start options (resumption token, callbacks) forwarded to _startOrAuthSse on each attempt
Step-by-step proof
- Before this PR,
_scheduleReconnectionacceptedoptions: StartSSEOptionsas its first parameter — the@param lastEventIdtag was already wrong. - This PR adds
@param lastErrorto the same block (the diff shows+ * @param lastError ...) without touching or removing@param lastEventId. - The resulting JSDoc block now documents three
@paramtags —lastEventId,attemptCount,lastError— while the actual signature hasoptions,attemptCount,lastError. - Any developer hovering over
this._scheduleReconnection({...}, 0, error)in VSCode will seelastEventIdas the first parameter description, which is incorrect.
Moves
onerrorinto the else branch of the reconnect check in_handleSseStream's catch block.Motivation and Context
Fixes #1211.
onerrorfired unconditionally on every stream disconnect, before the reconnect decision at streamableHttp.ts:447. On long-lived GET SSE streams this produced"SSE stream disconnected: TypeError: terminated"every few minutes even though the transport recovered transparently via_scheduleReconnection.Three users confirmed on #1211 (original report on 1.24.3, playwright-mcp user with screenshot, status-ask). #1349 attempted a fix via string-matching
'terminated'/'body stream', but that's fragile (undici could change messages, and'body stream'also matches'body stream already read'which is a different bug). This takes the structural approach instead: only fireonerrorwhen reconnection won't happen.How Has This Been Tested?
Updated the existing
"should reconnect a GET-initiated notification stream that fails"test to assertonerroris NOT called when reconnect fires (previously asserted the buggy behavior). Test fails on main (expected not called, but called 1 times), passes with the fix. All 346 client tests pass.Breaking Changes
Behavioral change for users who relied on
onerroras a connection-activity monitor: transient disconnects that will be handled by reconnection no longer fireonerror. Permanent failures ("Failed to reconnect","Maximum reconnection attempts exceeded") still fire.Types of changes
Checklist
Additional context
Credit to @hassan123789 for #1349 which identified the issue and the code location.