Skip to content

fix(client): suppress onerror when SSE disconnect will be handled by reconnect#1827

Open
felixweinberger wants to merge 5 commits intomainfrom
fweinberger/fix-sse-disconnect-noise
Open

fix(client): suppress onerror when SSE disconnect will be handled by reconnect#1827
felixweinberger wants to merge 5 commits intomainfrom
fweinberger/fix-sse-disconnect-noise

Conversation

@felixweinberger
Copy link
Copy Markdown
Contributor

Moves onerror into the else branch of the reconnect check in _handleSseStream's catch block.

Motivation and Context

Fixes #1211. onerror fired unconditionally on every stream disconnect, before the reconnect decision at streamableHttp.ts:447. On long-lived GET SSE streams this produced "SSE stream disconnected: TypeError: terminated" every few minutes even though the transport recovered transparently via _scheduleReconnection.

Three users confirmed on #1211 (original report on 1.24.3, playwright-mcp user with screenshot, status-ask). #1349 attempted a fix via string-matching 'terminated' / 'body stream', but that's fragile (undici could change messages, and 'body stream' also matches 'body stream already read' which is a different bug). This takes the structural approach instead: only fire onerror when reconnection won't happen.

How Has This Been Tested?

Updated the existing "should reconnect a GET-initiated notification stream that fails" test to assert onerror is NOT called when reconnect fires (previously asserted the buggy behavior). Test fails on main (expected not called, but called 1 times), passes with the fix. All 346 client tests pass.

Breaking Changes

Behavioral change for users who relied on onerror as a connection-activity monitor: transient disconnects that will be handled by reconnection no longer fire onerror. Permanent failures ("Failed to reconnect", "Maximum reconnection attempts exceeded") still fire.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Credit to @hassan123789 for #1349 which identified the issue and the code location.

…reconnect

Previously onerror fired unconditionally on every stream disconnect, before
the reconnect decision. This produced 'SSE stream disconnected: TypeError:
terminated' noise every few minutes on long-lived connections even though
the transport recovered transparently. Moved onerror into the else branch
of the reconnect check so it only fires when the stream will not recover.

Fixes #1211
@felixweinberger felixweinberger requested a review from a team as a code owner March 31, 2026 13:23
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Mar 31, 2026

🦋 Changeset detected

Latest commit: bb9eef8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@modelcontextprotocol/client Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new bot commented Mar 31, 2026

Open in StackBlitz

@modelcontextprotocol/client

npm i https://pkg.pr.new/@modelcontextprotocol/client@1827

@modelcontextprotocol/server

npm i https://pkg.pr.new/@modelcontextprotocol/server@1827

@modelcontextprotocol/express

npm i https://pkg.pr.new/@modelcontextprotocol/express@1827

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/@modelcontextprotocol/fastify@1827

@modelcontextprotocol/hono

npm i https://pkg.pr.new/@modelcontextprotocol/hono@1827

@modelcontextprotocol/node

npm i https://pkg.pr.new/@modelcontextprotocol/node@1827

commit: bb9eef8

…Retries

- Thread lastError through _scheduleReconnection so the original disconnect
  error is surfaced (as message suffix and Error.cause) when maxRetries is
  exceeded. With maxRetries=0 the previous behavior dropped the diagnostic
  entirely.
- Align SSE-disconnected onerror message with adjacent pattern to avoid the
  doubled 'Error:' prefix from string-coercing an Error instance.
- Add test covering the maxRetries=0 + lastError surfacing path.
Comment on lines +338 to +344
const reason = lastError === undefined ? undefined : lastError instanceof Error ? lastError.message : String(lastError);
this.onerror?.(
new Error(
`Maximum reconnection attempts (${maxRetries}) exceeded` + (reason ? `: ${reason}` : '.'),
lastError === undefined ? undefined : { cause: lastError }
)
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 When lastError is an Error with an empty message (e.g. new Error("")), the triggering error is silently omitted from the "Maximum reconnection attempts exceeded" message due to a falsy check on reason. Change reason ? to reason !== undefined ? so that empty-string reasons still produce a colon suffix that signals a cause exists.

Extended reasoning...

What the bug is and how it manifests

In _scheduleReconnection, the PR introduces logic to surface the triggering error in the max-retries message at lines 338-344. When lastError is new Error(""), reason evaluates to "" — an empty string, which is falsy in JavaScript. The ternary (reason ? ': ' + reason : '.') then falls through to the period branch, producing the exact same message as the no-error case: "Maximum reconnection attempts (N) exceeded." The error.cause property is still set correctly, but callers who only inspect error.message (the common pattern) receive no indication that a triggering error was present.

The specific code path that triggers it

The relevant lines are 338-344 in packages/client/src/client/streamableHttp.ts. The reason variable is computed as: lastError === undefined ? undefined : lastError instanceof Error ? lastError.message : String(lastError). For new Error(""), this yields "". The ternary check (reason ? ...) treats this identically to undefined. The catch block in _handleSseStream passes error directly to _scheduleReconnection as the third argument, so any network throw producing an empty-message Error would trigger the silent drop.

Why existing code doesn't prevent it

The truthiness check reason ? is designed to detect "no reason" (i.e., lastError === undefined), but conflates two distinct states: (1) reason === undefined meaning no error was provided, and (2) reason === "" meaning an error was provided but has an empty message. Both are falsy, so both produce the same output. A strict undefined check (reason !== undefined) would correctly distinguish them.

What the impact would be

Users who only read error.message — the common pattern in logging, error display, and monitoring — would see "Maximum reconnection attempts (N) exceeded." for an empty-message error, identical to the case where no error occurred at all. They lose all indication that a root-cause error exists and must inspect error.cause to find it. In practice, real network errors (AbortError, TypeError: terminated, etc.) all have non-empty messages, so the practical impact is near-zero. The cause is preserved for programmatic inspection.

Step-by-step proof

  1. lastError = new Error("") is passed to _scheduleReconnection.
  2. reason = lastError instanceof Error ? lastError.message : String(lastError) evaluates to "".
  3. (reason ? ': ' + reason : '.') — "" is falsy — evaluates to ".".
  4. Final message: "Maximum reconnection attempts (0) exceeded."
  5. Meanwhile, lastError === undefined is false, so { cause: lastError } is passed and error.cause is correctly set.
  6. The message is indistinguishable from the no-error case; cause exists but is invisible to message-only consumers.

How to fix it

Replace the truthiness check with an explicit undefined check: (reason !== undefined ? ': ' + reason : '.'). This produces "Maximum reconnection attempts (0) exceeded: " for empty-message errors — the trailing colon at minimum signals to readers that a cause was present — while still appending "." only when no error was provided at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty-message Errors don't occur on the network paths feeding this, and cause is preserved for programmatic access. The suggested output (exceeded: with trailing colon-space) reads as a formatting bug.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both prior 🔴 issues have been addressed: the triggering error is now threaded through to the max-retries message, and the AbortError guard on the else branch prevents onerror from firing after intentional close(). The remaining open items are minor (a stale JSDoc @param and an empty-string edge case where error.cause is still correctly set). LGTM.

Extended reasoning...

Overview

This PR fixes a long-standing UX problem in the StreamableHTTP client transport: onerror fired unconditionally on every SSE stream disconnect, producing noise every few minutes on long-lived connections even when the transport recovered transparently via reconnection. The fix moves the onerror call into an else branch that only fires when reconnection will not happen.

The diff touches three files: the changeset entry, the transport implementation (streamableHttp.ts), and its test suite. The core change is small and surgical — restructuring the catch block in _handleSseStream and adding a lastError parameter to _scheduleReconnection so the root-cause error can be surfaced when max retries is exceeded.

Prior review feedback — current status

My April 1 reviews raised two bugs (🔴 original error silently dropped at maxRetries=0; 🟡 doubled Error: prefix). Commit 7ce0a21 addressed both: the lastError parameter threads the root-cause through, and the else branch now uses the error instanceof Error ? error.message : String(error) pattern.

My April 2 reviews raised two more (🔴 close() firing onerror after onclose; 🟡 empty-string reason falsy check). Commit bb9eef8 added the !this._abortController?.signal.aborted guard to the else branch, fixing the 🔴. The 🟡 (reason ? vs reason !== undefined ?) remains but is inconsequential in practice — real network errors always have non-empty messages, and error.cause is correctly set regardless.

Security risks

No security-sensitive code is touched. This is entirely error-notification plumbing with no auth, crypto, or permission changes.

Level of scrutiny

This is transport-layer error-handling code that affects developer experience (spurious error log noise) rather than correctness or security. The fix is structurally sound — conditioning onerror on a boolean already computed for the reconnect decision — and is validated by an updated test that asserts the previous buggy behavior no longer occurs. Appropriate for a patch release.

Other factors

All 346 client tests pass. The new test for maxRetries=0 explicitly covers the triggering-error surface path. The stale @param lastEventId JSDoc nit from this review pass is accurate but has zero runtime impact.

Comment on lines 328 to +332
* @param lastEventId The ID of the last received event for resumability
* @param attemptCount Current reconnection attempt count for this specific stream
* @param lastError The error that triggered this reconnection attempt, surfaced if max retries is exceeded
*/
private _scheduleReconnection(options: StartSSEOptions, attemptCount = 0): void {
private _scheduleReconnection(options: StartSSEOptions, attemptCount = 0, lastError?: unknown): void {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The JSDoc for _scheduleReconnection contains a stale @param lastEventId that no longer matches any parameter — the actual first parameter is options: StartSSEOptions. This is a pre-existing inaccuracy, but this PR directly modified the same JSDoc block by adding @param lastError without removing or correcting the stale entry. IDE hover-text and future contributors will see incorrect parameter documentation for this method.

Extended reasoning...

What the bug is and how it manifests

The JSDoc comment block for _scheduleReconnection contains @param lastEventId The ID of the last received event for resumability, but the actual first parameter of the method is options: StartSSEOptions. There is no lastEventId parameter in the current signature. IDEs will surface this stale tag as hover-text whenever a developer inspects a call site, and any generated documentation tooling that processes private methods will also show the incorrect name.

The specific code path that triggers it

The method signature after this PR is:

private _scheduleReconnection(options: StartSSEOptions, attemptCount = 0, lastError?: unknown): void

The JSDoc still documents three params: lastEventId (wrong), attemptCount (correct), and the newly added lastError (correct). The first tag is a fossil from an earlier version of the method when the first argument may have been a lastEventId string.

Why existing code does not prevent it

TypeScript does not enforce that JSDoc @param names match the actual parameter names for private methods (and even for public ones, only some strict-doc linters catch this). There is nothing in the build or CI pipeline that would flag the mismatch.

What the impact would be

Developers reading the code or hovering over a call to _scheduleReconnection in an IDE will see lastEventId listed as the first parameter, which no longer exists. This is misleading and increases cognitive overhead when tracing the reconnection logic — exactly the part of the codebase this PR is modifying. The impact is purely documentation; runtime behavior is unaffected.

How to fix it

Remove the stale @param lastEventId ... line and replace it with the correct tag:

@param options SSE start options (resumption token, callbacks) forwarded to _startOrAuthSse on each attempt

Step-by-step proof

  1. Before this PR, _scheduleReconnection accepted options: StartSSEOptions as its first parameter — the @param lastEventId tag was already wrong.
  2. This PR adds @param lastError to the same block (the diff shows + * @param lastError ...) without touching or removing @param lastEventId.
  3. The resulting JSDoc block now documents three @param tags — lastEventId, attemptCount, lastError — while the actual signature has options, attemptCount, lastError.
  4. Any developer hovering over this._scheduleReconnection({...}, 0, error) in VSCode will see lastEventId as the first parameter description, which is incorrect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SSE stream disconnected: TypeError: terminated

1 participant