fix(api): stop emitting duplicate RUN_ERROR events on streaming failures#2667

Open
pullfrog[bot] wants to merge 1 commit into main from pullfrog/2666-fix-double-run-error

pullfrog bot commented Mar 22, 2026

Summary

Fixes #2666

When a streaming run fails, the client receives two RUN_ERROR SSE events instead of one. This PR eliminates the duplicate.

Root cause

executeRun() in v1.service.ts catches errors during streaming and:

  1. Emits a RUN_ERROR SSE event to the client
  2. Persists error state to the database
  3. Re-throws the error ← the problem

The controller's catch block then catches the re-thrown error and emits a second RUN_ERROR SSE event. This matches the reporter's observation:

data: {"type":"RUN_ERROR","message":"An internal error occurred","code":"INTERNAL_ERROR",...}
data: {"type":"RUN_ERROR","message":"An internal error occurred","code":"INTERNAL_ERROR",...}
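
For illustration, the double emission can be reproduced with a minimal model of the two layers. This is a hypothetical sketch, not the actual Tambo code: SseEvent, executeRunBuggy, handleRunBuggy, and the injected emit/persistError/stream callbacks are stand-ins assumed for the example.

```typescript
// Hypothetical model of the bug in #2666. All names are illustrative
// stand-ins, not the real v1.service.ts or controller API.

type SseEvent = { type: "RUN_ERROR"; message: string; code: string };
type Emit = (event: SseEvent) => void;

// Opaque client-facing error, matching the payload seen by the reporter.
const RUN_ERROR: SseEvent = {
  type: "RUN_ERROR",
  message: "An internal error occurred",
  code: "INTERNAL_ERROR",
};

// Service layer: handles the error, then re-throws it (the bug).
async function executeRunBuggy(
  emit: Emit,
  persistError: (err: unknown) => Promise<void>,
  stream: () => Promise<void>,
): Promise<void> {
  try {
    await stream();
  } catch (err) {
    emit(RUN_ERROR); // 1. first RUN_ERROR sent to the client
    await persistError(err); // 2. error state written to the database
    throw err; // 3. re-throw bubbles up to the controller
  }
}

// Controller layer: its own catch block emits a second RUN_ERROR.
async function handleRunBuggy(
  emit: Emit,
  persistError: (err: unknown) => Promise<void>,
  stream: () => Promise<void>,
): Promise<void> {
  try {
    await executeRunBuggy(emit, persistError, stream);
  } catch {
    emit(RUN_ERROR); // duplicate event
  }
}
```

Driving handleRunBuggy with a stream that throws leaves two identical RUN_ERROR entries in the captured event list, mirroring the two data: lines above.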

Changes

apps/api/src/v1/v1.service.ts

  • Remove the re-throw from executeRun()'s catch block. The service now fully handles errors (emit SSE event + persist to DB) without bubbling to the controller.
  • Wrap DB cleanup in its own try/catch so a database failure during error cleanup can't propagate and trigger yet another error path. If the DB write fails, it's logged and reported to Sentry, but the RUN_ERROR event has already been sent to the client.
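
A minimal sketch of the fixed control flow follows. It is hypothetical: the RunDeps interface, the log and reportToSentry callbacks (standing in for the service's logger and Sentry client), and the controller shape are assumptions made for illustration, not the code in v1.service.ts.

```typescript
// Hypothetical sketch of the fixed flow; names are illustrative only.

type SseEvent = { type: "RUN_ERROR"; message: string; code: string };
type Emit = (event: SseEvent) => void;

const RUN_ERROR: SseEvent = {
  type: "RUN_ERROR",
  message: "An internal error occurred",
  code: "INTERNAL_ERROR",
};

interface RunDeps {
  emit: Emit;
  persistError: (err: unknown) => Promise<void>;
  log: (msg: string, err: unknown) => void; // assumed logger
  reportToSentry: (err: unknown) => void; // assumed Sentry wrapper
}

// Service layer: fully handles the failure and resolves instead of re-throwing.
async function executeRun(deps: RunDeps, stream: () => Promise<void>): Promise<void> {
  try {
    await stream();
  } catch (err) {
    // Send the opaque error to the client exactly once.
    deps.emit(RUN_ERROR);
    // Isolate DB cleanup: a failure here is logged and reported,
    // but never propagates to the caller.
    try {
      await deps.persistError(err);
    } catch (dbErr) {
      deps.log("Failed to persist run error state", dbErr);
      deps.reportToSentry(dbErr);
    }
  }
}

// Controller layer, left as-is in this sketch: its catch block is now only
// reached by errors thrown outside executeRun's own try/catch.
async function handleRun(deps: RunDeps, stream: () => Promise<void>): Promise<void> {
  try {
    await executeRun(deps, stream);
  } catch {
    deps.emit(RUN_ERROR);
  }
}
```

With a stream that throws and a persistError that also throws, the client still sees exactly one RUN_ERROR, the DB failure is reported once, and executeRun resolves, so the controller's catch block never runs for this path. Emitting before persisting also means the client is notified even if the database write is slow or fails.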

packages/backend/src/services/llm/ai-sdk-client.ts

  • Replace the silent OpenAI fallback for providers that have no client factory with explicit errors. bedrock and openrouter are defined in the Provider type but had no factory, so they silently fell back to OpenAI and caused confusing LLM failures. They now throw a clear "not yet supported" error.
  • Add exhaustive switch check so future Provider additions cause a compile-time error if not handled.
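
The exhaustive check relies on TypeScript's never type. The sketch below is hypothetical: only bedrock and openrouter come from this PR; the other Provider members, the ModelClient shape, createClient, and assertNever are assumptions for illustration, not the contents of ai-sdk-client.ts.

```typescript
// Hypothetical sketch; the real Provider union and factories may differ.
// "openai" and "anthropic" are assumed members; "bedrock" and "openrouter"
// are the unsupported providers named in this PR.
type Provider = "openai" | "anthropic" | "bedrock" | "openrouter";

interface ModelClient {
  provider: Provider;
  model: string;
}

// Compile-time guard: only accepts `never`, so any Provider member without
// a case makes the call in the default branch a type error.
function assertNever(value: never): never {
  throw new Error(`Unhandled provider: ${String(value)}`);
}

function createClient(provider: Provider, model: string): ModelClient {
  switch (provider) {
    case "openai":
    case "anthropic":
      // Stand-in for a real SDK client factory.
      return { provider, model };
    case "bedrock":
    case "openrouter":
      // Previously these silently fell back to OpenAI.
      throw new Error(`Provider "${provider}" is not yet supported`);
    default:
      return assertNever(provider);
  }
}
```

If a new member is added to Provider without a matching case, provider no longer narrows to never in the default branch and tsc rejects the assertNever call. The runtime throw inside assertNever only matters if an unchecked value (for example, from untyped configuration) slips past the type system.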

apps/api/src/v1/__tests__/v1.service.test.ts

  • Update two tests that expected executeRun to reject — it now resolves after handling errors internally.

Note on the underlying INTERNAL_ERROR

The duplicate RUN_ERROR is a confirmed code bug fixed in this PR. However, the underlying INTERNAL_ERROR that triggers it is a production environment issue (likely related to provider key configuration, model availability, or external API failures). The generic "An internal error occurred" message is intentionally opaque to avoid leaking internal details to clients — the full error is persisted to the database and Sentry for debugging by the Tambo team.

executeRun() was re-throwing errors after already emitting a RUN_ERROR
SSE event and persisting error state to the database. The controller's
catch block would then emit a second RUN_ERROR event — exactly matching
the duplicate error events reported in #2666.

Changes:
- Remove the re-throw from executeRun's catch block so errors are fully
  handled in the service (emit event + persist to DB) without bubbling
  to the controller
- Wrap the DB error-cleanup in its own try/catch so a database failure
  during cleanup cannot suppress the already-emitted RUN_ERROR event
- Replace silent fallback to OpenAI for unsupported providers (bedrock,
  openrouter) with an explicit error, and add exhaustive switch check

Fixes #2666

vercel bot commented Mar 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project    | Deployment | Actions          | Updated (UTC)
cloud      | Ready      | Preview, Comment | Mar 22, 2026 9:53am

2 skipped deployments:

Project    | Deployment | Actions | Updated (UTC)
showcase   | Skipped    |         | Mar 22, 2026 9:53am
tambo-docs | Skipped    |         | Mar 22, 2026 9:53am

github-actions bot added labels on Mar 22, 2026:
  • area: api (changes to the API, apps/api)
  • area: backend (changes to the backend package, packages/backend)
  • status: triage (needs to be triaged by a maintainer)
  • contributor: bot (created by a bot)
  • change: fix (bug fix)

Development

Successfully merging this pull request may close these issues.

500 INTERNAL_ERROR on all /v1/threads/runs and /threads/{id}/generate-name endpoints