fix: [BUG] no partition of relation "v1_payload" found for row #2803 #3328

Open
Akhilesh29 wants to merge 2 commits into hatchet-dev:main from Akhilesh29:main
@Akhilesh29 commented Mar 19, 2026

Description

Fixes #2803

When replaying a task whose inserted_at falls outside all existing v1_payload partition boundaries (typically because the task is older than the retention window), Postgres raises SQLSTATE 23514: no partition of relation "v1_payload" found for row. Previously this error bubbled all the way back to the RabbitMQ consumer, which treated it as a transient failure and retried indefinitely, flooding the logs.

This fix catches the partition error at two points in the replay path and skips gracefully with a warning log instead of returning an error that triggers endless retries.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

What's Changed

  • In pkg/repository/v1/task.go: catch SQLSTATE 23514 in replayTasks() after payloadStore.Store, log a warning, and continue instead of returning the error
  • In internal/services/controllers/v1/task_controller.go: catch SQLSTATE 23514 in handleReplayTasks() after ReplayTasks(), log a warning, and continue instead of returning the error, which previously triggered infinite RabbitMQ retries
  • Add shared isPostgresPartitionError helper in both files to detect the partition constraint violation by SQLSTATE code
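For reviewers, here is a minimal sketch of what a helper like isPostgresPartitionError could look like. It is not the code from this PR. It assumes the driver error exposes an SQLState() method (pgx's *pgconn.PgError does), and fakePgError is a hypothetical stand-in so the example runs without a database. Because 23514 is the generic check_violation code, the sketch also matches on the message text to keep the check narrow.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sqlStateError is the subset of a driver error this sketch relies on.
// Assumption: the real error type exposes SQLState(), as pgx's *pgconn.PgError does.
type sqlStateError interface {
	error
	SQLState() string
}

// fakePgError is a hypothetical stand-in for a driver error, used only for this demo.
type fakePgError struct {
	Code    string
	Message string
}

func (e *fakePgError) Error() string    { return e.Message + " (SQLSTATE " + e.Code + ")" }
func (e *fakePgError) SQLState() string { return e.Code }

// isPostgresPartitionError reports whether err, or any error it wraps, is the
// "no partition of relation ... found for row" failure.
func isPostgresPartitionError(err error) bool {
	var se sqlStateError
	if !errors.As(err, &se) {
		return false
	}
	// 23514 also covers ordinary CHECK constraint violations, so the message
	// is matched as well to avoid swallowing unrelated errors.
	return se.SQLState() == "23514" &&
		strings.Contains(se.Error(), "no partition of relation")
}

func main() {
	partErr := &fakePgError{Code: "23514", Message: `no partition of relation "v1_payload" found for row`}
	wrapped := fmt.Errorf("failed to store payloads: %w", partErr)
	checkErr := &fakePgError{Code: "23514", Message: `new row violates check constraint "some_check"`}

	fmt.Println(isPostgresPartitionError(wrapped))                        // partition error behind a wrapper
	fmt.Println(isPostgresPartitionError(checkErr))                       // same SQLSTATE, different cause
	fmt.Println(isPostgresPartitionError(errors.New("connection reset"))) // unrelated error
}
```

Using errors.As rather than a direct type assertion means the check still works when the error has been wrapped with %w on its way up the call stack.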

Note

Medium Risk
Changes error handling in the task replay path to swallow specific Postgres partition failures; incorrect detection or overly-broad matching could hide real replay errors or skip legitimate replays. Also introduces new imports/helpers that must compile correctly across packages.

Overview
Prevents infinite RabbitMQ retries when replaying tasks that fall outside existing v1_payload partitions by catching Postgres 23514 partition errors and skipping replay with a warning.

This adds isPostgresPartitionError checks in both the task controller (handleReplayTasks) and repository replay flow (after payloadStore.Store) so old tasks that can’t be persisted due to missing partitions don’t bubble up as consumer errors.
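The skip-versus-retry behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the handler from this PR: handleReplay, replayFunc, errNoPartition, and errOther are hypothetical names, and the message consumer is assumed to retry whenever the handler returns a non-nil error.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Hypothetical sentinel errors used only to drive the demo.
var (
	errNoPartition = errors.New("no partition of relation found for row")
	errOther       = errors.New("connection reset")
)

// replayFunc is a hypothetical stand-in for the repository replay call.
type replayFunc func(taskID string) error

// handleReplay replays each task. Errors that isSkippable accepts are logged
// and skipped; any other error is returned so the consumer can retry.
// The skipped IDs are returned so the caller can avoid reporting them as replayed.
func handleReplay(taskIDs []string, replay replayFunc, isSkippable func(error) bool) ([]string, error) {
	var skipped []string
	for _, id := range taskIDs {
		if err := replay(id); err != nil {
			if isSkippable(err) {
				log.Printf("warning: skipping replay of task %s: %v", id, err)
				skipped = append(skipped, id)
				continue
			}
			return skipped, fmt.Errorf("could not replay task %s: %w", id, err)
		}
	}
	return skipped, nil
}

func main() {
	replay := func(id string) error {
		if id == "old-task" {
			return errNoPartition
		}
		return nil
	}
	isSkippable := func(err error) bool { return errors.Is(err, errNoPartition) }

	skipped, err := handleReplay([]string{"old-task", "recent-task"}, replay, isSkippable)
	fmt.Println(skipped, err)
}
```

Returning nil for the skippable case is what stops the retry loop; returning the skipped IDs alongside it lets the caller distinguish "replayed" from "acknowledged but not replayed".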

Written by Cursor Bugbot for commit 773e53c. This will update automatically on new commits.

vercel bot commented Mar 19, 2026

@Akhilesh29 is attempting to deploy a commit to the Hatchet Team on Vercel.

A member of the Team first needs to authorize it.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

continue
}
return nil, fmt.Errorf("failed to store payloads for step id %s: %w", stepId, err)
}
Replay modifies task state before partition error is caught

High Severity

When payloadStore.Store fails with a partition error, the continue skips storing the payload, but r.queries.ReplayTasks on line 2520 has already UPDATEd the v1_task rows in the same transaction (incrementing retry_count, resetting initial_state, etc.). The transaction still commits at line 3414, and the affected tasks remain in the outer replayedTasks list (built at line 3079 before replayTasks is called). This leaves tasks in an inconsistent state — their DB state is modified as "replayed" but no payload exists — and they're signaled to the controller as successfully replayed.
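One possible way to sidestep the inconsistency described above is to separate out tasks that no partition covers before any task state is modified. The sketch below is only an illustration under assumptions: partitionRange, task, covered, and splitReplayable are hypothetical names, partitions are assumed to be half-open time ranges on inserted_at, and nothing here reflects how the repository actually looks up partition bounds.

```go
package main

import (
	"fmt"
	"time"
)

// partitionRange is a hypothetical half-open [From, To) bound of one partition.
type partitionRange struct {
	From, To time.Time
}

// task carries only the fields this sketch needs.
type task struct {
	ID         int64
	InsertedAt time.Time
}

// day is a small helper that builds a UTC midnight timestamp in March 2026.
func day(d int) time.Time {
	return time.Date(2026, time.March, d, 0, 0, 0, 0, time.UTC)
}

// covered reports whether t falls inside any of the given partition ranges.
func covered(t time.Time, ranges []partitionRange) bool {
	for _, r := range ranges {
		if !t.Before(r.From) && t.Before(r.To) {
			return true
		}
	}
	return false
}

// splitReplayable separates tasks that an existing partition can hold from
// those that would fail with "no partition of relation ... found for row".
func splitReplayable(tasks []task, ranges []partitionRange) (replayable, skipped []task) {
	for _, tk := range tasks {
		if covered(tk.InsertedAt, ranges) {
			replayable = append(replayable, tk)
		} else {
			skipped = append(skipped, tk)
		}
	}
	return replayable, skipped
}

func main() {
	ranges := []partitionRange{
		{From: day(18), To: day(19)},
		{From: day(19), To: day(20)},
	}
	tasks := []task{
		{ID: 1, InsertedAt: day(10)},                    // older than every partition
		{ID: 2, InsertedAt: day(19).Add(3 * time.Hour)}, // inside the second partition
	}
	replayable, skipped := splitReplayable(tasks, ranges)
	fmt.Println(len(replayable), len(skipped))
}
```

With a split like this, only the replayable tasks would be updated and reported as replayed, while the skipped ones could be logged without touching their rows.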

Bugbot Autofix determined this is a false positive.

Current replayTasks returns the payload-store error immediately so the enclosing transaction rolls back and no replayed task state is committed without payloads.
