
fix(postage): resume live sync where snapshot replay stopped #5517

Merged
gacevicljubisa merged 2 commits into master from fix/snapshot-resume-from-replay-block
Jun 25, 2026

@gacevicljubisa

Member

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Follow-up to #5499. When booting from the postage snapshot, the snapshot replay
stops a few blocks below the snapshot's max block (the listener trims tailSize
and rounds to a batchFactor multiple). But live sync was forced to resume from
the snapshot's nominal tip (maxBlock + 1) via snapshotResumeBlock, skipping
those few blocks. A BatchCreated event in that gap is never processed, so the
batch later surfaces as get batch <id>: storage: not found.

Fix: drop the snapshotResumeBlock override. Start already resumes from
cs.Block + 1, and the replay advances cs.Block as it goes, so live sync now
resumes exactly where the replay stopped and re-fetches the trimmed tail from the
chain. No events skipped, none double-processed.

  • pkg/postage/batchservice: remove snapshotResumeBlock and the Start
    override; remove ResumeBlock from the Snapshot struct.
  • pkg/postage/snapshot: New still parses the snapshot eagerly (corruption
    guard), but no longer carries a resume block.
  • Tests: TestSnapshotRebuild asserts live resumes from the replay's last block;
    TestReplayStopsBelowMaxBlock (real listener + filterer) asserts the replay
    stops below maxBlock; restored TestSnapshotLogFilterer_RealSnapshot (dropped
    by #5482, the revert of #5343), which parses the embedded blob and asserts it is non-empty.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

Related to #5495, follow-up to #5499. Restores a real-archive test removed by the
#5343 revert (#5482).

Screenshots (if appropriate):

AI Disclosure

  • This PR contains code that has been generated by an LLM.
  • I have reviewed the AI generated code thoroughly.
  • I possess the technical expertise to responsibly review the code generated in this PR.

@gacevicljubisa

gacevicljubisa commented Jun 23, 2026

Member Author

Here is the failing test, @acud and @martinconic:
https://github.com/ethersphere/bee/actions/runs/28029337125/job/82965450122?pr=5518#step:8:95

On the current branch, this test is in pkg/postage/batchservice/batchservice_test.go as TestSnapshotHandoffNoGap, where it passes.

@gacevicljubisa gacevicljubisa marked this pull request as ready for review June 23, 2026 13:54

@acud acud left a comment

Contributor

lgtm. nice catch

@gacevicljubisa gacevicljubisa merged commit 025d9d4 into master Jun 25, 2026
19 of 21 checks passed
@gacevicljubisa gacevicljubisa deleted the fix/snapshot-resume-from-replay-block branch June 25, 2026 13:04