Skip to content

fix(qwp): fixes a rare store-and-forward sender race that could silently drop a batch#41

Merged
jerrinot merged 3 commits into
mainfrom
jh_send_one_recheck
Jun 11, 2026
Merged

fix(qwp): fixes a rare store-and-forward sender race that could silently drop a batch#41
jerrinot merged 3 commits into
mainfrom
jh_send_one_recheck

Conversation

@jerrinot

@jerrinot jerrinot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Fixes a rare store-and-forward sender race that could silently drop a batch during high-throughput ingestion when a segment rotated while the sender was catching up.

implementation details for reviewers
Avoid skipping a frame published after trySendOne observes the old segment tip but before it observes that the producer rotated to a new active segment. Re-read the current segment's published offset before advancing so the I/O loop sends any newly published tail frame instead of moving to the next segment.

Avoid skipping a frame published after trySendOne observes the old segment
tip but before it observes that the producer rotated to a new active segment.
Re-read the current segment's published offset before advancing so the I/O
loop sends any newly published tail frame instead of moving to the next segment.
@jerrinot jerrinot added the bug Something isn't working label Jun 9, 2026
@mtopolnik

Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 7 / 7 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java 7 7 100.00%

@jerrinot

Copy link
Copy Markdown
Contributor Author

Review: PR #41fix(qwp): fixes a rare store-and-forward sender race that could silently drop a batch

Level 3 (full mission-critical pass). This is a tightly-scoped 7-line concurrency fix in a single hot-path method plus a probabilistic regression test, so I performed the structured agent analysis (correctness, concurrency, resource, test, cross-context, adversarial) inline and verified every finding against source — including an empirical bisect.

What the change does

trySendOne() previously read the segment tip once (pub = publishedOffset()), and if it looked drained and the segment was no longer the active one, it advanced to the next segment. The producer publishes the segment's final frame and then rotates the active in separate, ordered steps, so the consumer could observe the rotation (sendingSegment != activeSegment()) without having observed the final frame — and advance past it, dropping it permanently. The fix re-reads publishedOffset() after confirming the segment is sealed; if the tail frame raced in, it falls through and sends it instead of advancing.

Correctness & concurrency — verified

The fix is correct, and the memory ordering holds under the JMM:

  • Producer (single thread, program order): tryAppend() publishes the tail frame via MmapSegment.publishedCursor (volatile, written lastMmapSegment.java:442), in an earlier appendOrFsn call than the rotation active = spare (volatile, SegmentRing.java:383). Rotation only happens when the old active is full, so the tail publish strictly precedes it.
  • Consumer (I/O thread): the re-read publishedOffset() (CursorWebSocketSendLoop.java:1275) is sequenced after the activeSegment() read (:1270). When that read observes the rotation, the active volatile establishes a synchronizes-with edge, so everything before active = spare — including the tail publish — happens-before the re-read. The re-read therefore must see the tail frame. Sound.

I also confirmed the control-flow rewrite drops no behavior:

  • The new else { return false; } exactly preserves the old "on active, nothing to send → wait" path. The fix correctly re-reads only in the sealed case — on the active segment a missed frame is picked up on the next iteration (the segment isn't going anywhere), so no re-read is needed there. The asymmetry is intentional and right.
  • All paths reaching the frame-decode block (:1288+) carry a pub that is either fresh-from-top (hot path, sendOffset < pub) or the fresh re-read — never a stale value that could drive an out-of-bounds read.
  • No double-send: after sending the raced-in tail, sendOffset advances and the next iteration advances the segment normally.

Trim-safety is unchanged: a segment holding an unsent frame cannot be trimmed (its frames aren't acked under a well-behaved server), so the re-read never touches freed memory.

Empirical verification

I proved the regression test is valid by bisecting:

  • On the fix: 17 runs, 0 failures.
  • On the reverted (buggy) code, properly recompiled: 2 failures in 12 runs (~17%)rotation race reordered/dropped a frame — fsn 37912 on the wire was 37952 (a dropped batch) and a single-frame drop at fsn 42431. Matches the test's "reproduces a fair fraction of runs" claim and the PR's "drop a batch" wording.
  • Regression suite: the full cursor send-loop + ring + segment suite (97 tests) is green with the fix.

(One note: an early "passing" run on reverted code was a false negative because surefire:test doesn't recompile — once recompiled, the test reproduces the bug. The test itself is sound.)

Test review

Strong, faithful regression test. Uses assertMemoryLeak(); drives the real I/O thread against memory-mode segments (identical trySendOne() path to disk mode, so no disk-mode gap); payload-as-fsn makes drops show up as a contiguity gap; resource teardown order is correct (loop.close() joins the I/O thread before engine.close() frees the segments — verified loop.close() also closes the RecordingClient, so no native leak). The 30s drain cap is only hit on a failing run, which is the intended design.

Findings

Critical: none.

Moderate: none.

Minor (non-blocking):

  1. Title moodfix(qwp): fixes a rare … uses third-person "fixes"; Conventional Commits prefers imperative "fix a rare …". Also ~80 chars, over the ~72 guideline. Cosmetic; the squash title is otherwise fine.
  2. RecordingClient member orderingrecorded / sentCount / recordedCount aren't grouped-by-visibility / alphabetized per the QuestDB style. Test-only, trivial.

Downgraded (considered, dismissed)

  • "Re-read could observe a stale tip and read freed bytes after trim" — dismissed: a segment with an unsent frame can't be acked→trimmed under a correct server, and reading the publishedCursor field is heap-safe regardless. The Byzantine-server angle (ack clamped at publishedFsn, not at "sent") is pre-existing defense-in-depth, not introduced here.
  • "advanceSegment rescan is O(sealed²)" — dismissed: pre-existing, unchanged by this PR, and bounded in production by ack-driven trim.

Summary

Approve. Correct, minimally-scoped concurrency fix with a regression test I empirically confirmed fails on the bug and passes on the fix; no behavioral regressions across 97 tests. 0 draft findings dropped as false positives during the in-diff pass; 2 minor nits, both cosmetic. In-diff: 2 minor / out-of-diff: 0 — appropriate here, since the only consumer of the changed private method is the I/O loop, which the new test exercises end-to-end.

@jovfer jovfer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I can't say there are no other similar issues, but this particular change makes perfect sense.

@jerrinot jerrinot merged commit ac1d4ba into main Jun 11, 2026
12 checks passed
@jerrinot jerrinot deleted the jh_send_one_recheck branch June 11, 2026 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants