
fix(qwp): prevent JVM crash when closing a QWP sender [DO NOT MERGE] #43

Open
jerrinot wants to merge 2 commits into main from jh_segment_manager_segfault


@jerrinot jerrinot commented Jun 9, 2026

Contributor

Closing a QWP sender while its background segment manager was mid-tick could crash the whole process. The manager's worker thread persists the acknowledged-FSN watermark into a memory-mapped file on each tick; if a sender closed and unmapped that file in the same instant, a stale worker could write to the now-unmapped address and abort the JVM with a SIGSEGV.

The worker now re-checks, under the manager lock, whether the ring is still registered before it touches the watermark or the byte accounting. deregister() flips a lock-guarded `registered` flag, so once close() returns the worker can no longer write through the unmapped watermark. The watermark write and the totalBytes subtraction are both gated on the flag; drainTrimmable() and the segment close/unlink stay unconditional, so a stale snapshot still unlinks fully-acked segments as before. The O(1) flag replaces the previous O(n) scan of the rings list.
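The gating described above can be sketched roughly as follows. This is an illustrative model, not the code from this PR: `Ring`, `Manager`, `tick()`, `writeWatermark()`, and the `mapped` field are hypothetical stand-ins, and the sketch assumes the watermark write itself happens while the manager lock is held. Only `deregister()`, `totalBytes`, and the `registered` flag are names taken from the description.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model only. Ring, Manager, tick(), writeWatermark() and
// `mapped` are hypothetical; they are not the classes touched by this PR.
final class RegisteredFlagSketch {

    /** Stand-in for a ring whose watermark lives in a memory-mapped file. */
    static final class Ring {
        boolean registered = true; // guarded by Manager.lock
        boolean mapped = true;     // models the mapping; false after close()
        long watermark;            // models the persisted acknowledged FSN

        void writeWatermark(long fsn) {
            if (!mapped) {
                // The real failure mode is a SIGSEGV, not a Java exception.
                throw new IllegalStateException("write to unmapped watermark");
            }
            watermark = fsn;
        }
    }

    static final class Manager {
        private final ReentrantLock lock = new ReentrantLock();
        private long totalBytes;

        void register(Ring ring, long bytes) {
            lock.lock();
            try {
                ring.registered = true;
                totalBytes += bytes;
            } finally {
                lock.unlock();
            }
        }

        /** O(1): flips the flag under the lock instead of scanning a list. */
        void deregister(Ring ring) {
            lock.lock();
            try {
                ring.registered = false;
            } finally {
                lock.unlock();
            }
        }

        /** One worker step for a ring taken from a possibly stale snapshot. */
        boolean tick(Ring ring, long ackedFsn, long trimmedBytes) {
            boolean wrote = false;
            lock.lock();
            try {
                // Re-check under the lock: skip the watermark and the byte
                // accounting once the ring has been deregistered.
                if (ring.registered) {
                    ring.writeWatermark(ackedFsn);
                    totalBytes -= trimmedBytes;
                    wrote = true;
                }
            } finally {
                lock.unlock();
            }
            // Segment close/unlink would run here unconditionally.
            return wrote;
        }

        long totalBytes() {
            lock.lock();
            try {
                return totalBytes;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Models a sender close(): deregister first, then unmap. */
    static void close(Manager manager, Ring ring) {
        manager.deregister(ring);
        ring.mapped = false;
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        Ring ring = new Ring();
        manager.register(ring, 100);
        System.out.println(manager.tick(ring, 7, 40)); // prints "true"
        close(manager, ring);
        System.out.println(manager.tick(ring, 9, 10)); // prints "false"
    }
}
```

Because the flag is flipped and checked under the same lock, a worker that acquires the lock after `deregister()` returns sees `registered == false` and never reaches the write, while a worker already inside the critical section finishes before `deregister()` can return and the mapping is released.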

@jerrinot jerrinot added the bug label Jun 9, 2026
@jerrinot jerrinot changed the title from "fix(qwp): prevent JVM crash when closing a QWP sender" to "fix(qwp): prevent JVM crash when closing a QWP sender [DO NOT MERGE]" Jun 9, 2026
Keep the bounded close wait, but only free worker-owned native state after
the segment-manager worker is observed dead.

A timed-out or interrupted join can leave the worker alive inside a service
tick. In that state pathScratch may still be used for spare path creation or
native-path cleanup, so closing it immediately risks a native use-after-free.
Leave workerThread set and pathScratch allocated when the worker is still
alive, allowing a later close() to retry cleanup.
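
A minimal sketch of that close() behaviour, under stated assumptions: the `Manager` class here is a hypothetical model, `pathScratchAllocated` is a boolean standing in for the native `pathScratch` buffer, and using `interrupt()` as the stop signal is an assumption of the sketch rather than something the commit message states. The timeout must be positive, since `Thread.join(0)` waits indefinitely.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative model only; not the SegmentManager from this PR.
final class BoundedCloseSketch {

    static final class Manager implements AutoCloseable {
        private final long joinTimeoutMillis;     // must be > 0
        private Thread workerThread;              // stays set while the worker may be alive
        private boolean pathScratchAllocated = true; // models native memory

        Manager(Runnable workerBody, long joinTimeoutMillis) {
            this.joinTimeoutMillis = joinTimeoutMillis;
            this.workerThread = new Thread(workerBody, "segment-manager-sketch");
            this.workerThread.setDaemon(true);
            this.workerThread.start();
        }

        boolean isPathScratchAllocated() {
            return pathScratchAllocated;
        }

        @Override
        public void close() {
            Thread t = workerThread;
            if (t != null) {
                t.interrupt(); // hypothetical stop signal
                try {
                    t.join(joinTimeoutMillis); // bounded wait
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (t.isAlive()) {
                    // The worker may still be using pathScratch: keep both
                    // fields so that a later close() can retry the cleanup.
                    return;
                }
                workerThread = null;
            }
            pathScratchAllocated = false; // models freeing the native buffer
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        // A worker that ignores interrupts until the latch is released,
        // modelling a service tick that outlives the join timeout.
        Manager manager = new Manager(() -> {
            while (true) {
                try {
                    release.await();
                    return;
                } catch (InterruptedException ignored) {
                    // keep "working"
                }
            }
        }, 100);

        manager.close();
        System.out.println(manager.isPathScratchAllocated()); // prints "true"

        release.countDown();
        // Retry until the worker has been observed dead.
        while (manager.isPathScratchAllocated()) {
            manager.close();
        }
        System.out.println(manager.isPathScratchAllocated()); // prints "false"
    }
}
```

The trade-off in this sketch is that a worker which never exits leaves the buffer allocated, which is a leak but not a use-after-free.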
@mtopolnik

Contributor

[PR Coverage check]

😍 pass : 60 / 63 (95.24%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java | 3 | 4 | 75.00% |
| 🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java | 57 | 59 | 96.61% |

