Pq replay unacked by yaauie · Pull Request #18766 · elastic/logstash

yaauie · 2026-02-11T20:14:45Z

Release notes

[rn:skip]

What does this PR do?

Adds Batch#unread, which lets a worker relinquish ownership of a batch so that its events can be picked up by another worker. This is a prerequisite for in-place crashed pipeline recovery (#18534).

The PQ Page has long used a BitSet to track which events on the page have been acked, but for reads it only tracked a "high water mark" (the first unread event), on the assumption that once an event has been read from a page, the only possible outcomes are the event being acknowledged XOR the pipeline crashing.

If we want events to be emitted again without re-opening the queue, we need to port the same BitSet logic to reads, marking events as they are read and unmarking them when they are unread.
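As a rough illustration of the idea, the sketch below tracks read state per event with a java.util.BitSet. The class name ReadTrackingPage and its methods are hypothetical and are not Logstash's actual Page API; the sketch only assumes that events on a page are numbered consecutively from a minimum sequence number.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a PQ page; not Logstash's actual Page class. */
final class ReadTrackingPage {
    private final long minSeqNum;     // sequence number of the first event on the page
    private final int elementCount;   // number of events on the page
    private final BitSet readSeqNums; // bit i set => event (minSeqNum + i) is checked out

    ReadTrackingPage(final long minSeqNum, final int elementCount) {
        this.minSeqNum = minSeqNum;
        this.elementCount = elementCount;
        this.readSeqNums = new BitSet(elementCount);
    }

    /**
     * Marks up to {@code limit} events as read, starting at the first unread event.
     * Returns {firstSeqNum, count}; count is 0 when nothing is left to read.
     */
    long[] read(final int limit) {
        final int first = readSeqNums.nextClearBit(0);
        if (first >= elementCount || limit <= 0) {
            return new long[] {-1L, 0L};
        }
        int end = Math.min(first + limit, elementCount);
        // Stop before the next already-read event so the run stays contiguous.
        final int nextRead = readSeqNums.nextSetBit(first);
        if (nextRead >= 0) {
            end = Math.min(end, nextRead);
        }
        readSeqNums.set(first, end);
        return new long[] {minSeqNum + first, end - first};
    }

    /** Clears the read marks so the events can be handed out again. */
    void unread(final long firstSeqNum, final int eventCount) {
        final int from = (int) (firstSeqNum - minSeqNum);
        readSeqNums.clear(from, from + eventCount);
    }

    /** Number of events not currently marked as read. */
    int unreadCount() {
        return elementCount - readSeqNums.cardinality();
    }
}
```

With a plain high water mark, the events returned by unread could never be handed out again; with per-event bits, the next read simply starts at the lowest clear bit.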

Many parts of the PQ's implementation assume that a batch will always contain a contiguous sequence of events. The ability to un-read events means that two consecutive calls to Page#read are no longer guaranteed to emit a single contiguous sequence (e.g., if an un-read has occurred between calls to Page#read, the second set of events can include events from before the first). We therefore refactor Queue#readPageBatch to ensure that only one call to Page#read is made.
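The contiguity problem can be reproduced with a small model. The sketch below is a hypothetical demo, not Logstash code: positions are offsets into a single page, and the helper names (read, unread, isContiguous) are invented for illustration. It assembles one batch from two reads with an un-read in between, and another from a single read.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical demo; positions are offsets into a single page. */
final class ContiguousReadDemo {
    /** Reads the run of unread positions starting at the first clear bit, up to limit. */
    static List<Integer> read(final BitSet readBits, final int pageSize, final int limit) {
        final List<Integer> out = new ArrayList<>();
        int pos = readBits.nextClearBit(0);
        while (pos < pageSize && out.size() < limit && !readBits.get(pos)) {
            readBits.set(pos);
            out.add(pos);
            pos++;
        }
        return out;
    }

    /** Clears the read marks for count positions starting at first. */
    static void unread(final BitSet readBits, final int first, final int count) {
        readBits.clear(first, first + count);
    }

    /** True when each element is exactly one greater than its predecessor. */
    static boolean isContiguous(final List<Integer> positions) {
        for (int i = 1; i < positions.size(); i++) {
            if (positions.get(i) != positions.get(i - 1) + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final BitSet readBits = new BitSet();
        final List<Integer> workerA = read(readBits, 10, 3); // another worker's batch

        // Batch assembled from two reads, with an un-read in between.
        final List<Integer> twoCalls = new ArrayList<>(read(readBits, 10, 2));
        unread(readBits, workerA.get(0), workerA.size());
        twoCalls.addAll(read(readBits, 10, 3));
        System.out.println(twoCalls + " contiguous? " + isContiguous(twoCalls));

        // Batch assembled from a single read on a fresh page.
        final List<Integer> oneCall = read(new BitSet(), 10, 5);
        System.out.println(oneCall + " contiguous? " + isContiguous(oneCall));
    }
}
```

Because a single read always returns one run starting at the first clear bit, limiting batch assembly to one call keeps the contiguity assumption intact.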

Review Note:

This PR is best reviewed one commit at a time. It contains a number of small zero-net-change refactors to clean up the existing code before meaningfully changing the behavior.

Why is it important/What is the impact to the user?

This is part of the work needed to make a pipeline recoverable in the event of a worker crash without closing the queue (and therefore without needing to close the pipeline's inputs). It allows a crashing worker to return its batch of events to the queue so that when a new generation of workers is started, the events from that batch can be picked up and reprocessed.
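A minimal sketch of that flow, under stated assumptions: SimpleQueue is a hypothetical in-memory stand-in for the PQ, and processOneBatch is an invented helper, not Logstash's worker loop. It only illustrates the shape of "hand the batch back on failure so a later worker can pick it up".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

final class CrashRecoverySketch {
    /** Hypothetical in-memory queue; stands in for the persistent queue. */
    static final class SimpleQueue {
        private final Deque<String> pending = new ArrayDeque<>();

        void write(final String event) {
            pending.addLast(event);
        }

        /** Removes and returns up to limit events from the head of the queue. */
        List<String> readBatch(final int limit) {
            final List<String> batch = new ArrayList<>();
            while (batch.size() < limit && !pending.isEmpty()) {
                batch.add(pending.pollFirst());
            }
            return batch;
        }

        /** Puts a batch back at the head of the queue, preserving its order. */
        void unread(final List<String> batch) {
            for (int i = batch.size() - 1; i >= 0; i--) {
                pending.addFirst(batch.get(i));
            }
        }

        int unreadCount() {
            return pending.size();
        }
    }

    /** Returns true if the batch was processed, false if it was handed back. */
    static boolean processOneBatch(final SimpleQueue queue, final int limit,
                                   final Consumer<List<String>> pipeline) {
        final List<String> batch = queue.readBatch(limit);
        try {
            pipeline.accept(batch);
            return true; // a real worker would acknowledge the batch here
        } catch (final RuntimeException e) {
            queue.unread(batch); // return the events instead of losing track of them
            return false;
        }
    }
}
```

The queue stays open throughout, which is the point: the inputs writing to it never need to be stopped for the returned events to be reprocessed.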

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files (and/or docker env variables)~~
I have added tests that prove my fix is effective or that my feature works

github-actions · 2026-02-11T20:14:55Z

🤖 GitHub comments

Just comment with:

run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

mergify · 2026-02-11T20:15:21Z

This pull request does not have a backport label. Could you fix it @yaauie? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
If no backport is necessary, please add the backport-skip label

jsvd · 2026-02-13T09:50:47Z

logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java

+                }
+                notEmpty.signalAll();
+            }
+            this.unreadCount -= unreadCount;


Suggested change

-            this.unreadCount -= unreadCount;
+            this.unreadCount += unreadCount;

I believe this should be +=, otherwise unreadCount will be incorrect:

> test_dir = "/tmp/logstash-pq-test-#{Time.now.to_i}"
=> "/tmp/logstash-pq-test-1770976075"
> Dir.mkdir(test_dir)
=> 0
> settings = Java::org.logstash.ackedqueue.SettingsImpl.fileSettingsBuilder(test_dir).elementClass(Java::org.logstash.Event.java_class).capacity(10000).build()
=> #<Java::OrgLogstashAckedqueue::SettingsImpl:0x56357e52>
> queue = Java::org.logstash.ackedqueue.Queue.new(settings); queue.open
[2026-02-13T09:48:18,226][INFO ][org.logstash.ackedqueue.QueueUpgrade] No PQ version file found, upgrading to PQ v2.
=> nil
> 100.times { |i| queue.write(Java::org.logstash.Event.new()) }; queue.unreadCount
=> 100
> batch = queue.readBatch(10, 100); batch.unread
=> nil
> queue.getUnreadCount
=> 80 # should be 100 ⚠️

I don't think it would cause data loss but it could exert backpressure unnecessarily.
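The arithmetic can be checked with a toy counter. UnreadCounter below is a hypothetical model of the bookkeeping only (it is not the Queue class): writing adds to the count, reading subtracts, and un-reading must add back what the read subtracted.

```java
/** Hypothetical model of the unread-event bookkeeping; not Logstash's Queue. */
final class UnreadCounter {
    private long unreadCount;

    /** Writing events makes them available to read. */
    void write(final int eventCount) {
        unreadCount += eventCount;
    }

    /** Reading events checks them out, so fewer remain unread. */
    void read(final int eventCount) {
        unreadCount -= eventCount;
    }

    /** Un-reading returns events, so the count must go back up (+=, not -=). */
    void unread(final int eventCount) {
        unreadCount += eventCount;
    }

    long get() {
        return unreadCount;
    }
}
```

Writing 100 events, reading 10, and un-reading those 10 brings this counter back to 100, whereas subtracting in unread would leave it at 80, matching the console session above.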

jsvd · 2026-02-13T10:06:22Z

logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java

        }
    }

+    public void unread(final long firstUnreadSeqNum, final int unreadCount) throws IOException {


Oh, English is fun... let's talk about unread and unread...

the way I understand this is:

the word "unread" (un-reed) in the method name would be an infinitive form denoting the action of unreading

the second argument int unreadCount would be (un-reedCount) in the sense that we're asking to unread this amount of events

finally, in this.unreadCount, the "unread" is an adjective and thus read un-red (what a tongue-twister).

So the words used for unreadCount (the argument) and unreadCount (the this. variable) are homographs, and the former ends up "soft-shadowing" the latter...

All this to say.. can we rename the 2nd argument to just eventCount, and reflect the change to the rest of the method? 😅

Suggested change

-    public void unread(final long firstUnreadSeqNum, final int unreadCount) throws IOException {
+    public void unread(final long firstUnreadSeqNum, final int eventCount) throws IOException {

jsvd · 2026-02-13T10:12:22Z

logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java

        }
    }

+    public void unread(final long firstUnreadSeqNum, final int unreadCount) throws IOException {


Should this be idempotent?

> 100.times { |i| queue.write(Java::org.logstash.Event.new()) }; queue.unreadCount
=> 100
> batch = queue.readBatch(10, 100); 10.times { batch.unread }
=> 10
> queue.getUnreadCount
=> -10

I don't think we want unread to be called more than once, but I wonder if there's any harm in this extra safety net.
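One possible way to get that safety net, sketched under assumptions: if read state is tracked per event in a BitSet, unread can count only the events that are still marked as read, so repeating the call adds nothing. IdempotentUnreadPage and its methods are hypothetical and are not necessarily how the PR implements it.

```java
import java.util.BitSet;

/** Hypothetical page model; offsets are positions within the page. */
final class IdempotentUnreadPage {
    private final int elementCount;
    private final BitSet readBits = new BitSet();
    private long unreadCount;

    IdempotentUnreadPage(final int elementCount) {
        this.elementCount = elementCount;
        this.unreadCount = elementCount;
    }

    /** Marks up to limit events as read; returns {firstOffset, count}. */
    int[] read(final int limit) {
        final int first = readBits.nextClearBit(0);
        if (first >= elementCount || limit <= 0) {
            return new int[] {-1, 0};
        }
        int end = Math.min(first + limit, elementCount);
        final int nextRead = readBits.nextSetBit(first);
        if (nextRead >= 0) {
            end = Math.min(end, nextRead); // keep the run contiguous
        }
        readBits.set(first, end);
        unreadCount -= (end - first);
        return new int[] {first, end - first};
    }

    /** Returns events to the page; only events still marked as read are counted. */
    int unread(final int firstOffset, final int eventCount) {
        final int restored = readBits.get(firstOffset, firstOffset + eventCount).cardinality();
        readBits.clear(firstOffset, firstOffset + eventCount);
        unreadCount += restored;
        return restored;
    }

    long getUnreadCount() {
        return unreadCount;
    }
}
```

This guards the counter, but not ownership: if another worker has re-read the same events in the meantime, a repeated unread would take them back. A per-batch "already unread" flag is an alternative that avoids that case.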

elasticmachine · 2026-03-04T04:45:19Z

💚 Build Succeeded

Buildkite Build
Commit: 95ff84f

History

💛 Build #4347 was flaky 657fcfe
💔 Build #4338 failed 8a4ed50

yaauie added the enhancement label Feb 11, 2026

yaauie added 5 commits February 12, 2026 17:30

pq: encapsulate Page by weakening visibility of internal details

b50f745

pq: encapsulate Page.ackedSeqNums field

bd22097

pq: dry sum-across-all-pages calculations

61324bc

pq: simplify read to ensure continuous batches

d5cc8d6

pq: add support for Batch#unread

657fcfe

yaauie force-pushed the pq-replay-unacked branch from 8a4ed50 to 657fcfe February 12, 2026 17:31

jsvd requested changes Feb 13, 2026

yaauie added 2 commits March 3, 2026 18:05

cleanup: static field CAPITAL_SNAKE

cc5575a

pq-unread: make idempotent, fix the math, add test cov

95ff84f

yaauie wants to merge 7 commits into elastic:main from yaauie:pq-replay-unacked

3 participants
