Skip to content

db: don't SeekPrefixGE level iterator past prefix#5786

Closed
RaduBerinde wants to merge 4 commits intocockroachdb:masterfrom
RaduBerinde:merge-iter-seek-prefix-cleanup
Closed

db: don't SeekPrefixGE level iterator past prefix#5786
RaduBerinde wants to merge 4 commits intocockroachdb:masterfrom
RaduBerinde:merge-iter-seek-prefix-cleanup

Conversation

@RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Feb 18, 2026

itertest: fix SeekPrefixGE call in tests

We were passing the full key as the prefix in some tests.

itertest: use "try-seek-using-next" instead of "true"

Passing a bool value is very opaque, use a more explicit string.

db: add merging iter treesteps test with tombstone

Add a test that illustrates the behavior of the iterator when a level
has a tombstone that extends past the prefix.

db: don't SeekPrefixGE level iterator past prefix

Currently the merging iterator can call SeekPrefixGE with a key that
does not have the given prefix. Allowing this is counter-intuitive and
adds subtlety to the lower level iterators, making work on optimizing
the SeekPrefixGE path more difficult. This is also against the
contract documented in InternalIterator.SeekPrefixGE ("the supplied
prefix will be the prefix of the given key returned by that Split
function").

We change the code to avoid this case. When tombstones covers all keys
with the given prefix on a level, SeekPrefixGE no longer seeks that
level (and those below). In order to keep the TrySeekUsingNext
optimization working, we remember which levels have not been
positioned using a flag and we position those without
TrySeekUsingNext when necessary. We are either doing the same seek
at a later time, or we are doing a single seek to a later key.

We were passing the full key as the prefix in some tests.
Passing a bool value is very opaque, use a more explicit string.
@RaduBerinde RaduBerinde requested a review from a team as a code owner February 18, 2026 00:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the merge-iter-seek-prefix-cleanup branch from 788fd7b to e164685 Compare February 18, 2026 04:03
@github-actions
Copy link

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

Add a test that illustrates the behavior of the iterator when a level
has a tombstone that extends past the prefix.
Currently the merging iterator can call `SeekPrefixGE` with a key that
does not have the given prefix. Allowing this is counter-intuitive and
adds subtlety to the lower level iterators, making work on optimizing
the `SeekPrefixGE` path more difficult. This is also against the
contract documented in InternalIterator.SeekPrefixGE ("the supplied
prefix will be the prefix of the given key returned by that Split
function").

We change the code to avoid this case. When tombstones covers all keys
with the given prefix on a level, `SeekPrefixGE` no longer seeks that
level (and those below). In order to keep the `TrySeekUsingNext`
optimization working, we remember which levels have not been
positioned using a flag and we position those without
`TrySeekUsingNext` when necessary. We are either doing the same seek
at a later time, or we are doing a single seek to a later key.
@RaduBerinde RaduBerinde force-pushed the merge-iter-seek-prefix-cleanup branch from b46df2f to c2e7983 Compare February 18, 2026 04:27
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumeerbhola partially reviewed 11 files and made 4 comments.
Reviewable status: 9 of 15 files reviewed, 4 unresolved discussions (waiting on RaduBerinde).


-- commits line 18 at r4:

  • This sounds like a revisit of #3845. Is that correct? I'll need to swap in that state and then look at this code, so will need some time.
  • Is there a pressing bug that motivated this, so I can figure out reviewing prioritization?
  • Is it possible to take the first 3 commits, merge them in a separate PR. And get to the bottom of the interleavingIter oddity first. I am more nervous about any new bugs that may have been recently introduced, and less about this seek past the prefix behavior.

sstable/testdata/block_properties line 579 at r2 (raw file):

seek-ge c@19 try-seek-using-next
----
strconv.ParseBool: parsing "try-seek-using-next": invalid syntax

invalid syntax?


sstable/testdata/prefixreader/iter line 266 at r2 (raw file):

seek-prefix-ge c
----
strconv.ParseBool: parsing "try-seek-using-next": invalid syntax

ditto


testdata/treesteps/merging_iter line 124 at r3 (raw file):


# TODO(radu): the InterleavingIter returns `b-b\x00:{(#4,RANGEDEL)}` instead of
# returning a tombstone up to d; this seems suspicious.

Could this be a bug introduced by the recent changes in this area?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaduBerinde made 2 comments.
Reviewable status: 9 of 15 files reviewed, 4 unresolved discussions (waiting on sumeerbhola).


-- commits line 18 at r4:

Previously, sumeerbhola wrote…
  • This sounds like a revisit of #3845. Is that correct? I'll need to swap in that state and then look at this code, so will need some time.
  • Is there a pressing bug that motivated this, so I can figure out reviewing prioritization?
  • Is it possible to take the first 3 commits, merge them in a separate PR. And get to the bottom of the interleavingIter oddity first. I am more nervous about any new bugs that may have been recently introduced, and less about this seek past the prefix behavior.

Correct. No pressing bug, just something I've been wanting to get back to. Yes, I will separate the other commits then investigate the InterleavingIter.


testdata/treesteps/merging_iter line 124 at r3 (raw file):

Previously, sumeerbhola wrote…

Could this be a bug introduced by the recent changes in this area?

I think so.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaduBerinde made 1 comment.
Reviewable status: 9 of 15 files reviewed, 4 unresolved discussions (waiting on sumeerbhola).


testdata/treesteps/merging_iter line 124 at r3 (raw file):

Previously, RaduBerinde wrote…

I think so.

Hm, this block of code has been here for a long time:

https://github.com/cockroachdb/pebble/blame/3758323bef8fed1dacf8968dddca09db1ab8c4ae/internal/keyspan/interleaving_iter.go#L989

I think this simply doesn't work the way we thought it's working - we thought that once a mid-level sees a long tombstone, the lower levels would be seeked beyond that tombstone and they won't be touched again for many seeks-using-next. But it looks like we only seek right past the prefix.

@RaduBerinde
Copy link
Member Author

testdata/treesteps/merging_iter line 124 at r3 (raw file):

Previously, RaduBerinde wrote…

Hm, this block of code has been here for a long time:

https://github.com/cockroachdb/pebble/blame/3758323bef8fed1dacf8968dddca09db1ab8c4ae/internal/keyspan/interleaving_iter.go#L989

I think this simply doesn't work the way we thought it's working - we thought that once a mid-level sees a long tombstone, the lower levels would be seeked beyond that tombstone and they won't be touched again for many seeks-using-next. But it looks like we only seek right past the prefix.

It is required by the Iterator.SeekGEPrefix contract, although it could probably be truncated by the Iterator:

// When iterating with range keys enabled, all range keys encountered are
// truncated to the seek key's prefix's bounds. The truncation of the upper
// bound requires that the database's Comparer is configured with a
// ImmediateSuccessor method. For example, a SeekPrefixGE("a@9") call with the
// prefix "a" will truncate range key bounds to [a,ImmediateSuccessor(a)].
func (i *Iterator) SeekPrefixGE(key []byte) bool {

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumeerbhola made 1 comment.
Reviewable status: 9 of 15 files reviewed, 4 unresolved discussions (waiting on RaduBerinde).


testdata/treesteps/merging_iter line 124 at r3 (raw file):

Previously, RaduBerinde wrote…

It is required by the Iterator.SeekGEPrefix contract, although it could probably be truncated by the Iterator:

// When iterating with range keys enabled, all range keys encountered are
// truncated to the seek key's prefix's bounds. The truncation of the upper
// bound requires that the database's Comparer is configured with a
// ImmediateSuccessor method. For example, a SeekPrefixGE("a@9") call with the
// prefix "a" will truncate range key bounds to [a,ImmediateSuccessor(a)].
func (i *Iterator) SeekPrefixGE(key []byte) bool {

Prior to Jackson's recent changes didn't we have a separate rangedel iterator to surface the actual rangedels, and the interleaving iter was just being used to surface all the boundaries, to prevent switching to a new sstable too early? And the rangedel iterator would surface the full rangedel?
I am basing the above on my memory of a code walkthrough meeting 2 weeks ago -- I haven't looked at Jackson's recent changes.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaduBerinde made 1 comment.
Reviewable status: 9 of 15 files reviewed, 4 unresolved discussions (waiting on sumeerbhola).


-- commits line 18 at r4:

Previously, RaduBerinde wrote…

Correct. No pressing bug, just something I've been wanting to get back to. Yes, I will separate the other commits then investigate the InterleavingIter.

#5796

I will close this for now.

@RaduBerinde RaduBerinde deleted the merge-iter-seek-prefix-cleanup branch March 5, 2026 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants