Summary
A deferred-remap compaction (compact_files(defer_index_remap=True)) committed concurrently with an optimize_indices run against an older dataset version can leave the table in a corrupt state in which any call that walks the indices, including Dataset.list_indices(), panics with:
called `Result::unwrap()` on an `Err` value: InvalidInput {
source: "The compaction plan included a rewrite group that was a split of indexed and non-indexed data: [...]",
location: rust/lance-index/src/frag_reuse.rs:330
}
The panic site is the .unwrap() in Dataset::load_indices (rust/lance/src/index.rs:824 on released builds, :902 on main). It fires when the fragment reuse index (FRI) applies remap_fragment_bitmap to a user index whose fragment_bitmap straddles a rewrite group's old_frags, that is, covers some but not all of them.
This is a table-corruption bug: once committed, the table can no longer be read via the index APIs. PR #6610 prevents new occurrences by rejecting the conflicting commit, but does not repair tables that were already written in the broken state.
Reproduction
Reliably reproduces on main prior to #6610 with the following sequence:
1. Write frag0 and build a vector index over it.
2. Append frag1, then snapshot a stale Dataset handle.
3. Append frag2 on the up-to-date handle.
4. Run plan_compaction + rewrite_files over [frag1, frag2] with defer_index_remap=true (compute the RewriteResult but do not commit it).
5. On the stale handle, run optimize_indices. This commits a CreateIndex covering frag1 only (frag2 didn't exist at that version).
6. Commit the RewriteResult with commit_compaction. On pre-#6610 builds this succeeds.
Resulting state:
- index segment A: bitmap = {frag0} (original)
- index segment B: bitmap = {frag1} (from stale optimize)
- FRI group: old=[frag1, frag2] → new=[frag3]
Segment B's bitmap straddles the FRI group, and any load_indices call panics.
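The resulting state can be modeled with plain Python sets to show why the remap fails. This is a toy sketch, not Lance's implementation: the function below is a hypothetical stand-in for the Rust remap_fragment_bitmap, StraddleError is an invented name, and the all-or-nothing rule is inferred from the error message quoted earlier.

```python
class StraddleError(ValueError):
    """Raised when an index bitmap covers only part of a rewrite group."""


def remap_fragment_bitmap(bitmap, groups):
    """Toy model of the FRI remap (hypothetical, not Lance's code).

    `groups` is a list of (old_frags, new_frags) pairs. A group is applied
    only when the bitmap covers all of its old fragments; a bitmap that
    covers none of them is left alone, and partial coverage is an error.
    """
    result = set(bitmap)
    for old_frags, new_frags in groups:
        old = set(old_frags)
        covered = result & old
        if not covered:
            continue  # this index does not touch the group
        if covered != old:
            raise StraddleError(
                f"rewrite group {sorted(old)} is a split of indexed "
                f"{sorted(covered)} and non-indexed {sorted(old - covered)} data"
            )
        result = (result - old) | set(new_frags)
    return result


# State from the reproduction; integers 0..3 stand in for frag0..frag3.
segment_a = {0}               # original index segment
segment_b = {1}               # committed by the stale optimize_indices
fri_groups = [([1, 2], [3])]  # old=[frag1, frag2] -> new=[frag3]
```

Segment A does not touch the group and passes through unchanged, while segment B covers frag1 but not frag2, so the remap raises. In the Rust code path the equivalent error reaches an .unwrap(), which is what turns it into a panic.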
A self-contained Rust test that reproduces this on main is in flight; a link to the draft PR will follow.
Production trigger
Reported on a real table with a rewrite group old=[382843, 382844, 382845, 382846]. ds.list_indices() panics; the table is unreadable through the index path.
Impact
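Severity: critical. A silent commit corrupts the table, and reads via the index path fail afterward.
Occurs when defer_index_remap=true is enabled and optimize_indices runs concurrently with compaction (the production trigger described in #6610).
Proposed fix
Add a Dataset.repair() API (Rust + Python) that:
- Detects this corruption non-destructively (reads via read_manifest_indexes, bypassing the FRI auto-remap in load_indices).
- Repairs by removing the straddling old fragment IDs from each affected index segment's fragment_bitmap. Previously-indexed rows in the merged new fragment fall through to flat scan until the next optimize_indices, so there is no data loss and no retraining is required.
Also:
- Have Dataset.validate() surface this as a structured error rather than a panic.
- Add a test_data/ fixture (built from a pinned pre-#6610 build) so regressions in detection/repair are caught.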
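A minimal sketch of the detect-and-repair logic described above, written over plain Python sets. The names find_straddles, repair_bitmaps, and StraddleReport are hypothetical and not part of Lance; a real Dataset.repair() would presumably obtain the raw bitmaps via read_manifest_indexes and commit the corrected index metadata, neither of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StraddleReport:
    """Structured description of one corrupt (segment, rewrite group) pair."""
    segment: str
    covered_old_frags: frozenset  # old fragment IDs the segment still claims
    missing_old_frags: frozenset  # old fragment IDs the segment never indexed


def find_straddles(bitmaps, groups):
    """Detect straddling segments without modifying anything."""
    reports = []
    for name, bitmap in bitmaps.items():
        for old_frags, _new_frags in groups:
            old = frozenset(old_frags)
            covered = old & bitmap
            if covered and covered != old:
                reports.append(StraddleReport(name, covered, old - covered))
    return reports


def repair_bitmaps(bitmaps, groups):
    """Return copies of the bitmaps with straddling old fragment IDs removed."""
    repaired = {name: set(bitmap) for name, bitmap in bitmaps.items()}
    for report in find_straddles(bitmaps, groups):
        repaired[report.segment] -= report.covered_old_frags
    return repaired


# Same state as the reproduction; integers stand in for fragment IDs.
bitmaps = {"segment_a": {0}, "segment_b": {1}}
fri_groups = [([1, 2], [3])]

reports = find_straddles(bitmaps, fri_groups)
repaired = repair_bitmaps(bitmaps, fri_groups)
```

In this model the repair empties segment B's bitmap, so the rows now living in frag3 are served by flat scan until the next optimize_indices. Returning a list of reports rather than raising is one way a validate() call could expose the problem as a structured error.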