serialize lance optimize to fix reprocess commit-conflict race by HumanBean17 · Pull Request #309 · HumanBean17/java-codebase-rag

HumanBean17 · 2026-06-13T15:09:31Z

Scope

Fixes #308. java-codebase-rag reprocess floods stderr with:

ERROR cocoindex.connectors.lancedb._target: Exception in optimizing LanceDB table javacodeindex_java_code
RuntimeError: lance error: Retryable commit conflict for version 4424: This Rewrite transaction was preempted by concurrent transaction Delete at version 4424. Please retry.

Root cause

cocoindex 1.0.7 schedules table.optimize() (a LanceDB Rewrite/compaction transaction) as a background asyncio task, concurrently with mutation batches that issue table.delete() (Delete transactions). LanceDB does not allow a Rewrite to commit concurrently with a Delete (upstream lancedb#1504 — "We do not support concurrent deletes right now. I'd recommend serializing…"). cocoindex's _run_optimize logs and never retries on this conflict, so the table is left un-optimized/fragmented and stderr floods. lancedb.AsyncTable.optimize() has no retry parameter.
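The shape of this race can be reproduced with a toy model. The sketch below involves no LanceDB or cocoindex code: `ToyTable`, `CommitConflict`, and both runner functions are hypothetical names invented for illustration, and the version check is only an assumed simplification of how a long-running rewrite can be invalidated by a delete that commits first.

```python
import asyncio


class CommitConflict(RuntimeError):
    """Raised by the toy table when a commit is based on a stale version."""


class ToyTable:
    """Toy stand-in for a versioned table; this is not the LanceDB API."""

    def __init__(self) -> None:
        self.version = 0

    async def delete(self) -> None:
        # Short transaction: commits immediately and bumps the version.
        self.version += 1

    async def optimize(self) -> None:
        # Long transaction: read a version, do work, then try to commit.
        read_version = self.version
        await asyncio.sleep(0.01)  # compaction work; other tasks run here
        if self.version != read_version:
            raise CommitConflict("rewrite preempted by concurrent delete")
        self.version += 1


async def concurrent_run() -> str:
    """Background optimize overlapping a mutation batch (the failing shape)."""
    table = ToyTable()
    background = asyncio.create_task(table.optimize())
    await asyncio.sleep(0)  # let optimize start and read its version
    await table.delete()  # the delete lands while optimize is mid-flight
    try:
        await background
    except CommitConflict:
        return "conflict"
    return "ok"


async def serialized_run() -> str:
    """Optimize only after all writers are done (the shape this PR moves to)."""
    table = ToyTable()
    await table.delete()
    await table.optimize()
    return "ok"
```

In this model `asyncio.run(concurrent_run())` ends in a conflict while `asyncio.run(serialized_run())` does not, which is the property the fix relies on.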

The fix (3 parts)

1. Disable cocoindex's concurrent background optimize at the source (java_index_flow_lancedb.py): add _NUM_TXN_BEFORE_OPTIMIZE = 10**12 (with a comment citing the race + cocoindex 1.0.7) and pass num_transactions_before_optimize=_NUM_TXN_BEFORE_OPTIMIZE to all three lancedb.mount_table_target(...) calls. This stops any background optimize() from running during the flow, so the Rewrite-vs-Delete race cannot occur. Safe: optimize() is purely maintenance (compact/prune/index); upsert/delete correctness via merge_insert does not depend on it.
2. New serialized optimize helper with retry guard (java_codebase_rag/lance_optimize.py):
   - LANCE_TABLE_NAMES constant (the three tables): single source of truth, imported by the flow instead of the inline literals.
   - async def optimize_lance_tables(index_dir, *, quiet=False) -> dict[str, str]:
     - Lazy import of lancedb (the flow imports this module for the constant and must not pay the lancedb import cost).
     - connect_async → list_tables → per-table open_table + optimize().
     - Retry loop (6 attempts, exponential backoff 0.1 * 2**attempt) on errors whose str(exc) contains "Retryable commit conflict" OR "preempted by concurrent transaction"; non-conflict errors are not retried.
     - Missing tables (e.g. a repo with no SQL/YAML) are reported skipped.
     - db.close() runs in finally (it is a sync method in lancedb 0.30.x).
     - All diagnostics go to stderr (this is callable from the stdio MCP / JSON-stdout paths); per-table status is returned as a dict, with errors captured as "error: <text>".
3. Wire the post-optimize into both cocoindex chokepoints, running optimize only after cocoindex returns exit 0 (no concurrent writers → clean optimize):
   - pipeline.run_cocoindex_update (java_codebase_rag/pipeline.py, used by init / increment / reprocess --vectors-only): after the subprocess completes with code == 0, asyncio.run(optimize_lance_tables(...)). Index dir resolved from the passed env (JAVA_CODEBASE_RAG_INDEX_DIR, set by config.subprocess_env / apply_to_os_environ, the same key the flow's lifespan reads). If absent, skip with a stderr warning (do not crash). The CompletedProcess return is unchanged on optimize failure; outcome logged to stderr.
   - server.run_refresh_pipeline (server.py, default reprocess): in the if ok: branch, before the graph-build step, await optimize_lance_tables(<resolved index_dir>, quiet=quiet). Index dir resolved the same way the server does (env var → <root>/.java-codebase-rag). New optional field optimize_error: str | None on RefreshIndexOutput; an optimize failure is surfaced via that field + message + stderr, but never flips success/exit semantics for a vectors phase that succeeded.
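The retry guard in part 2 can be sketched roughly as follows. This is a minimal illustration, not the code in lance_optimize.py: `optimize_with_retry`, `is_commit_conflict`, and the `optimize` callable parameter are hypothetical names, and whether the real helper counts attempts from zero or skips the sleep after the final attempt is an assumption. Only the two marker strings, the 6 attempts, the 0.1 * 2**attempt backoff, and the "error: <text>" status format come from the description above.

```python
import asyncio
from typing import Awaitable, Callable

# Substrings that mark a retryable conflict (taken from the PR text).
CONFLICT_MARKERS = (
    "Retryable commit conflict",
    "preempted by concurrent transaction",
)


def is_commit_conflict(exc: BaseException) -> bool:
    """Return True when the error text contains one of the conflict markers."""
    text = str(exc)
    return any(marker in text for marker in CONFLICT_MARKERS)


async def optimize_with_retry(
    optimize: Callable[[], Awaitable[object]],  # hypothetical: wraps table.optimize
    *,
    attempts: int = 6,
    base_delay: float = 0.1,
) -> str:
    """Run optimize(); retry only on commit conflicts, with exponential backoff."""
    for attempt in range(attempts):
        try:
            await optimize()
            return "ok"
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if not is_commit_conflict(exc) or last_attempt:
                # Non-conflict errors and exhausted retries become a status string.
                return f"error: {exc}"
            await asyncio.sleep(base_delay * 2**attempt)
    return "error: no attempts made"  # only reachable when attempts <= 0
```

Returning a status string instead of raising keeps one failing table from aborting the others, which matches the per-table dict described above.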

Manual / test evidence

$ .venv/bin/ruff check .
All checks passed!

$ .venv/bin/python -m pytest tests -q
746 passed, 11 skipped, 18 warnings in 480.72s

(The 11 skips are the heavy e2e tests gated behind JAVA_CODEBASE_RAG_RUN_HEAVY=1, per tests/README.md.)

New tests in tests/test_lance_optimize.py (fakes the lancedb async conn/table — no real LanceDB needed):

- test_optimize_retries_commit_conflict_then_succeeds: 2 conflicts then ok → asserts 3 calls, status ok.
- test_optimize_does_not_retry_non_conflict_error: a ValueError is captured per-table, not retried (1 call).
- test_optimize_reports_missing_table_as_skipped: absent tables come back skipped, no exception.
- test_optimize_closes_connection_even_on_open_failure: db.close() runs in finally.
- test_lance_table_names_constant_matches_search_lancedb_tables: single source of truth agrees with search_lancedb.TABLES.

tests/test_lancedb_e2e.py (heavy, runs --full-reprocess): added an assertion that the cocoindex flow stderr contains no "Retryable commit conflict" / "preempted by concurrent transaction" markers after the fix — this is the fixture where a race regression would surface.
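A check of this kind might look like the sketch below. The function name `find_conflict_markers` is hypothetical and the actual assertion in tests/test_lancedb_e2e.py may be written differently; only the two marker strings are taken from this PR.

```python
# Marker substrings quoted in the PR description.
CONFLICT_MARKERS = (
    "Retryable commit conflict",
    "preempted by concurrent transaction",
)


def find_conflict_markers(stderr_text: str) -> list[str]:
    """Return every conflict marker that occurs in the captured stderr."""
    return [marker for marker in CONFLICT_MARKERS if marker in stderr_text]
```

A test would then assert `find_conflict_markers(captured_stderr) == []`, where `captured_stderr` is whatever the fixture collects from the flow.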

Updated tests/fixtures/cli_progress_stdout/reprocess_quiet_success.stdout.txt baseline for the new additive optimize_error field (sorted-key JSON, as serialized by the CLI).
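To illustrate why an additive field changes a sorted-key JSON baseline, here is a small sketch. `RefreshOutputSketch` is a hypothetical stand-in, not the real RefreshIndexOutput: only the optimize_error field name and its `str | None` type come from this PR, while the other fields and the use of a dataclass are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RefreshOutputSketch:
    """Hypothetical stand-in for RefreshIndexOutput (fields are assumed)."""

    success: bool
    message: str
    optimize_error: Optional[str] = None  # additive; None when optimize was fine


def to_sorted_json(output: RefreshOutputSketch) -> str:
    """Serialize with sorted keys so baseline files stay stable across runs."""
    return json.dumps(asdict(output), sort_keys=True)
```

With sorted keys, a new field shows up at a predictable position in the baseline, so the fixture diff is limited to that one key.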

Notes

No schema/ontology bump; no re-index-required callout (this is not a schema change — optimize() is pure maintenance).
No deviation from the plan. AsyncConnection.close() is a sync method in lancedb 0.30.x (verified in .venv/.../lancedb/db.py), so the helper calls db.close() directly rather than await db.close(). The helper uses list_tables() (the non-deprecated API, matching cocoindex's own _list_table_names helper) with a table_names() fallback.
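The list_tables() / table_names() fallback and the finally-close could be sketched with duck typing as below. This is an illustration under stated assumptions, not the helper's code: `list_table_names` and `names_then_close` are hypothetical names, the return shape of list_tables() (a plain sequence, or an object carrying the names on a `tables` attribute) is an assumption, and tolerating both sync and async close() is a choice made for this sketch, whereas the PR calls db.close() directly.

```python
import asyncio
import inspect
from typing import Any


async def list_table_names(db: Any) -> list[str]:
    """Prefer list_tables(); fall back to table_names() when it is missing."""
    lister = getattr(db, "list_tables", None)
    if lister is None:
        lister = db.table_names
    result = lister()
    if inspect.isawaitable(result):
        result = await result
    # Assumption: either a plain sequence of names, or an object that
    # exposes them on a `tables` attribute.
    names = getattr(result, "tables", result)
    return list(names)


async def names_then_close(db: Any) -> list[str]:
    """List tables and always close the connection, even on failure."""
    try:
        return await list_table_names(db)
    finally:
        closed = db.close()
        # Sketch-only tolerance: await close() if this version made it async.
        if inspect.isawaitable(closed):
            await closed
```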

🤖 Generated with Claude Code

cocoindex 1.0.7 schedules table.optimize() (a LanceDB Rewrite transaction) as a background asyncio task that races concurrent table.delete() (Delete) transactions, which LanceDB rejects (upstream lancedb#1504), flooding reprocess stderr with "Retryable commit conflict ... preempted by concurrent transaction Delete" and leaving tables un-optimized.

- Disable the in-flow background optimize by setting num_transactions_before_optimize=10**12 on all three mount_table_target calls in java_index_flow_lancedb.py (optimize is pure maintenance; upsert/delete correctness via merge_insert does not depend on it).
- Add java_codebase_rag/lance_optimize.py with a serialized optimize_lance_tables() helper that runs table.optimize() once per table after the flow returns (no concurrent writers), with retry + exponential backoff on the residual commit-conflict. LANCE_TABLE_NAMES becomes the single source of truth, imported by the flow.
- Wire the post-flow optimize into both cocoindex chokepoints: pipeline.run_cocoindex_update (used by init/increment/reprocess --vectors-only) and server.run_refresh_pipeline (default reprocess). An optimize failure is surfaced via stderr / the new RefreshIndexOutput optimize_error field / message; success is never flipped.

No schema/ontology bump; no re-index-required callout.

Co-Authored-By: Claude <noreply@anthropic.com>

HumanBean17 force-pushed the fix/lance-optimize-race branch from dcaddb7 to 5c51baa on June 13, 2026 15:18

HumanBean17 merged commit c26b94f into master Jun 13, 2026
1 check failed

HumanBean17 merged 1 commit into master from fix/lance-optimize-race
