[tantivy] Reuse TantivySearcher across queries via searcher pool #7671

Open
chenghuichen wants to merge 7 commits into apache:master from chenghuichen:tantivy-fix2

@chenghuichen (Contributor) commented Apr 19, 2026

Purpose

Each full-text search query currently opens a fresh TantivySearcher, which rebuilds the Rust-side index structures (including loading the .term FST dictionary) from scratch. On object storage (S3/OSS), this means a full GET of the index file on every query. In Flink streaming pipelines — the primary JVM consumer of Paimon's global index — the same index shard is queried continuously within a single subtask lifetime, making repeated loading pure waste.

This PR introduces a TantivySearcherPool that keeps TantivySearcher instances alive across queries, borrowing on query start and returning on close rather than destroying and rebuilding.
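
To make the borrow-on-start, return-on-close idea concrete, here is a minimal sketch of such a pool. It is not the PR's `TantivySearcherPool`: the class names `SearcherPoolSketch`, `Searcher`, `Lease`, and `Pool`, the choice to key idle searchers by index path, and the `opens()` counter are all assumptions made for illustration, and `Searcher` is a plain stand-in rather than the real `TantivySearcher`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a borrow/return pool; names and structure are assumptions. */
public class SearcherPoolSketch {

    /** Stand-in for a searcher that is expensive to open (e.g. it loads an FST dictionary). */
    public static final class Searcher {
        final String indexPath;

        public Searcher(String indexPath) {
            this.indexPath = indexPath;
        }
    }

    /** Handle given to a query; close() hands the searcher back instead of destroying it. */
    public static final class Lease implements AutoCloseable {
        private final Pool pool;
        private final Searcher searcher;
        private boolean closed;

        Lease(Pool pool, Searcher searcher) {
            this.pool = pool;
            this.searcher = searcher;
        }

        public Searcher searcher() {
            return searcher;
        }

        @Override
        public void close() {
            if (!closed) { // guard against returning the same searcher twice
                closed = true;
                pool.giveBack(searcher);
            }
        }
    }

    /** Keeps idle searchers per index path (assumed key) and opens a new one only on a miss. */
    public static final class Pool {
        private final Map<String, Deque<Searcher>> idle = new HashMap<>();
        private final Function<String, Searcher> opener;
        private int opens;

        public Pool(Function<String, Searcher> opener) {
            this.opener = opener;
        }

        public synchronized Lease borrow(String indexPath) {
            Deque<Searcher> queue = idle.get(indexPath);
            Searcher searcher = (queue == null) ? null : queue.pollFirst();
            if (searcher == null) { // no idle searcher for this path: pay the open cost once
                searcher = opener.apply(indexPath);
                opens++;
            }
            return new Lease(this, searcher);
        }

        synchronized void giveBack(Searcher searcher) {
            idle.computeIfAbsent(searcher.indexPath, k -> new ArrayDeque<>()).addFirst(searcher);
        }

        public synchronized int opens() {
            return opens;
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(Searcher::new);
        for (int i = 0; i < 3; i++) {
            try (Lease lease = pool.borrow("shard-0")) {
                lease.searcher(); // a real caller would run the query here
            }
        }
        System.out.println("opens = " + pool.opens()); // prints "opens = 1"
    }
}
```

In this sketch, sequential queries against the same path share one searcher, while two overlapping borrows each get their own instance, so a searcher is never used by two queries at once.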

Benefit Assessment

Benchmark on local disk (500k docs, 17MB index, 500 queries, JIT-warmed):

No-pool:   avg=2.86 ms  (open=1.40 ms / 49%,  search=1.27 ms)
With pool: avg=0.79 ms  (search only)
Speedup:   3.62x

On object storage the gap widens further: the open phase includes a full GET of the .term file (FST dictionary, typically several MB per shard). With the pool, .term stays resident in Rust memory across queries, eliminating both the latency and the object storage data transfer cost of repeated loading. For tables under heavy compaction, index files are replaced by new paths; stale pool entries go unused without affecting correctness.
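
The compaction point can be illustrated with a small sketch, assuming pool entries are keyed by index file path. `StaleEntrySketch` and `PathKeyedCache` are hypothetical names, and a string stands in for the searcher. A lookup under the new path can never return the searcher built from the old file, so the old entry simply sits idle.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a path-keyed cache where a replaced index file yields a new key. */
public class StaleEntrySketch {

    public static final class PathKeyedCache {
        // The value records which file the (stand-in) searcher was built from.
        private final Map<String, String> searcherByPath = new HashMap<>();
        private int opens;

        public String get(String indexPath) {
            return searcherByPath.computeIfAbsent(indexPath, path -> {
                opens++; // simulated cost of opening the index
                return "searcher(" + path + ")";
            });
        }

        public int opens() {
            return opens;
        }

        public int size() {
            return searcherByPath.size();
        }
    }

    public static void main(String[] args) {
        PathKeyedCache cache = new PathKeyedCache();
        cache.get("index/file-a");
        cache.get("index/file-a"); // reused, no second open

        // Compaction replaces the index with a file at a new path (paths are made up).
        String current = cache.get("index/file-b");

        System.out.println(current);       // prints "searcher(index/file-b)"
        System.out.println(cache.opens()); // prints "2"
        System.out.println(cache.size());  // prints "2" (the file-a entry is idle, not wrong)
    }
}
```

The idle entry still occupies memory in this sketch; whether and when the PR's pool releases such entries is not stated in this description.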

Tests

  • TantivyFullTextGlobalIndexTest.java

@chenghuichen changed the title from "[tantivy] Reuse TantivySearcher across queries via searcher pool" to "[WIP] [tantivy] Reuse TantivySearcher across queries via searcher pool" Apr 19, 2026
@chenghuichen changed the title from "[WIP] [tantivy] Reuse TantivySearcher across queries via searcher pool" to "[tantivy] Reuse TantivySearcher across queries via searcher pool" Apr 19, 2026
@chenghuichen (Contributor, Author) commented

Ready for review.
