
Introduce GigaMap sub-queries across all index types #637

Open
fh-ms wants to merge 18 commits into main from gigamap-subquery

Conversation

fh-ms (Contributor) commented Apr 15, 2026

Summary

  • Add GigaMap.SubQuery / EntityIdMatcher so bitmap, Lucene full-text and JVector similarity queries can be combined in a single GigaQuery via query.and(subQuery). Ordered matchers report the next candidate id, letting the bitmap executor gap-skip through AND compositions.
  • Introduce a shared ScoredSearchResult<E> supertype in the core types module. VectorSearchResult and the new LuceneSearchResult extend it, acting as sub-queries on the right-hand side of a GigaQuery and offering their own ScoredSearchResult.and(SubQuery) for score-preserving narrowing from the scored side.
  • Extract EntityResolver, BitmapIterator, BitmapIteration, BitmapEntityIdMatcher and AbstractBitmapIterating into standalone top-level types; remove the obsolete GigaIteration / AbstractGigaIterating. Add GigaIterator.nextIndexed / GigaQuery.iterateIndexed for (id, entity) iteration.
  • Update the docs under docs/modules/gigamap/ to cover sub-query composition, indexed iteration, and the refreshed JVector / Lucene examples.
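The gap-skipping idea from the first bullet can be illustrated with a minimal, self-contained sketch. It is not the PR's implementation: `OrderedIdSource`, `SortedIds` and `GapSkipSketch` are hypothetical names, and the real `EntityIdMatcher` contract may differ. The sketch only assumes that each ordered side can report its next candidate id at or after a given id, so an AND can jump over ranges the other side cannot match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an ordered matcher; NOT the actual EntityIdMatcher API.
interface OrderedIdSource {
    /** Returns the smallest contained id >= fromId, or -1 if none is left. */
    long nextCandidate(long fromId);
}

// Assumption: ids are non-negative, ascending and free of duplicates.
final class SortedIds implements OrderedIdSource {
    private final long[] ids;

    SortedIds(long... ids) {
        this.ids = ids.clone();
    }

    @Override
    public long nextCandidate(long fromId) {
        int pos = Arrays.binarySearch(ids, fromId);
        if (pos < 0) {
            pos = -pos - 1; // insertion point = first id greater than fromId
        }
        return pos < ids.length ? ids[pos] : -1;
    }
}

final class GapSkipSketch {
    /** Intersects two ordered sources by jumping to the other side's next candidate. */
    static List<Long> and(OrderedIdSource left, OrderedIdSource right) {
        List<Long> result = new ArrayList<>();
        long id = left.nextCandidate(0);
        while (id >= 0) {
            long other = right.nextCandidate(id);
            if (other < 0) {
                break; // right side is exhausted, nothing more can match
            }
            if (other == id) {
                result.add(id);
                id = left.nextCandidate(id + 1);
            } else {
                id = left.nextCandidate(other); // skip the gap in one step
            }
        }
        return result;
    }
}
```

With sparse sub-queries (for example a top-100 Lucene result against a large bitmap), most ids are never visited individually, which is the motivation for the ordered-matcher contract.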

Usage

// From the bitmap side — scores dropped
List<Article> published = gigaMap.query(status.is("PUBLISHED"))
    .and(luceneIndex.search("content:eclipse", 100))
    .toList();

// From the scored side — keep relevance scores on the result
ScoredSearchResult<Article> ranked = luceneIndex.search("content:eclipse", 100)
    .and(gigaMap.query(status.is("PUBLISHED")));

// Vector similarity composed with a bitmap filter
ScoredSearchResult<Doc> topTech = vectorIndex.search(queryVector, 50)
    .and(gigaMap.query(category.is("tech")));

// (id, entity) iteration on any query
gigaMap.query(name.is("John")).iterateIndexed((id, person) -> ...);
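For readers unfamiliar with the scored-side variant, the following sketch shows what "score-preserving narrowing" means in principle. It is a simplified model, not the `ScoredSearchResult` code from this PR: `Hit` and `ScoredNarrowingSketch` are hypothetical, and a plain `Set<Long>` stands in for the sub-query's id matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ScoredNarrowingSketch {
    // Hypothetical scored hit: an entity id plus its relevance score.
    static final class Hit {
        final long entityId;
        final float score;

        Hit(long entityId, float score) {
            this.entityId = entityId;
            this.score = score;
        }
    }

    /**
     * Keeps only hits whose id is in allowedIds. The input is expected to be
     * sorted by descending score; that order and the scores are left untouched.
     */
    static List<Hit> and(List<Hit> hitsByScoreDesc, Set<Long> allowedIds) {
        List<Hit> narrowed = new ArrayList<>();
        for (Hit hit : hitsByScoreDesc) {
            if (allowedIds.contains(hit.entityId)) {
                narrowed.add(hit);
            }
        }
        return narrowed;
    }
}
```

The bitmap-side call in the first usage example returns plain entities instead, which is why its scores are dropped.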

fh-ms added 7 commits April 15, 2026 12:24
- Add GigaMap.SubQuery and EntityIdMatcher so queries can be combined by entity id, including ordered matchers that report the next candidate id to let the executor skip over gaps.
- Expose bitmap query results as an EntityIdMatcher via the new BitmapEntityIdMatcher, so a query can participate as a sub-query in another query.
- Extract EntityResolver, BitmapIterator, BitmapIteration and AbstractBitmapIterating into standalone top-level types; remove the now-obsolete GigaIteration / AbstractGigaIterating.
- Add GigaIterator#nextIndexed and GigaQuery#iterateIndexed for (id, entity) iteration; wire it through BitmapIterator.
- Document the newly exposed types (EntityIdMatcher, EntityResolver, BitmapIterator, BitmapIteration, BitmapEntityIdMatcher, GigaMap.SubQuery and the undocumented GigaQuery methods).
- Add GigaMap.SubQuery and EntityIdMatcher so queries can be composed by entity id. Ordered matchers report the next candidate id, letting the bitmap executor gap-skip through AND compositions.
- Expose bitmap query results as an EntityIdMatcher via the new BitmapEntityIdMatcher, so any GigaQuery can act as a sub-query of another.
- Add a common ScoredSearchResult supertype in org.eclipse.store.gigamap.types with a shared Entry (lazy entity lookup) and Default (XGettingList-backed, caches the id matcher).
- Make VectorSearchResult and the new LuceneSearchResult extend ScoredSearchResult, so vector similarity and Lucene full-text searches compose with bitmap queries via query.and(hits).
- Add LuceneIndex.search(...) returning a LuceneSearchResult (ids, scores, lazily-resolved entities) alongside the existing acceptor-based query(...) API.
- Extract EntityResolver, BitmapIterator, BitmapIteration and AbstractBitmapIterating into standalone top-level types; remove the obsolete GigaIteration / AbstractGigaIterating.
- Add GigaIterator#nextIndexed and GigaQuery#iterateIndexed for (id, entity) iteration; wire it through BitmapIterator.
- Cover the new cross-index composition with VectorSearchSubQueryTest and LuceneSearchSubQueryTest.
- Add ScoredSearchResult.and(GigaMap.SubQuery) as the scored-side counterpart of GigaQuery.and(SubQuery). The result preserves scores and the original score-descending iteration order, so the caller can keep iterating scored entries after the narrowing: var top = vectorIndex.search(v, 50).and(gigaMap.query(status.is("PUB")));
- Cache fix in ScoredSearchResult.Default: store the sorted long[] instead of the matcher instance, and wrap a fresh AscendingListWrapper on every provideEntityIdMatcher() call. Prevents the matcher's mutable cursor from being shared across independent consumers (previously a latent bug when reusing a search result in two different queries).
- Cover both behaviors in VectorSearchSubQueryTest and LuceneSearchSubQueryTest: intersection correctness, score-order preservation, chainability of the narrowed result, reuse of the same hits across independent .and(...) calls, and empty intersections.
…cursor advancements in specific implementations.
- Explain how to combine GigaMap queries with other sub-query types, including bitmap, vector, and Lucene results.
- Detail usage of `iterateIndexed` for processing results with both entity IDs and entities.
- Update examples across modules to reflect changes, such as preserving scores in `ScoredSearchResult` during query narrowing.
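The cache fix described in the commits above (store the sorted `long[]`, hand out a fresh wrapper per call) can be sketched roughly as follows. This is an illustration under assumptions, not the code from `ScoredSearchResult.Default`: `CachedIdsSketch` and `Cursor` are hypothetical names, and `Cursor` only loosely plays the role of `AscendingListWrapper`.

```java
import java.util.Arrays;

final class CachedIdsSketch {
    // Cached once and never mutated afterwards, so it is safe to share.
    private final long[] sortedIds;

    CachedIdsSketch(long... ids) {
        this.sortedIds = ids.clone();
        Arrays.sort(this.sortedIds);
    }

    /** Returns a new cursor on every call so consumers never share position state. */
    Cursor newMatcher() {
        return new Cursor(this.sortedIds);
    }

    // Hypothetical ordered matcher with a forward-only cursor.
    static final class Cursor {
        private final long[] ids;
        private int pos; // mutable, per-consumer state

        Cursor(long[] ids) {
            this.ids = ids;
        }

        /** Expects ascending probes; advances past smaller ids and never rewinds. */
        boolean matches(long id) {
            while (pos < ids.length && ids[pos] < id) {
                pos++;
            }
            return pos < ids.length && ids[pos] == id;
        }
    }
}
```

If a single `Cursor` were cached and reused, a second query would start from wherever the first one stopped and silently miss smaller ids; caching only the immutable array avoids that.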
fh-ms added the enhancement (New feature or request) and GigaMap labels Apr 15, 2026
fh-ms requested a review from Copilot April 15, 2026 12:41

Copilot AI left a comment

Pull request overview

Introduces a unified sub-query mechanism (GigaMap.SubQuery / EntityIdMatcher) to allow intersecting bitmap-backed GigaQuery results with external scored results (Lucene / JVector), and adds a shared ScoredSearchResult<E> abstraction to support score-preserving narrowing from the scored side.

Changes:

  • Add ScoredSearchResult<E> as a common, composable scored-result type; make Lucene/JVector results implement it.
  • Add GigaMap.SubQuery + EntityIdMatcher and wire GigaQuery.and(SubQuery) composition through bitmap execution/iteration.
  • Refactor bitmap iteration/resolution internals into new top-level types and update docs/tests for sub-query composition and indexed iteration.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| gigamap/lucene/src/test/java/org/eclipse/store/gigamap/lucene/LuceneSearchSubQueryTest.java | Adds tests for Lucene result composition as a SubQuery and score-preserving narrowing. |
| gigamap/lucene/src/main/java/org/eclipse/store/gigamap/lucene/LuceneSearchResult.java | Introduces Lucene scored-result interface extending ScoredSearchResult. |
| gigamap/lucene/src/main/java/org/eclipse/store/gigamap/lucene/LuceneIndex.java | Adds search(...) APIs returning LuceneSearchResult and builds scored entries. |
| gigamap/jvector/src/test/java/org/eclipse/store/gigamap/jvector/VectorSearchSubQueryTest.java | Adds tests for vector result composition as a SubQuery and score-preserving narrowing. |
| gigamap/jvector/src/main/java/org/eclipse/store/gigamap/jvector/VectorSearchResult.java | Refactors vector result type to extend ScoredSearchResult. |
| gigamap/jvector/src/main/java/org/eclipse/store/gigamap/jvector/VectorIndex.java | Updates conversion to produce ScoredSearchResult.Entry entries. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/ThreadedIterator.java | Renames reader-close call site (closeIterator → closeReader). |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/ScoredSearchResult.java | Adds shared scored result abstraction with and(SubQuery) narrowing and id-matcher materialization. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/ResultIdIterator.java | Renames reader-close call site (closeIterator → closeReader). |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaQuery.java | Adds SubQuery support, removes Predicate, adds test, and wires id-matchers into execution/iteration. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaMap.java | Adds SubQuery, introduces createEntityIdMatcher, threads matcher into bitmap execution, renames active iterator tracking to readers. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaIterator.java | Removes old iterator implementation; updates wrapper to use EntityResolver and closeReader. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaIteration.java | Removes obsolete iteration type in favor of new bitmap iteration types. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/EntityResolver.java | Extracts resolver abstraction from BitmapResult for id→entity resolution. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/EntityIdMatcher.java | Adds matcher abstraction for composing arbitrary id sources (including ordered gap-skipping). |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/ContainsBreaker.java | Updates contains breaker to implement EntityResolver. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/BitmapResult.java | Replaces nested Resolver with EntityResolver usage. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/BitmapIterator.java | Adds new bitmap iterator implementation driven by AbstractBitmapIterating and EntityIdMatcher. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/BitmapIteration.java | Adds one-shot bitmap traversal executor used by executeReadOnly. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/BitmapEntityIdMatcher.java | Exposes bitmap query results as an ordered EntityIdMatcher (for query-as-subquery). |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/AbstractCompositeBitmapIndex.java | Updates contains path to call new execute(...) signature with EntityIdMatcher. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/AbstractBitmapIterating.java | Refactors core bitmap traversal logic; adds matcher-aware gap skipping. |
| gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/AbstractBitmapIndexBinary.java | Updates contains path to call new execute(...) signature with EntityIdMatcher. |
| docs/modules/gigamap/pages/queries/executing.adoc | Documents indexed iteration (iterateIndexed / nextIndexed). |
| docs/modules/gigamap/pages/queries/defining.adoc | Documents SubQuery composition patterns and fixed-id-set matching. |
| docs/modules/gigamap/pages/indexing/lucene/use-cases.adoc | Updates Lucene hybrid examples to use composable LuceneSearchResult instead of manual filtering. |
| docs/modules/gigamap/pages/indexing/lucene/index.adoc | Documents Lucene composable search(...) results and narrowing from scored side. |
| docs/modules/gigamap/pages/indexing/jvector/use-cases.adoc | Updates entry type references to ScoredSearchResult.Entry. |
| docs/modules/gigamap/pages/indexing/jvector/index.adoc | Updates vector examples to iterate ScoredSearchResult.Entry. |
| docs/modules/gigamap/pages/indexing/jvector/advanced.adoc | Updates hybrid search examples to use SubQuery intersection instead of manual id filtering. |

Comment thread gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaMap.java Outdated
Comment thread gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaMap.java Outdated
Comment thread gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/GigaQuery.java Outdated
fh-ms added 4 commits April 15, 2026 15:33
- Short-circuit single sub-query cases to avoid unnecessary wrapping.
- Refactor `buildEntityIdMatcher` for improved clarity and efficiency.
- Update `iterator` method to include new id range parameters (`idStart`, `idBound`).
…ty IDs

- Add `idStart` and `idBound` parameters to iterator creation for range-limited processing.
- Introduce `materializeEntityIds` for thread-safe, stateless entity ID matching.
- Adjust multi-threaded execution logic to conditionally enable threading based on `idMatcher` usage.
- Refactor sub-query handling for optimized partitioning and thread safety.
- Introduce regression tests for sub-query composition including AND semantics, range constraints, and multi-threaded execution.
- Ensure proper lock handling, short-circuiting logic, and id range honoring during query execution.
- Validate compatibility with indexed iteration and multi-consumer execution scenarios.
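The combination of id ranges and materialized ids in these commits can be pictured with a small sketch. It rests on assumptions that the PR text does not confirm: that `idStart` is inclusive and `idBound` exclusive, and that materialized ids form an immutable, ascending, duplicate-free array. `RangePartitionSketch` and its methods are hypothetical names; only the parameter names come from the commit messages.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class RangePartitionSketch {
    /** Counts ids in [idStart, idBound) using stateless binary searches only. */
    static int countInRange(long[] sortedIds, long idStart, long idBound) {
        return lowerBound(sortedIds, idBound) - lowerBound(sortedIds, idStart);
    }

    /** Index of the first element >= key (array must be ascending, no duplicates). */
    private static int lowerBound(long[] sortedIds, long key) {
        int pos = Arrays.binarySearch(sortedIds, key);
        return pos >= 0 ? pos : -pos - 1;
    }

    /** Splits [0, idBound) into equal-width ranges and counts each range in parallel. */
    static int countParallel(long[] sortedIds, long idBound, int parts) {
        long width = (idBound + parts - 1) / parts; // ceiling division
        return IntStream.range(0, parts)
            .parallel()
            .map(p -> {
                long start = p * width;
                long end = Math.min(idBound, start + width);
                // Empty trailing partitions (start >= end) contribute nothing.
                return start < end ? countInRange(sortedIds, start, end) : 0;
            })
            .sum();
    }
}
```

Because every worker only reads the shared array and keeps no cursor, partitions can run on separate threads without coordination, which is the property a stateless matcher needs for multi-threaded execution.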

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/AbstractBitmapIterating.java:54

  • This field comment is stale: AbstractBitmapIterating no longer has a resolver/parent reference, but the comment still mentions a resolver not referencing the parent. Please update or remove it to avoid confusion.

Comment thread gigamap/gigamap/src/main/java/org/eclipse/store/gigamap/types/EntityResolver.java Outdated
fh-ms added 3 commits April 15, 2026 15:59
…roughout the codebase

- Updated all code references, including business logic, tests, and examples, to utilize the new `ScoredSearchResult.Entry` type.
- Improved consistency and readability by ensuring the correct type is used for scored search results across modules.
- Updated imports and ensured all tests pass with the new implementation.
- Corrected `disfunctional` to `dysfunctional` and `seperately` to `separately` to improve code comment clarity.
fh-ms requested a review from zdenek-jonas April 15, 2026 14:07

Copilot AI left a comment

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 3 comments.

- Allow `null` conditions to match all entities in iterator and matcher logic.
- Update `test` method to handle cases where condition is `null`.
Labels

enhancement New feature or request GigaMap

3 participants