DiskBBQ tail centroids should always be block encoded too #139835

Merged
tteofili merged 31 commits into elastic:main from tteofili:dbbq_bes on Jan 8, 2026

@tteofili (Contributor) commented Dec 19, 2025

DiskBBQ should always block-encode (and bulk-score) centroids, including tail blocks with fewer than BULK_SIZE (16) centroids.

see #138296
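The idea can be sketched in Java. This is a minimal, hypothetical illustration, not the DiskBBQ implementation: the class `TailBlockSketch`, the layout (all quantized codes of a block followed by a single float correction per vector, a simplification), and the scoring formula are assumptions. What it is meant to show is that a tail holding fewer than BULK_SIZE (16) vectors goes through the same block layout and the same block-at-a-time scoring loop as full blocks, instead of a separate one-vector-at-a-time path.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not the DiskBBQ code: names, layout and scoring are assumptions.
final class TailBlockSketch {
    // Block size for bulk encoding/scoring (BULK_SIZE = 16 per the PR description).
    static final int BULK_SIZE = 16;

    // Writes vectors block by block: the codes of every vector in the block first,
    // then one (simplified) float correction per vector. The final block may hold
    // fewer than BULK_SIZE vectors but uses exactly the same layout.
    static ByteBuffer encode(byte[][] codes, float[] corrections) {
        int n = codes.length;
        int dims = n == 0 ? 0 : codes[0].length;
        ByteBuffer out = ByteBuffer.allocate(n * (dims + Float.BYTES)).order(ByteOrder.LITTLE_ENDIAN);
        for (int start = 0; start < n; start += BULK_SIZE) {
            int count = Math.min(BULK_SIZE, n - start); // count < BULK_SIZE only for the tail
            for (int i = 0; i < count; i++) {
                out.put(codes[start + i]);
            }
            for (int i = 0; i < count; i++) {
                out.putFloat(corrections[start + i]);
            }
        }
        out.flip();
        return out;
    }

    // Scores one whole block per iteration, the tail included: first all dot products
    // of the block, then all corrections, mirroring the encoded layout.
    static float[] bulkScore(ByteBuffer in, int n, int dims, byte[] query) {
        float[] scores = new float[n];
        int[] dots = new int[BULK_SIZE];
        for (int start = 0; start < n; start += BULK_SIZE) {
            int count = Math.min(BULK_SIZE, n - start);
            for (int i = 0; i < count; i++) {
                int dot = 0;
                for (int d = 0; d < dims; d++) {
                    dot += in.get() * query[d];
                }
                dots[i] = dot;
            }
            for (int i = 0; i < count; i++) {
                scores[start + i] = dots[i] + in.getFloat();
            }
        }
        return scores;
    }
}
```

With 20 vectors, `encode` writes one full block of 16 followed by a tail block of 4, and `bulkScore` reads both with the same loop, so the tail never falls back to a separate per-vector path.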

@tteofili (Contributor, Author)

This provides some speedups at lower visit percentages (at 1% visit, latency drops from 1.58 ms to 1.39 ms and QPS rises from 632.91 to 720.72) while retaining the same recall on HotpotQA with E5-small.

baseline

index_name                       index_type  num_docs  doc_add_time(ms)  total_index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------  ----------  --------  ----------------  --------------------  --------------------  ------------  
corpus-hotpotqa-E5-small-0.fvec         ivf   5000000             24590                108283                     0             5

index_name                       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall     visited  filter_selectivity  filter_cached  oversampling_factor
-------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  ----------  ------------------  -------------  -------------------  
corpus-hotpotqa-E5-small-0.fvec         ivf                1.000         1.58              0.00           0.00  632.91    0.70   101969.61                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf                5.000         6.40              0.00           0.00  156.31    0.80   501993.56                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               10.000        12.51              0.00           0.00   79.92    0.82  1001983.24                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               30.000        35.32              0.00           0.00   28.31    0.84  3001917.90                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               50.000        59.67              0.00           0.00   16.76    0.85  5001867.57                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               70.000        81.80              0.00           0.00   12.22    0.85  7001670.77                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf              100.000       117.33              0.00           0.00    8.52    0.85  9999825.00                1.00           true                 3.00

candidate

index_name                       index_type  num_docs  doc_add_time(ms)  total_index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------  ----------  --------  ----------------  --------------------  --------------------  ------------  
corpus-hotpotqa-E5-small-0.fvec         ivf   5000000             23018                110338                     0             5

index_name                       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall     visited  filter_selectivity  filter_cached  oversampling_factor
-------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  ----------  ------------------  -------------  -------------------  
corpus-hotpotqa-E5-small-0.fvec         ivf                1.000         1.39              0.00           0.00  720.72    0.70   101969.61                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf                5.000         6.24              0.00           0.00  160.26    0.80   501993.56                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               10.000        12.00              0.00           0.00   83.32    0.82  1001983.24                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               30.000        35.17              0.00           0.00   28.43    0.84  3001917.90                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               50.000        57.58              0.00           0.00   17.37    0.85  5001867.57                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf               70.000        80.82              0.00           0.00   12.37    0.85  7001670.77                1.00           true                 3.00
corpus-hotpotqa-E5-small-0.fvec         ivf              100.000       118.54              0.00           0.00    8.44    0.85  9999825.00                1.00           true                 3.00
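The relative changes behind the speedup claim can be computed row by row from the baseline and candidate tables. A minimal Java sketch: `BenchDelta` and `pctChange` are illustrative names, and the arrays are the visit_percentage, latency(ms) and QPS columns copied from the two tables.

```java
// Illustrative helper (hypothetical names); data copied from the baseline/candidate tables.
final class BenchDelta {
    static final double[] VISIT = {1, 5, 10, 30, 50, 70, 100};
    static final double[] BASE_LATENCY = {1.58, 6.40, 12.51, 35.32, 59.67, 81.80, 117.33};
    static final double[] CAND_LATENCY = {1.39, 6.24, 12.00, 35.17, 57.58, 80.82, 118.54};
    static final double[] BASE_QPS = {632.91, 156.31, 79.92, 28.31, 16.76, 12.22, 8.52};
    static final double[] CAND_QPS = {720.72, 160.26, 83.32, 28.43, 17.37, 12.37, 8.44};

    // Relative change in percent: positive means the candidate value is larger.
    static double pctChange(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println("visit%  latency_delta%  qps_delta%");
        for (int i = 0; i < VISIT.length; i++) {
            System.out.printf("%6.0f  %14.1f  %10.1f%n",
                VISIT[i],
                pctChange(BASE_LATENCY[i], CAND_LATENCY[i]),
                pctChange(BASE_QPS[i], CAND_QPS[i]));
        }
    }
}
```

At 1% visit this gives roughly +13.9% QPS and about -12% latency, while at 100% visit QPS is slightly lower (about -0.9%); recall is identical in every row of both tables.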

tteofili marked this pull request as ready for review on Dec 23, 2025 09:47
elasticsearchmachine added the needs:triage label on Dec 23, 2025
elasticsearchmachine added the Team:Search Relevance label and removed the needs:triage label on Dec 23, 2025

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator)

Hi @tteofili, I've created a changelog YAML for you.

@john-wagster (Contributor) left a comment

been tracking; this lgtm

tteofili merged commit b514470 into elastic:main on Jan 8, 2026
35 checks passed
jimczi pushed a commit to jimczi/elasticsearch that referenced this pull request on Jan 12, 2026

Labels

>enhancement, :Search Relevance/Vectors, Team:Search Relevance, v9.4.0

5 participants