
AcceptDocs#cost() fully consumes iterator, invalidating any possible perf gain #15561

@benwtrent

Description

I have been digging into the AcceptDocs API and noticed the following in the Javadoc:

  /**
   * Return an approximation of the number of accepted documents. This is typically useful to decide
   * whether to consume these accept docs using random access ({@link #bits()}) or sequential access
   * ({@link #iterator()}).
   *
   * <p><b>NOTE</b>: This must not be called after {@link #iterator()}.
   *
   * @return approximate cost
   */
  public abstract int cost() throws IOException;

However, the implementation for the most common non-cached iterator:

    public int cost() throws IOException {
      createBitSetAcceptDocsIfNecessary();
      return acceptBitSet.cardinality();
    }

actually consumes the iterator in full to build the bit set and then returns its exact cardinality (nothing approximate at all...).

Why are we doing that? Why aren't we relying on DocIdSetIterator#cost or at least acceptBitSet.approximateCardinality?

It seems to me the main idea behind AcceptDocs is the ability to bypass realizing the bitset and to just iterate as normal when the filter is very restrictive...
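To make the idea concrete, here is a rough, self-contained sketch of the behavior I would expect. It does not use the real Lucene classes: `StepIterator` and `LazyAcceptDocs` are hypothetical stand-ins, and the "one tenth of maxDoc" threshold is an arbitrary assumption for illustration. The point is only that `cost()` can answer from the iterator's own estimate without pulling a single doc id, and the bit set is built only when random access is actually requested.

```java
import java.util.BitSet;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

public class LazyAcceptDocsSketch {

  /** Hypothetical stand-in for a doc-id iterator: matches every step-th doc below maxDoc. */
  static final class StepIterator implements PrimitiveIterator.OfInt {
    private final int maxDoc;
    private final int step;
    private int next = 0;
    int consumed = 0; // number of doc ids pulled so far, tracked for demonstration only

    StepIterator(int maxDoc, int step) {
      this.maxDoc = maxDoc;
      this.step = step;
    }

    /** Estimate that is available without iterating, similar in spirit to an iterator cost. */
    long cost() {
      return (maxDoc + step - 1) / step;
    }

    @Override
    public boolean hasNext() {
      return next < maxDoc;
    }

    @Override
    public int nextInt() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      int doc = next;
      next += step;
      consumed++;
      return doc;
    }
  }

  /** Hypothetical accept-docs wrapper that only materializes a bit set on demand. */
  static final class LazyAcceptDocs {
    private final StepIterator iterator;
    private final int maxDoc;
    private BitSet bits; // stays null until random access is requested

    LazyAcceptDocs(StepIterator iterator, int maxDoc) {
      this.iterator = iterator;
      this.maxDoc = maxDoc;
    }

    /** Approximate cost: reuses the iterator's estimate instead of consuming it. */
    int cost() {
      if (bits != null) {
        return bits.cardinality(); // already materialized, so the exact count is free
      }
      return (int) Math.min(iterator.cost(), maxDoc);
    }

    /** Random access: this is the only path that consumes the iterator. */
    BitSet bits() {
      if (bits == null) {
        bits = new BitSet(maxDoc);
        while (iterator.hasNext()) {
          bits.set(iterator.nextInt());
        }
      }
      return bits;
    }

    /** Sequential access: hands back the untouched iterator. */
    StepIterator iterator() {
      return iterator;
    }
  }

  public static void main(String[] args) {
    int maxDoc = 1000;
    StepIterator it = new StepIterator(maxDoc, 100);
    LazyAcceptDocs acceptDocs = new LazyAcceptDocs(it, maxDoc);

    int cost = acceptDocs.cost();
    System.out.println("cost=" + cost + ", consumed after cost()=" + it.consumed);

    // Arbitrary threshold (an assumption): treat the filter as restrictive below maxDoc / 10.
    if (cost < maxDoc / 10) {
      StepIterator docs = acceptDocs.iterator();
      while (docs.hasNext()) {
        System.out.println("visit doc " + docs.nextInt());
      }
    } else {
      System.out.println("materialized " + acceptDocs.bits().cardinality() + " docs");
    }
  }
}
```

With a shape like this, a caller with a very restrictive filter never pays for building the bit set, which is what I assumed the API was meant to allow.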

//cc @shubhamvishu

Version and environment details

No response
