Move some search related utilities to search_tester to make them usable #831

Merged
viveksinghggits merged 3 commits into search/sharded-cluster from
search/sharded-cluster-test-utils
Feb 27, 2026
Conversation

@viveksinghggits
Collaborator

Summary

Proof of Work

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?

Collaborator Author

viveksinghggits commented Feb 25, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

])

search_count = len(results)
logger.info(f"Search through mongos returned {search_count} documents")
Collaborator Author

[Re: lines +572 to +576]

I remember @lsierant talked about comparing the count of all the documents, but if not all the documents are indexed yet, we might accidentally make this test flaky. That's why I was wondering whether it's OK to keep this as it is.

The other option is to figure out how many documents have been indexed and keep retrying until all the documents are indexed.
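A minimal sketch of that retry option, assuming the search count can be wrapped in a zero-argument callable. The helper name `wait_for_count` and its parameters are hypothetical and not part of this PR:

```python
import time
from typing import Callable


def wait_for_count(get_count: Callable[[], int], expected: int,
                   timeout: float = 60.0, interval: float = 1.0) -> int:
    """Poll get_count() until it returns `expected`; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        count = get_count()
        if count == expected:
            return count
        if time.monotonic() >= deadline:
            # Report the last observed count so a failing test is easy to diagnose.
            raise TimeoutError(f"expected {expected} documents, last saw {count}")
        time.sleep(interval)
```

In the test, `get_count` could wrap the existing search aggregation, for example a lambda that returns `len(results)` for a fresh query, with `expected` taken from a count captured before sharding.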


Contributor

Yes, running some count queries first, before sharding, is the way to go. Let's be deterministic with our tests.

Contributor

Also, after sharding we could run some commands to check that the collection was sharded, the balancer is running, etc., and wait on those before querying.
Let's also check how search indexes behave when a search index is created on an unsharded collection and we shard that collection afterwards.

Contributor

There are methods to get sharding stats and check the chunk distribution. Let's ensure the chunks are distributed across more than one shard before querying data.
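One possible shape for that check, as a sketch only. It assumes the stats come from a `collStats` result obtained through mongos, where a `shards` document maps each shard name to per-shard stats including `count`. It looks at documents per shard rather than chunks, so it only approximates the chunk distribution. The function names are hypothetical:

```python
from typing import Any, List, Mapping


def shards_holding_data(coll_stats: Mapping[str, Any]) -> List[str]:
    """Return the names of shards that report at least one document."""
    # Assumption: coll_stats["shards"] maps shard name -> stats with a "count" field.
    shards = coll_stats.get("shards", {})
    return sorted(name for name, stats in shards.items() if stats.get("count", 0) > 0)


def is_distributed(coll_stats: Mapping[str, Any], min_shards: int = 2) -> bool:
    """True when data lives on at least `min_shards` shards."""
    return len(shards_holding_data(coll_stats)) >= min_shards
```

The stats dictionary would typically come from something like `db.command("collStats", collection_name)`; the test could poll `is_distributed` before issuing search queries.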


def enable_sharding(self, database_name: str):
    try:
        self.client.admin.command("enableSharding", database_name)
Contributor

Let's use shardAndDistributeCollection. It's a much more performant sharding method, available from 8.0: https://www.mongodb.com/docs/manual/reference/method/sh.shardAndDistributeCollection/

enableSharding is the old way of sharding, which can be slow because the balancer only kicks in later. shardAndDistributeCollection does the balancing in one go, IIUC.

Contributor

I think shardAndDistributeCollection will replace both enableSharding and shardCollection.
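A rough sketch of what that could look like from Python. As far as I understand the docs, `sh.shardAndDistributeCollection` is a mongosh helper that is equivalent to running `shardCollection` followed by `reshardCollection` with `forceRedistribution: true`, so the sketch issues those two admin commands. That equivalence, the helper names, and the `admin_db` parameter are assumptions to verify against the server version in use:

```python
from typing import Any, Dict, List


def shard_and_distribute_commands(namespace: str, key: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the two admin commands assumed to mirror sh.shardAndDistributeCollection."""
    return [
        # The command name must be the first key of each command document.
        {"shardCollection": namespace, "key": key},
        {"reshardCollection": namespace, "key": key, "forceRedistribution": True},
    ]


def shard_and_distribute(admin_db: Any, namespace: str, key: Dict[str, Any]) -> None:
    """Run the commands in order against an object exposing .command(dict)."""
    for command in shard_and_distribute_commands(namespace, key):
        admin_db.command(command)
```

With pymongo this might be called as `shard_and_distribute(client.admin, "db.collection", {"_id": "hashed"})`, where the namespace and shard key are placeholders.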

1. Add a test to make sure that after sharding, sum of documents in all shards is equal to the total document count in collection
2. Add utility to run search queries
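The first item could be checked along these lines. This is a sketch assuming the per-shard counts come from the `shards` document of a `collStats` result obtained through mongos; the function names are hypothetical and the actual test in this PR may gather the counts differently:

```python
from typing import Any, Mapping


def per_shard_total(coll_stats: Mapping[str, Any]) -> int:
    """Sum the document counts reported by every shard."""
    # Assumption: coll_stats["shards"] maps shard name -> stats with a "count" field.
    return sum(stats.get("count", 0) for stats in coll_stats.get("shards", {}).values())


def assert_shard_counts_match(coll_stats: Mapping[str, Any], expected_total: int) -> None:
    """Fail with a readable message when the shards do not add up to the total."""
    actual = per_shard_total(coll_stats)
    assert actual == expected_total, (
        f"shards hold {actual} documents, expected {expected_total}"
    )
```

Per-shard counts can temporarily include orphaned documents while chunks are migrating, so a test may want to wait for the balancer to settle before asserting.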
@viveksinghggits viveksinghggits force-pushed the search/sharded-cluster-test-utils branch from 662e5c0 to 3e1a4f9 Compare February 26, 2026 23:51
@viveksinghggits viveksinghggits marked this pull request as ready for review February 27, 2026 12:01
@viveksinghggits viveksinghggits requested a review from a team as a code owner February 27, 2026 12:01
@viveksinghggits viveksinghggits requested review from lucian-tosa and mircea-cosbuc and removed request for a team February 27, 2026 12:01
@viveksinghggits viveksinghggits merged commit 6548799 into search/sharded-cluster Feb 27, 2026
25 of 30 checks passed
@viveksinghggits viveksinghggits deleted the search/sharded-cluster-test-utils branch February 27, 2026 12:01