Move some search related utilities to search_tester to make them usable #831
    ])

    search_count = len(results)
    logger.info(f"Search through mongos returned {search_count} documents")
[Re: lines +572 to +576]
I remember @lsierant talked about comparing the count of all the documents, but if not all the documents are indexed yet, we might accidentally make this test flaky. That's why I was wondering whether it's OK to keep this as it is.
The other option is to figure out how many documents have been indexed and keep retrying until all of them are.
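The retry option described above could be sketched as a small polling helper. This is only an illustration: `wait_until_indexed` and its `count_indexed` callback are hypothetical names, not part of search_tester, and how the indexed count is obtained (for example, a search query that matches every document) is left to the caller.

```python
import time
from typing import Callable


def wait_until_indexed(
    count_indexed: Callable[[], int],
    expected: int,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> int:
    """Poll count_indexed() until it reports at least `expected` documents.

    Hypothetical helper: count_indexed is supplied by the caller and should
    return how many documents the search index currently returns.
    """
    deadline = time.monotonic() + timeout
    while True:
        indexed = count_indexed()
        if indexed >= expected:
            return indexed
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {indexed} of {expected} documents indexed after {timeout}s"
            )
        time.sleep(interval)
```

With a helper like this, the test can compare the full document count without racing the indexer, and a timeout gives a clear failure instead of a flaky assertion.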
Yes, running some count queries first, before sharding, is the way to go. Let's be deterministic with our tests.
Also, after sharding we could execute some commands to check whether the collection was sharded, whether the balancer is running, etc., and wait on that before continuing.
Let's also check how the search indexes behave when a search index is created on an unsharded collection and we shard that collection afterwards.
There are methods to get sharding stats to check the chunk distribution. Let's ensure the chunks are distributed across more than one shard before querying data.
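One way to express the chunk-distribution check is a pure function over chunk metadata documents. The sketch below assumes each document carries a `shard` field, as the documents in the `config.chunks` collection do; the function names are hypothetical, and fetching the documents through mongos is left to the caller.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def chunks_per_shard(chunks: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Count chunks per shard from chunk metadata documents."""
    return dict(Counter(chunk["shard"] for chunk in chunks))


def assert_chunks_distributed(
    chunks: Iterable[Mapping[str, Any]], min_shards: int = 2
) -> dict[str, int]:
    """Fail unless chunks live on at least `min_shards` distinct shards."""
    distribution = chunks_per_shard(chunks)
    assert len(distribution) >= min_shards, (
        f"chunks found on {len(distribution)} shard(s), "
        f"expected at least {min_shards}: {distribution}"
    )
    return distribution


# Assumed usage with PyMongo (not verified against this repository):
#   chunks = client.config.chunks.find({"uuid": collection_uuid})
#   assert_chunks_distributed(chunks)
```

Keeping the check separate from the query makes it easy to unit test and to combine with a polling loop while the balancer is still moving chunks.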
    def enable_sharding(self, database_name: str):
        try:
            self.client.admin.command("enableSharding", database_name)
Let's use shardAndDistributeCollection: it's a much more performant sharding method, available from 8.0. https://www.mongodb.com/docs/manual/reference/method/sh.shardAndDistributeCollection/
enableSharding is an old way of doing sharding, which can be slow because the balancer kicks in later. shardAndDistributeCollection does the balancing in one go, IIUC.
I think shardAndDistributeCollection will replace both enableSharding+shardCollection
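Since `sh.shardAndDistributeCollection` is a mongosh helper, a Python utility would presumably have to issue the underlying admin commands itself. The sketch below assumes the helper is equivalent to running `shardCollection` followed by `reshardCollection` on the same key, and that the `forceRedistribution` option is available on the target server version; both assumptions should be checked against the linked documentation. `shard_and_distribute` is a hypothetical name, and `admin` stands for something like `self.client.admin`.

```python
from typing import Any, Mapping


def shard_and_distribute(admin: Any, namespace: str, key: Mapping[str, Any]) -> None:
    """Shard `namespace` ("db.collection") and redistribute its data at once.

    `admin` is expected to expose command(name, value, **kwargs), which is
    how PyMongo's Database.command is called.
    """
    # Step 1: shard the collection on the given key.
    admin.command("shardCollection", namespace, key=dict(key))
    # Step 2 (assumption): reshard on the same key and force redistribution,
    # so data is spread across shards without waiting for the balancer.
    admin.command(
        "reshardCollection", namespace, key=dict(key), forceRedistribution=True
    )
```

A call might look like `shard_and_distribute(self.client.admin, "db.coll", {"_id": "hashed"})`, where the namespace and shard key are placeholders.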
1. Add a test to make sure that after sharding, sum of documents in all shards is equal to the total document count in collection
2. Add utility to run search queries
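The check in the first item could look roughly like the following. It is a sketch, not the test from this PR: `check_shard_counts` is a hypothetical name, the total is assumed to come from a count taken before sharding, and the per-shard counts are assumed to be gathered by the caller, for example by querying each shard directly.

```python
from typing import Mapping


def check_shard_counts(total: int, per_shard: Mapping[str, int]) -> int:
    """Assert that per-shard document counts add up to the collection total.

    `total` is the count recorded before sharding; `per_shard` maps a shard
    name to the number of documents found on that shard.
    """
    shard_sum = sum(per_shard.values())
    assert shard_sum == total, (
        f"shards hold {shard_sum} documents, collection has {total}: "
        f"{dict(per_shard)}"
    )
    return shard_sum
```

Recording the total before sharding keeps the comparison deterministic, in line with the review comments above.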
662e5c0 to 3e1a4f9
a7e683f to e76af75
6548799 into search/sharded-cluster
