
feat: Improves meilisearch configuration step #38384

Open
farhaanbukhsh wants to merge 2 commits into openedx:master from open-craft:farhaan/improve-meilisearch-configuration

@farhaanbukhsh
Member

@farhaanbukhsh farhaanbukhsh commented Apr 20, 2026

Description

The changes here add a drift calculator for the Meilisearch index to help us configure Meilisearch on a fresh installation or an upgrade. The mechanism is triggered on every run of migrate, so it covers both new installations and upgrades. This ensures that we calculate and gauge the state of the Meilisearch Studio index and have a plan to mitigate any drift.

Useful information to include:

Here we calculate how far the live index configuration has drifted from what the codebase expects, and try to bring it back in line with the codebase. A change in primary key (PK) is a special case: there is not much we can do except drop the index, then create and configure a new one.
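To make the drift idea concrete, here is a minimal sketch of how a settings comparison could work. It is not the PR's `_detect_index_drift`; `IndexDrift`, `detect_drift` and `_matches` are hypothetical names, and the sketch assumes the live settings arrive as the camelCase dict returned by the Meilisearch Python client's `Index.get_settings()`, with attribute lists made of plain strings. Filterable and sortable attributes are compared as sets because their order carries no meaning, while searchable attributes and ranking rules are compared as ordered lists because Meilisearch uses their order for relevancy.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class IndexDrift:
    """Hypothetical drift report; field names mirror the ones printed in the testing script."""

    exists: bool
    is_empty: bool
    primary_key_correct: bool
    distinct_attribute_match: bool
    filterable_attributes_match: bool
    searchable_attributes_match: bool
    sortable_attributes_match: bool
    ranking_rules_match: bool

    @property
    def is_settings_drifted(self) -> bool:
        # Any mismatched setting counts as drift; the primary key is tracked separately.
        return not (
            self.distinct_attribute_match
            and self.filterable_attributes_match
            and self.searchable_attributes_match
            and self.sortable_attributes_match
            and self.ranking_rules_match
        )


def _matches(expected: dict, live: dict, key: str, ordered: bool) -> bool:
    """Compare one setting; keys absent from `expected` are not managed here, so never drift."""
    if key not in expected:
        return True
    want, have = expected[key], live.get(key)
    if want is None or have is None or isinstance(want, str):
        return want == have  # e.g. distinctAttribute is a string or None
    return list(want) == list(have) if ordered else set(want) == set(have)


def detect_drift(
    expected_settings: dict[str, Any],
    expected_primary_key: str,
    live_settings: Optional[dict[str, Any]],
    live_primary_key: Optional[str],
    document_count: int,
) -> IndexDrift:
    """`live_settings` is assumed to use the camelCase keys returned by Index.get_settings()."""
    if live_settings is None:
        # The index does not exist, so nothing can match.
        return IndexDrift(False, True, False, False, False, False, False, False)
    return IndexDrift(
        exists=True,
        is_empty=document_count == 0,
        primary_key_correct=live_primary_key == expected_primary_key,
        distinct_attribute_match=_matches(expected_settings, live_settings, "distinctAttribute", ordered=True),
        # Order carries no meaning for filterable and sortable attributes...
        filterable_attributes_match=_matches(expected_settings, live_settings, "filterableAttributes", ordered=False),
        sortable_attributes_match=_matches(expected_settings, live_settings, "sortableAttributes", ordered=False),
        # ...but it does for searchable attributes and ranking rules.
        searchable_attributes_match=_matches(expected_settings, live_settings, "searchableAttributes", ordered=True),
        ranking_rules_match=_matches(expected_settings, live_settings, "rankingRules", ordered=True),
    )
```

Splitting "exists", "empty", "primary key" and "settings" into separate flags is what lets the caller pick a different remedy for each kind of drift.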

To make that happen, this check runs as part of migrate: whenever ./manage.py cms migrate runs, the drift check follows it. The diff is calculated first, so an action is taken only when one is needed.
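The "act only when needed" rule can be sketched as a small planning function. This is an illustration under assumptions, not the PR's code: `Action` and `plan_actions` are hypothetical, and the inputs are plain booleans like the ones the drift report prints. An empty plan corresponds to the "No action needed" case. The primary-key branch reflects a Meilisearch constraint: an index's primary key can only be changed while the index holds no documents.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical remediation steps, in the order they would run."""

    CREATE = "create index and apply settings"
    RECREATE = "drop index, then create it with the new primary key and settings"
    UPDATE_SETTINGS = "re-apply settings to the existing index"
    POPULATE = "(re)index documents"


def plan_actions(
    *, exists: bool, primary_key_correct: bool, settings_drifted: bool, is_empty: bool
) -> list[Action]:
    """Turn a drift diagnosis into a plan; an empty list means no action is needed."""
    if not exists:
        return [Action.CREATE, Action.POPULATE]
    if not primary_key_correct:
        # Meilisearch only lets you change the primary key of an index with no documents,
        # so the simplest safe path is to drop the index and rebuild it.
        return [Action.RECREATE, Action.POPULATE]
    steps: list[Action] = []
    if settings_drifted:
        # Settings can be updated in place without re-sending the documents.
        steps.append(Action.UPDATE_SETTINGS)
    if is_empty:
        steps.append(Action.POPULATE)
    return steps
```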

Supporting information

  1. Check out this branch in edx-platform.
  2. Stop all the containers: tutor dev stop
  3. Build the openedx-dev image: tutor images build openedx-dev
  4. Start the containers: tutor dev start -d
  5. Check that Meilisearch is running: tutor dev status | rg meilisearch
  6. Drop into the shell: tutor dev exec -it cms -- /bin/bash
  7. Run ./manage.py cms migrate
  8. The log should contain a line like: api.py:580 - Index is populated and correctly configured. No action needed.
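The check in step 8 can also be done mechanically when the migrate output is captured as text (for example with ./manage.py cms migrate 2>&1 | tee migrate.log). This is a small sketch; `index_needed_no_action` is a hypothetical helper, and only the message text is taken from the log line quoted above. The `api.py:<line>` prefix is matched loosely because the line number changes whenever the file is edited.

```python
import re

# Message quoted from the expected log line; the line number after "api.py:" may vary.
NO_ACTION_PATTERN = re.compile(
    r"api\.py:\d+ - Index is populated and correctly configured\. No action needed\."
)


def index_needed_no_action(log_text: str) -> bool:
    """Return True if the migrate log reports that the Studio index needed no changes."""
    return NO_ACTION_PATTERN.search(log_text) is not None
```

For example, `index_needed_no_action(open("migrate.log").read())` would return True after a clean run on an already configured index.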

Let's run a few more tests:

  1. From the bash shell, open the Django shell with ./manage.py cms shell and run:

    from openedx.core.djangoapps.content.search.api import (
        _get_meilisearch_client,
        _wait_for_meili_task,
        STUDIO_INDEX_NAME,
    )

    client = _get_meilisearch_client()
    index = client.get_index(STUDIO_INDEX_NAME)

    # Break a setting to simulate drift: overwrite the sortable attributes with a single one
    _wait_for_meili_task(index.update_sortable_attributes(["display_name"]))
    print("Introduced drift: removed some sortable attributes")
  2. This introduces drift, and we should check whether the code fixes it.
  3. Exit the Django shell after running the code above and run ./manage.py cms migrate again; you will see the script fix the drift.
  4. I used the script below to check the status of the index while developing:
    from openedx.core.djangoapps.content.search.api import (
        _get_meilisearch_client,
        _detect_index_drift,
        STUDIO_INDEX_NAME,
    )

    client = _get_meilisearch_client()
    drift = _detect_index_drift(STUDIO_INDEX_NAME)

    print(f"Index: {STUDIO_INDEX_NAME}")
    print(f"  exists:                      {drift.exists}")
    print(f"  is_empty:                    {drift.is_empty}")
    print(f"  primary_key_correct:         {drift.primary_key_correct}")
    print(f"  distinct_attribute_match:    {drift.distinct_attribute_match}")
    print(f"  filterable_attributes_match: {drift.filterable_attributes_match}")
    print(f"  searchable_attributes_match: {drift.searchable_attributes_match}")
    print(f"  sortable_attributes_match:   {drift.sortable_attributes_match}")
    print(f"  ranking_rules_match:         {drift.ranking_rules_match}")
    print("  ---")
    print(f"  is_settings_drifted:         {drift.is_settings_drifted}")
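If the drift object is a dataclass (an assumption; its type is not shown in this PR description), the hand-written print block could be generated from its fields, which keeps the report in sync when a new setting is tracked. `format_drift` is a hypothetical helper, `DemoDrift` is only a stand-in used to demonstrate the output, and derived values such as `is_settings_drifted` are listed explicitly because properties are not dataclass fields.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Iterable


def format_drift(index_name: str, drift: Any, derived: Iterable[str] = ("is_settings_drifted",)) -> str:
    """Build the same report as the print block: one line per dataclass field, then derived values."""
    if not is_dataclass(drift):
        raise TypeError("format_drift expects a dataclass instance")
    derived = list(derived)
    names = [f.name for f in fields(drift)]
    width = max(len(name) for name in names + derived) + 1  # +1 for the trailing colon

    def row(name: str) -> str:
        label = f"{name}:"
        return f"  {label:<{width}} {getattr(drift, name)}"

    return "\n".join([f"Index: {index_name}", *map(row, names), "  ---", *map(row, derived)])


# Stand-in for the real drift object, used only to demonstrate the output.
@dataclass
class DemoDrift:
    exists: bool = True
    sortable_attributes_match: bool = False

    @property
    def is_settings_drifted(self) -> bool:
        return not self.sortable_attributes_match
```

In the Django shell this would be used as `print(format_drift(STUDIO_INDEX_NAME, drift))`.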

Phase II Testing

  1. We need to create data gaps in the current Meilisearch index to see whether reindexing fills them.
  2. Let's delete one or more documents from the Meilisearch Studio index.
  3. Drop into the Django shell: ./manage.py cms shell
  4. Run the code below, which removes one document; you can remove more if you want:
    from openedx.core.djangoapps.content.search import api

    client = api._get_meilisearch_client()
    index = client.get_index(api.STUDIO_INDEX_NAME)

    # Pick the first document in the Studio index and delete it by its usage key
    results = index.search("", {"limit": 5})
    hit = results["hits"][0]
    doc_id = hit["usage_key"]

    print("Before:", index.get_stats().number_of_documents)
    api.delete_index_doc(doc_id)
    print("After:", index.get_stats().number_of_documents)
  5. Note the number of documents before and after the deletion.
  6. Exit the Django shell and run the reindexing command: ./manage.py cms reindex_studio
  7. You will see it scheduling the Celery tasks. Back in the Django shell (./manage.py cms shell), check the index stats and the IncrementalIndexCompleted count:
    from openedx.core.djangoapps.content.search import api
    from openedx.core.djangoapps.content.search.models import IncrementalIndexCompleted

    client = api._get_meilisearch_client()
    index = client.get_index(api.STUDIO_INDEX_NAME)
    print("Restored docs:", index.get_stats().number_of_documents)
    print("Incremental Index Count:", IncrementalIndexCompleted.objects.count())
  8. The document count should be back to its previous value, and the IncrementalIndexCompleted count should be 0.
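One way to reason about the data gap created above is as a set difference between the document IDs that should exist and the IDs actually in the index. This is a rough illustration only, not how reindex_studio works: `find_index_gaps` and `restore_looks_complete` are hypothetical helpers, and in a real session their inputs would come from the Studio content and from the index itself. The second helper simply encodes the manual check from steps 5 to 8.

```python
from collections.abc import Iterable


def find_index_gaps(expected_ids: Iterable[str], indexed_ids: Iterable[str]) -> tuple[set[str], set[str]]:
    """Return (missing, stale): IDs that should be indexed but are not, and indexed IDs no longer expected."""
    expected, indexed = set(expected_ids), set(indexed_ids)
    return expected - indexed, indexed - expected


def restore_looks_complete(before: int, after: int, incremental_count: int) -> bool:
    """Document count is back to its old value and no IncrementalIndexCompleted rows remain."""
    return after == before and incremental_count == 0
```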

Deadline

ASAP

Other information

Related Tutor PR: overhangio/tutor#1374
Private Ref: BB-10767

@openedx-webhooks added the open-source-contribution and core contributor labels on Apr 20, 2026
@openedx-webhooks

Thanks for the pull request, @farhaanbukhsh!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

Contributor

@bradenmacdonald bradenmacdonald left a comment


This is looking great!

Comment thread openedx/core/djangoapps/content/search/management/commands/reindex_studio.py Outdated
Comment thread openedx/core/djangoapps/content/search/management/commands/reindex_studio.py Outdated
Comment thread openedx/core/djangoapps/content/search/management/commands/reindex_studio.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py
Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Comment thread openedx/core/djangoapps/content/search/api.py
…up with a migration plan

and configuration plan depending on the state. This introduces a mechanism, a drift engine, which drills down into the Meilisearch configuration
and figures out what has changed:

- settings
- primary key

Depending on the change, we choose a strategy: either migrate the data or recreate the index.

Signed-off-by: Farhaan Bukhsh <farhaan@opencraft.com>
…index.

Signed-off-by: Farhaan Bukhsh <farhaan@opencraft.com>

Labels

core contributor: PR author is a Core Contributor (who may or may not have write access to this repo).
open-source-contribution: PR author is not from Axim or 2U

Projects

Status: Needs Triage


4 participants