
feat: update content libraries API to use events from openedx-core [FC-0117]#38437

Open
bradenmacdonald wants to merge 17 commits into openedx:master from open-craft:braden/events-in-core


@bradenmacdonald
Contributor

Description

With openedx/openedx-core#543, openedx-core now emits events when changes happen within a Learning Package.

This PR updates the content libraries code and search code accordingly. The main benefit is that the search index now stays up to date regardless of which APIs are used. We no longer need to "wrap" low-level APIs in high-level APIs just to emit events.

Note: The "Library Collections" code was already working fine because it used Django signals to watch for changes to the Collection-PublishableEntity many-to-many relationship, but it shouldn't have needed to know so much about the internals of openedx_content.

Supporting information

See openedx/openedx-core#462

Testing instructions

Coming soon

Deadline

Verawood

Other information

Depends on openedx/openedx-core#543.

I wrote most of the code but used Claude Code for small bits and pieces.

@openedx-webhooks

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added open-source-contribution PR author is not from Axim or 2U core contributor PR author is a Core Contributor (who may or may not have write access to this repo). labels Apr 23, 2026
@github-project-automation bot moved this to Needs Triage in Contributions Apr 23, 2026
Comment on lines -454 to +488
{ # Not 100% sure we want this, but a PUBLISHED event is emitted for container 2
# because one of its children's published versions has changed, so whether or
# not it contains unpublished changes may have changed and the search index
# may need to be updated. It is not actually published though.
# TODO: should this be a CONTAINER_CHILD_PUBLISHED event?
# No PUBLISHED event is emitted for container 2, because it doesn't have a published version yet.
# Publishing 'html_block' would have potentially affected it if container 2's published version had a
# reference to 'html_block', but it doesn't yet until we publish it.
)

# note that container 2 is still unpublished
c2_after = self._get_container(container2["id"])
assert c2_after["has_unpublished_changes"]

# publish container2 now:
self._publish_container(container2["id"])
self.expect_new_events(
{ # An event for container 2 being published:
"signal": LIBRARY_CONTAINER_PUBLISHED,
"library_container": LibraryContainerData(
container_key=LibraryContainerLocator.from_string(container2["id"]),
),
},
{ # An event for the html block in container 2 only:
"signal": LIBRARY_BLOCK_PUBLISHED,
"library_block": LibraryBlockData(
self.lib1_key, LibraryUsageLocatorV2.from_string(html_block2["id"]),
),
},
Contributor Author

It's a little hard to tell from the diff here (because of how it's split up), but before this PR, a spurious PUBLISHED event was emitted for container 2 before it had ever been published. I think the new behavior is much more correct, because it's built on Learning Core's new publish log side effects. I have explained why in the test case, and added tests to ensure that side effects still result in PUBLISHED events when they should (i.e. once container 2 has actually been published).
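A minimal sketch of the rule stated in the new test comment (no PUBLISHED event for container 2, because it has no published version that references 'html_block'). It assumes a simplified, hypothetical model in which each container's *published* version is just the set of child IDs it references, or None if the container was never published; the function name is illustrative and is not openedx-core's API.

```python
from typing import Dict, Optional, Set

# Hypothetical model: container ID -> child IDs referenced by its *published*
# version, or None if the container has never been published.
PublishedChildren = Dict[str, Optional[Set[str]]]


def containers_needing_published_event(published_child: str, published: PublishedChildren) -> Set[str]:
    """Return the containers whose published state is affected by publishing ``published_child``.

    A container with no published version (None) cannot be affected, and a
    published container is only affected if its published version already
    references the child.
    """
    return {
        container_id
        for container_id, children in published.items()
        if children is not None and published_child in children
    }


# Illustrative state: container1 is published and references html_block;
# container2 has never been published.
state: PublishedChildren = {"container1": {"html_block"}, "container2": None}
print(containers_needing_published_event("html_block", state))  # {'container1'}
```

Once container 2 is published with a reference to the block, it would appear in this set for any later publish of that block, which is the behavior the additional tests check.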

Comment on lines -546 to +654
{
"signal": CONTENT_OBJECT_ASSOCIATIONS_CHANGED,
"content_object": ContentObjectChangedData(
object_id=str(container_key),
changes=["collections", "tags"],
),
},
# We used to emit CONTENT_OBJECT_ASSOCIATIONS_CHANGED here for the restored container, specifically noting
# that changes=["collections", "tags"], because deleted things may have collections+tags that are once
# again relevant when it is restored. However, the CREATED event should be sufficient for notifying of that.
# (Or should we emit CREATED+UPDATED to be extra sure?)
Contributor Author

Flagging this because it's a behavior change: we no longer emit CONTENT_OBJECT_ASSOCIATIONS_CHANGED when a deleted object is restored.

TODO: publish an item that has collections and tags, delete it, then use "revert all changes" in the library UI and make sure it reappears with its collections and tags intact. I haven't tested this yet.

Comment on lines +825 to +841
# openedx_content also lists ancestor containers of the affected units as changed.
# We don't strictly need this at the moment, at least as far as keeping our search index updated.
{
"signal": LIBRARY_CONTAINER_UPDATED,
"library_container": LibraryContainerData(container_key=self.subsection1.container_key),
},
{
"signal": LIBRARY_CONTAINER_UPDATED,
"library_container": LibraryContainerData(container_key=self.subsection2.container_key),
},
{
"signal": LIBRARY_CONTAINER_UPDATED,
"library_container": LibraryContainerData(container_key=self.section1.container_key),
},
{
"signal": LIBRARY_CONTAINER_UPDATED,
"library_container": LibraryContainerData(container_key=self.section2.container_key),
Contributor Author

@bradenmacdonald · Apr 23, 2026

Last change: we now emit events not only for the parent containers of modified entities but also for those containers' own ancestors, which we weren't doing before (previously it was only one level: parent containers, but not their ancestors in turn). I don't think we have a use case for this, but I'm not sure whether I could or should filter them out somehow, because the publish log treats direct parents (which we definitely care about and need events for) and their ancestors in turn exactly the same.
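To illustrate the difference between the old one-level behavior and the new one, here is a sketch that walks a hypothetical `parents` mapping (entity to the containers that directly contain it). openedx-core's publish log computes the affected set itself; this only shows why a change to one component now fans out to every ancestor, using the same container names as the test above.

```python
from collections import deque
from typing import Dict, List, Set


def all_ancestors(entity: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk up a hypothetical child -> parents mapping.

    Returns every container that directly or indirectly contains ``entity``.
    The ``seen`` set records each ancestor once, even when it is reachable via
    several paths (e.g. a unit shared by two subsections).
    """
    seen: Set[str] = set()
    queue = deque(parents.get(entity, []))
    while queue:
        container = queue.popleft()
        if container in seen:
            continue
        seen.add(container)
        queue.extend(parents.get(container, []))
    return seen


parents = {
    "component": ["unit1", "unit2"],
    "unit1": ["subsection1"],
    "unit2": ["subsection2"],
    "subsection1": ["section1"],
    "subsection2": ["section2"],
}
# Old behavior: only the direct parents (unit1, unit2) got events.
# New behavior: every ancestor up to the sections is included.
print(sorted(all_ancestors("component", parents)))
# ['section1', 'section2', 'subsection1', 'subsection2', 'unit1', 'unit2']
```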

To avoid performance issues, when more than one ancestor is included in the event stream, the event for the directly modified entity is emitted synchronously, but the events for indirectly affected containers are emitted asynchronously. This seems to work well in the UI, which updates correctly and immediately when you e.g. rename something, and it should still perform well even if you rename a component used in thousands of different containers.

@bradenmacdonald bradenmacdonald added the FC Relates to an Axim Funded Contribution project label Apr 23, 2026
Contributor

@ormsbee left a comment

(Still need to look through test changes.)

At a high level, I do have a bit of a concern that having some things be sync and some async at a granular level (depending on how many things there are) is going to lead to inconsistencies and bugs. I think it's a reasonable tradeoff at the moment; it's just something we should keep an eye on.

Comment on lines +49 to +52
# Which entities were _directly_ changed here?
direct_changes = [asdict(change) for change in change_log.changes if change.new_version != change.old_version]
# And which entities were indirectly affected (e.g. parent containers)?
indirect_changes = [asdict(change) for change in change_log.changes if change.new_version == change.old_version]
Contributor

[Comment] This reminds me that we should probably put a couple of helper methods in DraftChangeLog and DraftChangeLogRecord for this sort of thing, so we can keep the terminology consistent over time. Made a ticket for that: openedx/openedx-core#560
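One possible shape for such helpers, sketched with hypothetical dataclasses standing in for `DraftChangeLog` and `DraftChangeLogRecord` (the real Django models, and whatever API openedx/openedx-core#560 ends up adding, may look different). It keeps the "direct vs. indirect" definition from the snippet above in a single place:

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical stand-in for a DraftChangeLogRecord (only the fields used here)."""
    entity_id: int
    old_version: Optional[int]
    new_version: Optional[int]

    @property
    def is_direct_change(self) -> bool:
        # The entity itself got a new version (including newly created entities).
        return self.new_version != self.old_version


@dataclass
class ChangeLog:
    """Hypothetical stand-in for a DraftChangeLog."""
    changes: List[ChangeRecord]

    def direct_changes(self) -> List[ChangeRecord]:
        return [c for c in self.changes if c.is_direct_change]

    def indirect_changes(self) -> List[ChangeRecord]:
        # Same version before and after: only affected as a side effect,
        # e.g. a parent container of something that changed.
        return [c for c in self.changes if not c.is_direct_change]


log = ChangeLog(changes=[
    ChangeRecord(entity_id=1, old_version=10, new_version=11),  # edited component
    ChangeRecord(entity_id=2, old_version=20, new_version=20),  # its parent unit
])
direct = [asdict(c) for c in log.direct_changes()]
indirect = [asdict(c) for c in log.indirect_changes()]
```

`asdict()` only serializes dataclass fields, so the `is_direct_change` property does not leak into the payloads passed to the handlers.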

update_async(change_list=indirect_changes) # update the many other affected entities async.
else:
# More than one entity was changed at once. Handle asynchronously:
update_async(change_list=[*direct_changes, *indirect_changes])
Contributor

Nit [optional]: I think it'd be a little easier to follow with early returns so there's less nesting, but this is totally readable as-is.
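A sketch of what the early-return version could look like. It assumes (from the PR discussion, not from the full diff, which is only partly shown) that a single direct change is handled synchronously and everything else is deferred; `dispatch_changes`, `update_sync` and `update_async` are hypothetical names passed in as plain callables so the example runs on its own.

```python
from typing import Callable, Dict, List

Change = Dict[str, object]
Handler = Callable[[List[Change]], None]


def dispatch_changes(
    direct_changes: List[Change],
    indirect_changes: List[Change],
    update_sync: Handler,
    update_async: Handler,
) -> None:
    """Early-return version of the sync/async split (hypothetical handler names)."""
    if len(direct_changes) != 1:
        # Several entities changed directly: handle everything asynchronously.
        update_async([*direct_changes, *indirect_changes])
        return

    # Exactly one entity was edited: update it now so the UI sees fresh data
    # when the current request completes.
    update_sync(direct_changes)
    if not indirect_changes:
        return
    # Possibly many affected ancestors: defer them to a background task.
    update_async(indirect_changes)


calls: List[tuple] = []
dispatch_changes(
    direct_changes=[{"entity_id": 1}],
    indirect_changes=[{"entity_id": 2}, {"entity_id": 3}],
    update_sync=lambda changes: calls.append(("sync", changes)),
    update_async=lambda changes: calls.append(("async", changes)),
)
print(calls)
# [('sync', [{'entity_id': 1}]), ('async', [{'entity_id': 2}, {'entity_id': 3}])]
```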

Comment on lines +84 to +88
⏳ This event is emitted synchronously and this handler is called
synchronously. If a lot of entities were published, we need to dispatch
an asynchronous handler to deal with them to avoid slowdowns. If only one
entity was published, we want to deal with that synchronously so that we
can show the user correct data when the current request completes.
Contributor

We should clarify to note that it's async for any number > 1 ("a lot" might be misleading).

Contributor Author

Sure, updated to say "multiple" instead of "a lot of". d3e7fcd

Comment on lines +95 to +96
if len(change_log.changes) == 1:
fn = tasks.send_events_after_publish
Contributor

This includes side-effects, right? If so, I think it should be mentioned prominently in the docstring, since that's going to be unintuitive for a lot of folks who are going to think "publish the Component" would only result in one entry.

Comment on lines +129 to +130
# .. event_implemented_name: LIBRARY_COLLECTION_CREATED
# .. event_type: org.openedx.content_authoring.content_library.collection.created.v1
Contributor

I think these just go where the signal is defined, not where it's sent.

Contributor

I realized later in the review that this was like this in the code that you refactored, but I still think it's wrong and should be removed in all the places other than where the signal is first defined.

Contributor Author

OK, great! I didn't want them there anyways; I was just copying the existing pattern without understanding it.

Contributor Author

@ormsbee Actually, it seems the event annotations are quite deliberate - see #36473 and it is mentioned in these docs:

In-line code annotations are also used when integrating the event into the service.

It's not super clear to me why this is the case, but I think it's related to what the doc says at the end: that it "ensures that [the event] is used correctly across services"?

Maybe @mariajgrimaldi or @BryanttV can clarify how these are used?

Contributor

Ack, you're totally right. That's... really weird to me. But okay, thank you.

Comment thread openedx/core/djangoapps/content_libraries/tasks.py
new_child_ids: Iterable[PublishableEntity.ID]
# If the title has changed, we notify ALL children that their parent container(s) have changed, e.g. to update the
# list of "units this component is used in", "sections this subsection is used in", etc. in the search index
title_changed: bool = bool(old_version and new_version) and (old_version.title != new_version.title)
Contributor

Please explain in a comment why this does not include events where the old_version was None.

Contributor Author

Please explain in a comment why this does not include events where the old_version was None.

It didn't, because the else: path happened to work just as well in that particular case, and skipping it let me make the code slightly more compact, since I didn't have to handle reading old_version.title when old_version was None...

But that made me realize this was more convoluted than it needs to be, so I refactored this logic entirely to be much simpler and easier to follow. f3961bf
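The refactored logic itself is in f3961bf and is not reproduced here. As a rough sketch of the rule described in this thread (notify every child when the title changes, otherwise only children that were added or removed), assuming hypothetical inputs of titles and child-ID lists, with `None` as the old title when the container had no previous version:

```python
from typing import Iterable, Optional, Set


def children_to_notify(
    old_title: Optional[str],
    new_title: str,
    old_children: Iterable[int],
    new_children: Iterable[int],
) -> Set[int]:
    """Which child entities need their "used in" data re-indexed after a container change.

    ``old_title`` is None when the container had no previous version; in that
    case every child is newly added, so the symmetric difference below already
    covers it without a special case.
    """
    old_set, new_set = set(old_children), set(new_children)
    if old_title is not None and old_title != new_title:
        # Renamed: every current child displays the new title, and removed
        # children must drop the old one, so notify both.
        return old_set | new_set
    # Otherwise only children that were added or removed are affected.
    return old_set ^ new_set


# Renamed unit: every child's "used in" entry is affected.
print(sorted(children_to_notify("Unit A", "Unit B", [1, 2], [2, 3])))  # [1, 2, 3]
# Same title, child 1 removed and child 3 added.
print(sorted(children_to_notify("Unit A", "Unit A", [1, 2], [2, 3])))  # [1, 3]
```

Comparing sets ignores child ordering, which is enough for "used in" lists; a reorder-only change produces no notifications in this sketch.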

# list of "units this component is used in", "sections this subsection is used in", etc. in the search index
title_changed: bool = bool(old_version and new_version) and (old_version.title != new_version.title)
if title_changed:
# TODO: there is no "get entity list for container version" API in openedx_content
Contributor

You could also get effectively the same thing via dependencies (new_version.dependencies.all())

Comment on lines +226 to +228
# Different container versions but same list of child entities. For now we don't need to do anything, but in the
# future if we have some other kind of per-container settings relevant to child entities we might need to handle
# this the same way as title_changed.
Contributor

Do we have to worry about the case where a container changes at the same time as component content within it? I'm not clear on how Meilisearch registers "return this Unit because the text that I'm typing is in a Component that this Unit has".

Contributor Author

Do we have to worry about the case where a container changes at the same time as component content within it?

No, I don't see that causing any problems.

In any case, I removed this code: checking old_entity_list_id == new_entity_list_id only helps in rare cases, and two entity lists can have different IDs but the same children, so it's simpler to just compare the children.

I'm not clear on how Meilisearch registers "return this Unit because the text that I'm typing is in a Component that this Unit has".

It doesn't really, it would just match the Component itself, and then we display that component's parent units in the UI if the user is interested in seeing its context.

Comment on lines +292 to +295
if hasattr(entity, "component"):
opaque_key = api.library_component_usage_key(library_key, entity.component)
elif hasattr(entity, "container"):
opaque_key = api.library_container_locator(library_key, entity.container)
Contributor

Nit: This pattern comes up often enough that it seems like it should be a helper function somewhere.
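A possible shape for that helper, as a sketch: `opaque_key_for_entity` is a hypothetical name, and the two key builders are passed in so the example runs without Django (in the real code they would be `api.library_component_usage_key` and `api.library_container_locator` from the snippet above). The `hasattr()` checks work on Django models because a missing reverse one-to-one relation raises `RelatedObjectDoesNotExist`, which subclasses `AttributeError`. Raising for unexpected entity types is a design choice of this sketch; the PR's code may handle that case differently.

```python
from types import SimpleNamespace
from typing import Any, Callable

KeyBuilder = Callable[[Any, Any], Any]


def opaque_key_for_entity(
    library_key: Any,
    entity: Any,
    component_key: KeyBuilder,
    container_key: KeyBuilder,
) -> Any:
    """Return the opaque key for an entity that is either a component or a container."""
    if hasattr(entity, "component"):
        return component_key(library_key, entity.component)
    if hasattr(entity, "container"):
        return container_key(library_key, entity.container)
    # Fail loudly instead of silently leaving the key unset.
    raise TypeError(f"Entity {entity!r} is neither a component nor a container")


# Stand-in key builders and entities for illustration only.
def fake_component_key(library_key: Any, component: Any) -> tuple:
    return ("component", library_key, component)


def fake_container_key(library_key: Any, container: Any) -> tuple:
    return ("container", library_key, container)


component_entity = SimpleNamespace(component="problem-1")
container_entity = SimpleNamespace(container="unit-1")
print(opaque_key_for_entity("lib:Org:demo", component_entity, fake_component_key, fake_container_key))
# ('component', 'lib:Org:demo', 'problem-1')
```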

Contributor

@ormsbee left a comment

Just a minor additional nit request.

# Unlike revert_changes below, we do not have to re-index collections,
# because publishing changes does not affect the component counts, and
# collections themselves don't have draft/published/unpublished status.
content_api.publish_all_drafts(learning_package.id, published_by=user_id)
Contributor

Nit: Please add a comment here indicating that we do expect a bunch of events to be emitted by publishing, since it might otherwise not be obvious to folks just how much stuff is happening here.


@bradenmacdonald
Contributor Author

At a high level, I do have a bit of a concern that having some things be sync and some async at a granular level (depending on how many things there are) is going to lead to inconsistencies and bugs. I think it's a reasonable tradeoff at the moment; it's just something we should keep an eye on.

Yeah, I would prefer a more consistent approach too. But it comes from our direct experience with the libraries work... making everything async makes updating the UI after any change pretty awkward, and making everything sync is way too slow in many cases like renaming something that is used in many different places. So even though it's more complex, this sort of compromise seems to work best for now.

In test_home.py:258, setUp calls OrganizationFactory(). That factory uses a factory_boy Sequence for short_name.

The sequence counter is process-global and monotonically increasing; it's never reset between tests. So:

  • Run this test alone → org short_name is name0 → v2 key is lib:name0:test-key.
  • Run it after N other tests that built Organizations → nameN → lib:nameN:test-key.

The expected-response dicts at test_home.py:332 and test_home.py:367 hardcode 'lib:name0:test-key', which is why the test only passes in isolation, or when it happens to run before any other test that creates an Organization.
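A stdlib-only sketch of the problem and one common fix, using `itertools.count` as a stand-in for factory_boy's process-global Sequence counter (`Org`, `make_org` and `library_v2_key` are simplified, hypothetical helpers): derive the expected key from the organization the test actually created instead of hardcoding `name0`. factory_boy factories also provide `reset_sequence()`, but computing the expected value keeps the test independent of run order.

```python
import itertools
from dataclasses import dataclass

# Stand-in for a factory_boy Sequence: one counter shared by the whole test process.
_org_counter = itertools.count()


@dataclass
class Org:
    short_name: str


def make_org() -> Org:
    return Org(short_name=f"name{next(_org_counter)}")


def library_v2_key(org: Org, slug: str) -> str:
    return f"lib:{org.short_name}:{slug}"


# Some earlier test in the same process already created an organization...
make_org()

# ...so this test's org is not "name0".
org = make_org()
expected = library_v2_key(org, "test-key")  # robust: derived from the created object
hardcoded = "lib:name0:test-key"            # brittle: only right when this test runs first
print(expected == hardcoded)  # False
```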
Comment on lines +1218 to +1220
# First, remove all children from the subsection:
with self.captureOnCommitCallbacks(execute=False): # suppress events
library_api.update_container_children(self.subsection.container_key, [], None)
Contributor Author

Before: this test changed a subsection to have the exact same unit child it already had, and that still emitted an event and updated the search index, because library_api.update_container_children was hard-coded to send LIBRARY_CONTAINER_UPDATED and CONTENT_OBJECT_ASSOCIATIONS_CHANGED events every time.

Now: our event logic is "smarter" and only sends out events if the container's children actually changed. So to keep the test working, first I have to clear the container's children.


Labels

core contributor PR author is a Core Contributor (who may or may not have write access to this repo). FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U

Projects

Status: Needs Triage


3 participants