Open edX Core - Arch Sync Notes
Team: @Kyle McCormick @Dave Ormsbee (Axim) @Braden MacDonald
Epic: https://github.com/openedx/openedx-learning/issues/353
TOC:
- 1 Jun 9, 2026
- 2 May 5, 2026
- 3 Apr 28, 2026
- 4 Apr 21, 2026
- 5 Apr 14, 2026
- 6 Apr 7, 2026
- 7 Mar 31, 2026
- 8 Mar 24, 2026
- 9 Mar 17, 2026
- 10 Mar 11, 2026
- 11 Mar 6, 2026
- 12 Feb 10, 2026
- 13 Jan 20, 2026
- 14 Jan 13, 2026
- 15 Jan 6, 2026
- 16 Dec 16, 2025
- 17 Nov 25, 2025
- 18 Oct 21, 2025
- 19 Oct 7, 2025
- 20 Aug 19, 2025
- 21 Jul 29, 2025
- 22 Jul 22, 2025
- 23 Jun 17, 2025
- 24 Jun 10, 2025
- 24.1.1 Outline (45 min talk)
- 24.2 Outline Strawman:
- 25 Jun 3, 2025
- 25.1 May 15, 2025
- 26 2025-05-15
- 27 2025-05-06
- 28 2025-04-29
- 29 2025-04-02
- 30 2025-03-05
- 31 2025-02-05
- 32 2025-01-09
- 33 2024-12-18
- 34 Old Notes
- 34.1 Talk Proposal
- 34.1.1 Title
- 34.1.2 Description (<500 words)
- 34.1.3 Type
- 34.1.4 Target Audience
- 34.1.5 Proposal
- 34.1.6 Rough Talk Outline
- 34.1.7 Additional Notes
- 34.2 2024-11-20
- 34.3 2024-11-13
Jun 9, 2026
[Kyle] Verawood status
[Dave] Braden got a bunch of backported fixes done
https://github.com/openedx/openedx-platform/pull/38695
should help with the import speed issue
[dave] Would appreciate this docs PR getting early review so I don’t have to rebase: https://github.com/openedx/openedx-core/pull/583
Recent report https://discuss.openedx.org/t/out-of-sync-library-components-counted-after-unit-deletion-from-course/19087
Known bugs: https://github.com/openedx/frontend-app-authoring/issues/3045
Are we going to target all of these for verawood.1?
^ No, but they are priority ordered, and the first few are labelled “release blocker”
Release is still scheduled for Jun 23, 2026
Kyle will run these by Jenna
[Dave]: I have some longer term topics that are post-Verawood concerns (and can be postponed to address any Verawood ones):
Do we want to investigate django-polymorphic for some of our data models?
Worth looking into. Small, well-maintained.
[Kyle] 5-minute skim: looks solidly maintained and serves a good use case. No objections
When would be a good time to cut over to using pyproject.toml?
How do folks feel about django-ninja as the basis of an openedx_content REST API?
Bring it up at standup, get Feanil’s opinion
Pilot it with one REST API first, then evaluate
Possibility: Pathways REST API
But we need to get authz into core.
Now that there’s a separate repo for authz, should we be bringing more of libraries into openedx-core in Willow?
May 5, 2026
Next steps between now and verawood.1
[Dave] Course import into a library is painfully slow. The current pattern is to add and publish components one by one, meaning that even if events are processed on a PublishLog basis, we’re doing hundreds to thousands of them per import. Can we draft all the changes first and then publish all at once?
[Braden] PR: https://github.com/openedx/openedx-platform/pull/38508 Does that speed it up?
[Dave] It speeds up the actual database part of it, but it’s the meilisearch re-indexing that’s glacial. It looks like each block gets its own task. I guess ideally, we’d want to fire off the search updates in bulk based on the DraftChangeLog/PublishLog events.
[Braden] Thought that the modulestore side was slower than the meili side
[Dave] Not sure
[Dave] Also, this reminds me that we should have a .drafts property on DraftChangeLog so we can do something like publish_from_drafts(lp.id, change_log.drafts). “Publish the DraftChangeLog I just made” has to be a really common use case. Maybe its own function entirely.
[Dave] This is the top priority bugfix
[Braden] Will take a look as priority
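A minimal sketch of the batching idea above, using hypothetical in-memory stand-ins (DraftChangeLog with a .drafts property, publish_from_drafts, reindex_for_publish_log) rather than the real openedx-core models and signatures: every block is drafted first, the whole DraftChangeLog is published in one call, and search reindexing is enqueued once per PublishLog instead of once per block.

```python
from dataclasses import dataclass, field


@dataclass
class DraftChangeLog:
    """Hypothetical stand-in: records which entities got new drafts."""
    learning_package_id: int
    # entity_id -> new draft version number
    changes: dict[int, int] = field(default_factory=dict)

    @property
    def drafts(self) -> frozenset[int]:
        """The proposed .drafts property: every entity changed in this log."""
        return frozenset(self.changes)


@dataclass
class PublishLog:
    learning_package_id: int
    published_entity_ids: frozenset[int]


# Stand-in for the task queue: one entry per enqueued search-index task.
search_tasks: list[frozenset[int]] = []


def reindex_for_publish_log(log: PublishLog) -> None:
    # One bulk indexing task per PublishLog, not one task per block.
    search_tasks.append(log.published_entity_ids)


def publish_from_drafts(learning_package_id: int, drafts: frozenset[int]) -> PublishLog:
    log = PublishLog(learning_package_id, drafts)
    reindex_for_publish_log(log)
    return log


def import_blocks(learning_package_id: int, block_ids: list[int]) -> PublishLog:
    change_log = DraftChangeLog(learning_package_id)
    for block_id in block_ids:
        change_log.changes[block_id] = 1  # draft every block first...
    # ...then publish the DraftChangeLog we just made, in one call.
    return publish_from_drafts(learning_package_id, change_log.drafts)
```

With this shape, importing 500 blocks enqueues a single indexing task covering all 500 entities rather than 500 separate tasks.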
[Braden] https://github.com/openedx/openedx-platform/issues/38507 - fix for Verawood?
We should do a targeted fix for this
[Kyle] PRs that didn’t make it:
https://github.com/openedx/openedx-core/pull/580 (“breaking change” but honestly it’s a bugfix)
Try to fix this after modulestore_migrator is resolved
Backport it, time permitting
master PR is all ready to go
Braden will do this one
https://github.com/openedx/openedx-core/pull/573 (breaking change, although it guards against several buggy ways to call the API)
Not this one - punt to Willow
Good for DevX and for avoiding buggy usage of the API, but too breaking to backport right now
https://github.com/openedx/openedx-core/pull/564 (non-breaking change, just nice to have)
Not this one - punt to Willow
Very nice both for DevX and for performance but not critical
Options for each one:
Backport into 1.x branch and install that branch into verawood
Put it aside for now, come back to it after the conference for Willow
[Dave] Generally we should be taking a look at performance
What’s the release date?
Jun 23, 2026
Performance and docs work will mostly land post-conference
[Dave] Do we want to cut a 1.x branch? I’m concerned about the unintended breakage that can ripple out to platform, particularly if we do anything that might require data migration. Even going from 0.47 to 0.48 to 1.0 caused unexpected problems.
[Kyle] I think this is a great idea
[Kyle] Created verawood-backports, to be versioned 1.0.2, 1.0.3, etc.
main will start with 1.1.0/2.0.0 and continue from there
[Dave] Can we use some of Braden’s remaining hours to explore Courses-on-Core work?
[Kyle] Yes 100%
Braden will use remaining time on this
Kyle/Dave will talk about resourcing for Courses-In-Core
[Dave] I’ve been doing some docs work with Claude in the background
So nice
rst formatting in docstrings
Currently in draft, big
Docstrings need updates - will do those separately
Will do a similar one on the API side later
timeline
Kyle mostly focused on conference for next two weeks
Braden away during conference and until June 9th ish
Kyle away after conference until June 9th ish, then will be focused on docs and performance
Apr 28, 2026
Release week, part 2
Verawood Need to haves
https://github.com/openedx/openedx-platform/pull/38437
Merged
https://github.com/openedx/openedx-core/pull/559
[Kyle] I think everything can be marked stable
Verawood Nice to haves
https://github.com/openedx/openedx-core/pull/566
[Kyle] Thoughts on approach? Rush this in, or too ambitious?
Do it
Example of missing user ID in content libraries even though the API accepts it: https://github.com/openedx/openedx-platform/blob/e634f00be0aaf55965b0b412a76e4b9a5c342f96/openedx/core/djangoapps/content_libraries/api/containers.py#L151-L172
Weird case with collections - adding takes user ID but removing doesn’t: https://github.com/openedx/openedx-platform/blob/e634f00be0aaf55965b0b412a76e4b9a5c342f96/openedx/core/djangoapps/content_libraries/api/collections.py#L162-L174
https://github.com/openedx/openedx-core/pull/564
[Kyle] I think this will be easy to land. But, arguably lower priority because it’s non-breaking.
Punting for now
https://github.com/openedx/openedx-core/issues/463
[Braden] Will aim to have PR today. Should be low risk, no API changes.
https://github.com/openedx/openedx-core/pull/565
Braden or Kyle will review when ready
Next few weeks:
Example plugin
Docs
Timestamps and author
Proposal
publish_all_drafts cannot be called inside the context manager (Braden)
Non-versioned APIs (create_pub_ent, etc.) can be called outside the context manager, and then they require changed_at= and changed_by=
with draft_changes_for(learning_package_id=blah, changed_by=blah, changed_at=blah|None)
Non-versioned APIs (create_pub_ent, etc.) are fine to call in this context manager, and then they don’t allow changed_at= and changed_by=
May loosen in the future if we want to do restore-with-attribution or something like that
Versioned APIs (create_pub_ent_ver, etc.) must be called in the context manager
They don’t accept changed_at= and changed_by=; they just use the values from the context
Apr 21, 2026
Release week
@Kyle McCormick List of PRs to land before the release
Need-to-have (blocks release)
Nice-to-have (doesn’t block)
https://github.com/openedx/openedx-core/issues/463
Braden/Kyle?
https://github.com/openedx/openedx-core/pull/501 - Jenna flagged the history log stuff as nice-to-have but not release blocking.
Dave
Include type prefix in container entity_refs
Dave: New containers get new refs, but don’t migrate the old data
We can do this, and then verify that backups from Verawood will still restore into Ulmo
Kyle?
Punt
Make (container_type, container_code) unique in LP rather than just (container_code,)
Can defer indefinitely
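To illustrate the difference between the two constraints, here is a small sqlite3 demo; the table and column names are hypothetical, not the real openedx-core schema. Scoping uniqueness to (learning_package_id, container_code) rejects a unit and a subsection that share the code "intro", while adding container_type to the constraint allows it.

```python
import sqlite3


def allows_same_code_for_different_types(unique_cols: str) -> bool:
    """Insert a unit and a subsection sharing container_code 'intro' in one LP."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE container ("
            "learning_package_id INTEGER, container_type TEXT, container_code TEXT, "
            f"UNIQUE ({unique_cols}))"
        )
        conn.execute("INSERT INTO container VALUES (1, 'unit', 'intro')")
        try:
            conn.execute("INSERT INTO container VALUES (1, 'subsection', 'intro')")
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()


# Current shape: code alone must be unique within the learning package.
print(allows_same_code_for_different_types("learning_package_id, container_code"))  # False
# Proposed shape: the (type, code) pair must be unique within the package.
print(allows_same_code_for_different_types(
    "learning_package_id, container_type, container_code"))  # True
```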
Want to talk about https://github.com/openedx/openedx-core/pull/554
Learning package delete event?
there’s a library delete event, but you can delete a library w/o deleting the learning package
restore package makes a bunch of learning packages, which are “orphaned” until hooked up to a library. don’t think we ever wrote code to delete the ones that don’t get promoted into being libraries
we definitely want to be able to delete learning packages
yes to LEARNING_PACKAGE_DELETED event
does that ^ trigger COLLECTION_DELETED and ENTITY_DELETED? ENTITY_REMOVED_FROM_COLLECTION?
Dave: we should probably just have LEARNING_PACKAGE_DELETED, not triggering the others
Braden: we’d need to figure out which entities and collections to delete from meili based on LEARNING_PACKAGE_DELETED
Kyle: does COLLECTION_DELETED trigger ENTITY_REMOVED_FROM_COLLECTION?
Braden: COLLECTION_UPDATED handles all of these
should be fine to just raise LEARNING_PACKAGE_DELETED, and not individual ENTITY_DELETED events, since it’s not undoable, and handling it just means wiping out all associated data
Braden will need to figure out relationship between LEARNING_PACKAGE_DELETED and CONTENT_LIBRARY_DELETED
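A sketch of the agreed approach, using a plain dict as a hypothetical stand-in for the search index and an illustrative LearningPackageDeleted class (not the real openedx-events definition): a single LEARNING_PACKAGE_DELETED handler removes every indexed document tagged with that package's ID, so no per-entity ENTITY_DELETED events are needed. Against Meilisearch this would presumably become one delete-by-filter request, assuming the package ID is stored as a filterable attribute on each document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningPackageDeleted:
    """Illustrative payload for a LEARNING_PACKAGE_DELETED event."""
    learning_package_id: int


# Stand-in search index: document id -> fields, each tagged with its package.
search_index: dict[str, dict] = {
    "doc-a": {"kind": "component", "learning_package_id": 1},
    "doc-b": {"kind": "collection", "learning_package_id": 1},
    "doc-c": {"kind": "component", "learning_package_id": 2},
}


def handle_learning_package_deleted(event: LearningPackageDeleted) -> int:
    """Drop every entity and collection document for the deleted package.

    Deletion is not undoable, so no per-entity ENTITY_DELETED or
    COLLECTION_DELETED events are needed; filtering on the package ID covers it.
    """
    doomed = [
        doc_id
        for doc_id, doc in search_index.items()
        if doc["learning_package_id"] == event.learning_package_id
    ]
    for doc_id in doomed:
        del search_index[doc_id]
    return len(doomed)
```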
https://github.com/openedx/openedx-core/pull/554
This PR is just restore, not backup
Does not break backup_restore python API used by platform
New set of tests
Currently
extract from archive ← schema exists.
validation ← just using ootb pydantic validation. error reporting not working.
load into learning packages ←
Reviewable EOD tomorrow?
Current philosophy is “don’t be permissive”
but it’s more permissive in that you can be missing data/fields, or you can add extra fields
it is not permissive in that if you have a broken entity import, it blows up the whole import
i.e. no partially successful import
would be nice to have a list of what went wrong
regardless, error messages will be linked back to source toml file
Is this a public API?
no
Only difference is that if Willow+ archives add fields, this would be more tolerant
does it allow multiple entities in one toml file?
no
would be reluctant to without declaring a v2
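The restore philosophy described above (tolerate missing optional fields and unknown extra fields, but fail the whole import if any entity is broken, with each error tied to its source file, one entity per file) can be sketched with plain dicts. This assumes the TOML files are already parsed into {path: data} and uses hypothetical field names (key, title, version_num); the PR itself uses Pydantic, which is not shown here.

```python
# Hypothetical schema for illustration only.
KNOWN_FIELDS = {"key", "title", "version_num"}
REQUIRED_FIELDS = {"key"}


class RestoreError(Exception):
    """Carries every problem found, so the user sees the full list at once."""

    def __init__(self, errors: list[str]) -> None:
        super().__init__(f"{len(errors)} error(s) in archive")
        self.errors = errors


def load_entities(files: dict[str, dict]) -> list[dict]:
    entities: list[dict] = []
    errors: list[str] = []
    for path, data in files.items():
        entity = data.get("entity")
        # A single [entity] table parses to a dict; [[entity]] arrays parse to a list.
        if not isinstance(entity, dict):
            errors.append(f"{path}: expected exactly one [entity] table")
            continue
        missing = REQUIRED_FIELDS - entity.keys()
        if missing:
            errors.append(f"{path}: missing required field(s) {sorted(missing)}")
            continue
        cleaned = {k: v for k, v in entity.items() if k in KNOWN_FIELDS}  # ignore extras
        cleaned.setdefault("title", "")  # tolerate missing optional fields
        entities.append(cleaned)
    if errors:
        raise RestoreError(errors)  # no partially successful import
    return entities
```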
Apr 14, 2026
~1.5 weeks out from Verawood cutoff
@Dave Ormsbee (Axim) Proposed high level structure and abstractions for validation/restore flow:
(Zip Archive | Local Path | GitHub | etc.) → Storage Filesystem (fsspec)
Extract TOML from fs and compile into UnvalidatedCompletePackageInput
Rationale: Allow a way to have input formats that are better tailored to specific use cases, e.g. editing entire sections in the same file.
Contents
Unvalidated Learning Package dict (simple Python types)
Includes all TOML content (entities, versions, containers, collections), but does not include media files like block.xml or static assets.
This is meant to be a minimal transformation of the input TOML.
Structured to try to prevent certain types of logical errors, e.g. refs are used as keys in a dict, so there’s no way to represent duplicate definition of entities.
Errors
Mostly consistency errors, e.g. redefining the same entity multiple times.
Not the actual JSON Schema validation, but just errors when converting the source TOML into the compiled dict.
Resources (where to find media data for later)
UnvalidatedCompletePackageInput → CompletePackageInput + errors
This is done using Pydantic models.
Input and output models will be kept separate (output models are much stricter about requiring certain fields).
JSON Schema is generated from the input model.
Two levels of errors:
Ones that JSON Schema can handle, e.g. missing fields, regex not matching, wrong types, etc.
Deeper ones that JSON Schema can’t deal with, like referential integrity (pointers to versions that don’t exist, containers referencing children that don’t exist, etc.)
Missing resources
Strict mode?
CompletePackageInput → LearningPackage
Jesper: Having documentation of the end-to-end restore pipeline will be good
@Jesper Hodge Did I get that right ^ ?
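A stdlib-only sketch of the compile step and the deeper integrity checks in the proposed flow, under assumed field names (ref, children, draft_version, versions/num) that are not the real archive schema. Using refs as dict keys means a duplicate definition is reported at compile time instead of silently overwriting; the referential checks catch dangling pointers that JSON Schema cannot express. The Pydantic/JSON Schema stage in between is omitted.

```python
from typing import Any


def compile_package(files: dict[str, dict[str, Any]]) -> tuple[dict[str, Any], list[str]]:
    """TOML files -> unvalidated package dict keyed by ref, plus consistency errors."""
    entities: dict[str, dict[str, Any]] = {}
    sources: dict[str, str] = {}
    errors: list[str] = []
    for path, data in files.items():
        ref = data.get("entity", {}).get("ref")
        if ref is None:
            errors.append(f"{path}: no entity ref")
            continue
        if ref in entities:  # refs are dict keys, so duplicates can't slip through
            errors.append(f"{path}: {ref!r} already defined in {sources[ref]}")
            continue
        entities[ref] = data
        sources[ref] = path
    return {"entities": entities}, errors


def check_references(package: dict[str, Any]) -> list[str]:
    """Integrity errors a JSON Schema can't express: dangling version/child pointers."""
    entities = package["entities"]
    errors: list[str] = []
    for ref, data in entities.items():
        version_nums = {v["num"] for v in data.get("versions", [])}
        draft = data["entity"].get("draft_version")
        if draft is not None and draft not in version_nums:
            errors.append(f"{ref}: draft points to missing version {draft}")
        for child in data["entity"].get("children", []):
            if child not in entities:
                errors.append(f"{ref}: child {child!r} does not exist")
    return errors
```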
Braden: We have many things: tar.gzs, xml, zips, json, we want to have a sqlite format…
Data researchers would probably want the ability to import Open edX archives with a specific library - do we want this as a separate openedx_data library? Probably not necessary…
Kyle: could this be modified to be update instead of create?
Dave: trickiest thing is version numbering
imagine that the archive specified v2 of a component, but you also have a v2
Jesper: when importing a version, it should always become the highest (newest) version
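One possible reading of Jesper's suggestion, as a hypothetical helper: imported versions are renumbered to sit above the highest existing local version, keeping their relative order. The trade-off is that local version numbers no longer match the numbers in the archive.

```python
def import_versions(existing: list[int], archive_versions: list[int]) -> dict[int, int]:
    """Map each archived version number to a new local number above the current max."""
    next_num = max(existing, default=0) + 1
    mapping: dict[int, int] = {}
    for archived in sorted(archive_versions):
        mapping[archived] = next_num  # imported versions always become the newest
        next_num += 1
    return mapping


# Local component already has v1 and v2; the archive also specifies v1 and v2.
print(import_versions([1, 2], [1, 2]))  # {1: 3, 2: 4}
```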
Kyle: concerned about the idea of having separate formats for full restore vs. partial import
Braden: stagedcontent
should this new format be used to represent stagedcontent
would be great if we could represent everything in a library as a file, just like we can with OLX today, enabling things like copy-paste and drag-and-drop UIs
Apr 7, 2026
~2.5 weeks out from Verawood cutoff
@Kyle McCormick Whiteboarding - “north star” architecture https://excalidraw.com/#room=9e33ec3e3ebf9175de2b,nAqtkyhx59SUYlEl9a-XiQ
@Kyle McCormick @Braden MacDonald - Confirm we want LEARNING_PACKAGES_* events (https://github.com/openedx/openedx-core/issues/462#issuecomment-4193595258)
@Kyle McCormick @Dave Ormsbee (Axim) Met with MITx physics course author teams, who use OLX heavily.
They have several repos, each holds 1 or more related courses
possible argument for learning packages holding multiple courses / course runs
Automated workflow: merge to master triggers the XML to import into their staging env
Last-minute tweaks may be made in studio, but are wiped out upon next course update from git. Ad hoc process for remembering to make fixes back in XML.
Each section is an XML file, holds structure down to unit or component level
toml does not support multiple levels in one file. seems like it’d be easy to support that, though?
Many units are authored in .tex files and converted into unit XML via latex2edx
does this imply that units should be able to hold assets?
Decision: For Verawood, just use pydantic to validate and document the current format. Worry about pluggability later, as we’re considering not even sticking with TOML (sqlite? more OLX?) long term.
@Braden MacDonald opened a PR to make type annotations for primary keys.
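The notes don't describe how the PR annotates primary keys. One common Python approach, shown here only as an assumption, is typing.NewType: IDs stay plain ints at runtime, but a static type checker such as mypy flags passing one model's ID where another's is expected. The names below are hypothetical.

```python
from typing import NewType

# Distinct static types for IDs of different models; both are plain ints at runtime.
LearningPackageID = NewType("LearningPackageID", int)
ComponentID = NewType("ComponentID", int)


def get_learning_package_title(lp_id: LearningPackageID) -> str:
    return f"Learning package #{lp_id}"


lp_id = LearningPackageID(42)
comp_id = ComponentID(7)

print(get_learning_package_title(lp_id))  # Learning package #42
# get_learning_package_title(comp_id)  # runs, but mypy reports an incompatible argument type
```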
Mar 31, 2026
~3.5 weeks out from Verawood cutoff
@Dave Ormsbee (Axim) : Does enrollment get its own top level app in openedx-core? (In the context of LearningPathway enrollments.)
@Kyle McCormick Sample plugin ideas:
Feanil and I are giving a conference workshop on how to build multiple kinds of plugins into one unified omni-plugin https://github.com/openedx/sample-plugin . Would like to get some openedx-core “plugin” representation in there. We’re thinking,
Course card archival
“Reviewed by ___”
model: ReviewedStatus(TimeStamped)
PE
DraftChangeLogEntry
User
rest api for marking as reviewed
(new?) Sidebar slot
new Filter: EntityPrePublish (or model pre-save signal)
be careful with PublishLog
should it remove things from the publish list, or cancel the whole publish?
just abort it - removing things would be full of footguns
removing things would have to happen at an earlier layer in order to be safer
ambitious: PublishReviewedItems
get_entities_with_unpublished_drafts
no dependencies - just actually reviewed things
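A rough sketch of the pre-publish filter and the ambitious PublishReviewedItems idea. Every name is hypothetical, and ReviewedStatus is simplified to store the reviewed draft version number instead of a DraftChangeLogEntry. The filter receives an immutable tuple, so it can only veto (abort the whole publish), never silently drop items; publish_reviewed_items publishes only entities whose current draft was reviewed, without pulling in dependencies. The drafts dict stands in for get_entities_with_unpublished_drafts.

```python
from dataclasses import dataclass


class PublishAborted(Exception):
    """Raised by a pre-publish filter to cancel the whole publish."""


@dataclass(frozen=True)
class ReviewedStatus:
    entity_id: int
    draft_version: int  # simplification: the draft version that was reviewed
    reviewer: str


reviews: dict[int, ReviewedStatus] = {}


def _is_reviewed(entity_id: int, drafts: dict[int, int]) -> bool:
    status = reviews.get(entity_id)
    return status is not None and status.draft_version == drafts[entity_id]


def require_review(entity_ids: tuple[int, ...], drafts: dict[int, int]) -> None:
    """Hypothetical EntityPrePublish filter: abort rather than remove items."""
    unreviewed = [e for e in entity_ids if not _is_reviewed(e, drafts)]
    if unreviewed:
        raise PublishAborted(f"unreviewed drafts: {unreviewed}")


PRE_PUBLISH_FILTERS = [require_review]


def publish(entity_ids, drafts: dict[int, int]) -> tuple[int, ...]:
    ids = tuple(entity_ids)  # immutable: filters can veto but not edit the list
    for pre_publish in PRE_PUBLISH_FILTERS:
        pre_publish(ids, drafts)
    return ids  # stand-in for creating a PublishLog


def publish_reviewed_items(drafts: dict[int, int]) -> tuple[int, ...]:
    """Hypothetical PublishReviewedItems: only entities whose current draft was reviewed."""
    return publish([e for e in drafts if _is_reviewed(e, drafts)], drafts)
```

A review recorded against an older draft version does not count, so editing a reviewed item sends it back for review.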
Mar 24, 2026
~4.5 weeks out from Verawood cutoff
Mar 17, 2026
@Kyle McCormick → Key Coherency for openedx-core v1.0
@Kyle McCormick I’d like to revisit Braden’s Version branching proposal one more time, and decide if there’s anything we’d like to do before minting v1.0.
Alternate proposal from @Braden MacDonald :
What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.
Example:
Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.
Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.
When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.
As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1” and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”
Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.
Downsides: pretty significant change to the data model, ComponentVersions would not be deleted when the Component is deleted (occasional cleanup process required if you want to delete them), version numbers would increment in jumps based on the highest version used in the learning package rather than the highest version used for that specific component.
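A toy model of Braden's proposal, using hypothetical in-memory classes rather than the real Component / ComponentVersion models: each Version points to its previous Version instead of to a Component, duplicating a Component copies only its pointers, and editing either copy forks the chain. Version numbers come from one package-wide counter, which reproduces the "jumps" downside.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Version:
    num: int
    data: str
    previous: Optional["Version"] = None  # points to the prior Version, not a Component

    def history(self) -> list[int]:
        """Version numbers from newest back to the first version."""
        nums: list[int] = []
        v: Optional[Version] = self
        while v is not None:
            nums.append(v.num)
            v = v.previous
        return nums


@dataclass
class Component:
    key: str
    draft: Optional[Version] = None
    published: Optional[Version] = None


class LearningPackage:
    def __init__(self) -> None:
        self._nums = count(1)  # one version counter for the whole package
        self.components: dict[str, Component] = {}

    def create(self, key: str, data: str) -> Component:
        comp = Component(key, draft=Version(next(self._nums), data))
        self.components[key] = comp
        return comp

    def edit(self, key: str, data: str) -> None:
        comp = self.components[key]
        comp.draft = Version(next(self._nums), data, previous=comp.draft)

    def duplicate(self, key: str, new_key: str) -> Component:
        # Copy only the unversioned row; both components share the Version chain.
        src = self.components[key]
        comp = Component(new_key, draft=src.draft, published=src.published)
        self.components[new_key] = comp
        return comp
```

Following the Text1/Text2 example above: create Text1, edit it, duplicate it as Text2, then edit both. The two histories come out as [3, 2, 1] and [4, 2, 1]: shared up to version 2, diverging afterwards, with Text1 skipping version 3.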
Mar 11, 2026
@Braden MacDonald - containers
Most (all?) settings we have today for containers are really course policies.
@Dave Ormsbee (Axim) maybe something to talk through today?