Open edX Core - Arch Sync Notes
Team: @Kyle McCormick @Dave Ormsbee (Axim) @Braden MacDonald
Epic: https://github.com/openedx/openedx-learning/issues/353
TOC:
- 1 May 5, 2026
- 2 Apr 28, 2026
- 3 Apr 21, 2026
- 4 Apr 14, 2026
- 5 Apr 7, 2026
- 6 Mar 31, 2026
- 7 Mar 24, 2026
- 8 Mar 17, 2026
- 9 Mar 11, 2026
- 10 Mar 6, 2026
- 11 Feb 10, 2026
- 12 Jan 20, 2026
- 13 Jan 13, 2026
- 14 Jan 6, 2026
- 15 Dec 16, 2025
- 16 Nov 25, 2025
- 17 Oct 21, 2025
- 18 Oct 7, 2025
- 19 Aug 19, 2025
- 20 Jul 29, 2025
- 21 Jul 22, 2025
- 22 Jun 17, 2025
- 23 Jun 10, 2025
- 23.1.1 Outline (45 min talk)
- 23.2 Outline Strawman:
- 24 Jun 3, 2025
- 24.1 May 15, 2025
- 25 2025-05-15
- 26 2025-05-06
- 27 2025-04-29
- 28 2025-04-02
- 29 2025-03-05
- 30 2025-02-05
- 31 2025-01-09
- 32 2024-12-18
- 33 Old Notes
- 33.1 Talk Proposal
- 33.1.1 Title
- 33.1.2 Description (<500 words)
- 33.1.3 Type
- 33.1.4 Target Audience
- 33.1.5 Proposal
- 33.1.6 Rough Talk Outline
- 33.1.7 Additional Notes
- 33.2 2024-11-20
- 33.3 2024-11-13
May 5, 2026
Next steps between now and verawood.1
[Dave] Course import into a library is painfully slow. The current pattern is to add and publish components one by one, meaning that even if events are processed on a PublishLog basis, we’re doing hundreds to thousands of them per import. Can we draft all the changes first and then publish all at once?
[Braden] PR: fix: update modulestore migrator to not publish in draft context by bradenmacdonald · Pull Request #38508 · openedx/openedx-platform
Does that speed it up?
[Dave] It speeds up the actual database part of it, but it’s the meilisearch re-indexing that’s glacial. It looks like each block gets its own task. I guess ideally, we’d want to fire off the search updates in bulk based on the DraftChangeLog/PublishLog events.
[Braden] Thought that the modulestore side was slower than the meili side
[Dave] Not sure
[Dave] Also, this reminds me that we should have a .drafts property on DraftChangeLog so we can do something like publish_from_drafts(lp.id, change_log.drafts). “Publish the DraftChangeLog I just made” has to be a really common use case. Maybe its own function entirely.
[Dave] This is the top priority bugfix
[Braden] Will take a look as priority
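A minimal sketch of the `.drafts` idea above, using plain dataclasses rather than the real openedx-core models. `publish_from_drafts` is named in the notes, but its signature and return value here are assumed; `Draft`, `records`, `PUBLISH_LOGS`, and `publish_change_log` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Draft:
    """Stand-in for a draft pointer: one entity at one draft version."""
    entity_id: int
    version_num: int


@dataclass
class DraftChangeLog:
    """Stand-in for the real model; holds only what the sketch needs."""
    learning_package_id: int
    records: list = field(default_factory=list)  # (entity_id, new_version_num)

    @property
    def drafts(self):
        """Hypothetical `.drafts`: the drafts touched by this change log."""
        latest = {}
        for entity_id, version_num in self.records:
            latest[entity_id] = version_num  # last write wins per entity
        return [Draft(e, v) for e, v in latest.items()]


PUBLISH_LOGS = []  # each entry represents ONE publish covering many drafts


def publish_from_drafts(learning_package_id, drafts):
    """Assumed signature: publish a set of drafts as a single publish log."""
    entry = {
        "lp": learning_package_id,
        "published": sorted(d.entity_id for d in drafts),
    }
    PUBLISH_LOGS.append(entry)
    return entry


def publish_change_log(change_log):
    """Hypothetical convenience: 'publish the DraftChangeLog I just made'."""
    return publish_from_drafts(change_log.learning_package_id, change_log.drafts)


# Draft three entities (one of them twice), then publish once.
log = DraftChangeLog(learning_package_id=1)
for entity_id in (10, 11, 12):
    log.records.append((entity_id, 1))
log.records.append((10, 2))
result = publish_change_log(log)
```

The point of the sketch is the shape of the import flow: all edits accumulate in one change log, and a single publish follows, so downstream handlers see one publish event instead of one per component.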
[Braden] Pasting a container into a library does not use a bulk changes properly · Issue #38507 · openedx/openedx-platform - fix for Verawood?
We should do a targeted fix for this
[Kyle] PRs that didn’t make it:
fix: don't allow publishing within a draft change log context by bradenmacdonald · Pull Request #580 · openedx/openedx-core (“breaking change” but honestly it’s a bugfix)
Try to fix this after modulestore_migrator is resolved
Backport it, time permitting
master PR is all ready to go
Braden will do this one
feat!: Consistently attribute changes with time and author by kdmccormick · Pull Request #573 · openedx/openedx-core (breaking change, although it guards against several buggy ways to call the API)
Not this one - punt to Willow
Good for DevX and for avoiding buggy usage of the API, but too breaking to backport right now
feat: Allow IDs or Models to be passed to all publishing APIs by kdmccormick · Pull Request #564 · openedx/openedx-core (non-breaking change, just nice to have)
Not this one - punt to Willow
Very nice both for DevX and for performance but not critical
Options for each one:
Backport into 1.x branch and install that branch into verawood
Put it aside for now, come back to it after the conference for Willow
[Dave] Generally we should be taking a look at performance
What’s the release date?
Jun 23, 2026
Performance and docs stuff will mostly land post-conference
[Dave] Do we want to cut a 1.x branch? I’m concerned about the unintended breakage that can ripple out to platform, particularly if we do anything that might require data migration. Even going from 0.47 to 0.48 to 1.0 caused unexpected problems.
[Kyle] I think this is a great idea
[Kyle] Created verawood-backports, to be versioned 1.0.2, 1.0.3, etc. main will start with 1.1.0/2.0.0 and continue from there
[Dave] Can we use some of Braden’s remaining hours to explore Courses-on-Core work?
[Kyle] Yes 100%
Braden will use remaining time on this
Kyle/Dave will talk about resourcing for Courses-In-Core
[Dave] I’ve been doing some docs work with Claude in the background
So nice
rst formatting in docstrings
Currently in draft, big
Docstrings need updates - will do separate
Will do a similar one on the API side later
timeline
Kyle mostly focused on conference for next two weeks
Braden away during conference and until June 9th ish
Kyle away after conference until June 9th ish, then will be focused on docs and performance
Apr 28, 2026
Release week, part 2
Verawood Need to haves
Verawood Nice to haves
[Kyle] Thoughts on approach? Rush this in, or too ambitious?
Do it
Example of missing user ID in content libraries even though the API accepts it:
openedx-platform/openedx/core/djangoapps/content_libraries/api/containers.py at e634f00be0aaf55965b0b412a76e4b9a5c342f96 · openedx/openedx-platform
Weird case with collections - adding takes user ID but removing doesn’t: https://github.com/openedx/openedx-platform/blob/e634f00be0aaf55965b0b412a76e4b9a5c342f96/openedx/core/djangoapps/content_libraries/api/collections.py#L162-L174
[Kyle] I think this will be easy to land. But, arguably lower priority because it’s non-breaking.
Punting for now
[Braden] Will aim to have PR today. Should be low risk, no API changes.
Braden or Kyle will review when ready
Next few weeks:
Example plugin
Docs
Timestamps and author
Proposal
publish_all_drafts cannot happen in context manager (Braden)
Non-versioned APIs (create_pub_ent, etc.) can be called outside the context manager, and then they require changed_at= and changed_by=
with draft_changes_for(learning_package_id=blah, changed_by=blah, changed_at=blah|None)
Non-versioned APIs (create_pub_ent, etc.) are fine to call in this context manager, and then they don’t allow changed_at= and changed_by=
May loosen in the future if we want to do restore-with-attribution or something like that
Versioned APIs (create_pub_ent_ver, etc.) must be called in the context manager
They don’t accept changed_at= and changed_by=, they just use it from the context
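The proposed rules can be sketched with a `ContextVar`-backed context manager. This is only an illustration of the calling conventions in the proposal: the function bodies, return values, exception types, and the `_ACTIVE` variable are assumptions, and `create_pub_ent` / `create_pub_ent_ver` are the abbreviated names from the notes, not real API names.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from datetime import datetime, timezone

# Holds the attribution for the currently open change context, if any.
_ACTIVE = ContextVar("active_draft_changes", default=None)


@contextmanager
def draft_changes_for(learning_package_id, changed_by, changed_at=None):
    """Open a change context; changed_at=None is taken to mean 'now'."""
    ctx = {
        "learning_package_id": learning_package_id,
        "changed_by": changed_by,
        "changed_at": changed_at or datetime.now(timezone.utc),
    }
    token = _ACTIVE.set(ctx)
    try:
        yield ctx
    finally:
        _ACTIVE.reset(token)


def create_pub_ent(ref, changed_at=None, changed_by=None):
    """Non-versioned API: attribution required outside, rejected inside."""
    ctx = _ACTIVE.get()
    if ctx is None:
        # Simplification: this sketch treats None as "not provided".
        if changed_at is None or changed_by is None:
            raise TypeError("changed_at= and changed_by= are required outside a context")
        return {"ref": ref, "changed_at": changed_at, "changed_by": changed_by}
    if changed_at is not None or changed_by is not None:
        raise TypeError("changed_at= and changed_by= are not allowed inside a context")
    return {"ref": ref, "changed_at": ctx["changed_at"], "changed_by": ctx["changed_by"]}


def create_pub_ent_ver(ref, version_num):
    """Versioned API: only valid inside a context; attribution comes from it."""
    ctx = _ACTIVE.get()
    if ctx is None:
        raise RuntimeError("versioned APIs must be called inside draft_changes_for()")
    return {
        "ref": ref,
        "version_num": version_num,
        "changed_at": ctx["changed_at"],
        "changed_by": ctx["changed_by"],
    }


def publish_all_drafts(learning_package_id):
    """Publishing is refused while a draft change context is open."""
    if _ACTIVE.get() is not None:
        raise RuntimeError("publish_all_drafts() cannot run inside draft_changes_for()")
    return "published"
```

Usage would look like `with draft_changes_for(lp_id, changed_by=user_id): create_pub_ent_ver(ref, 1)`, followed by `publish_all_drafts(lp_id)` after the block exits.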
Apr 21, 2026
Release week
Epic:
Epic: Open edX Core v1.0 · Issue #353 · openedx/openedx-core
@Kyle McCormick List of PRs to land before the release
Need-to-have (blocks release)
Kyle
Braden
early feedback helpful
Tag and Release v1.0 (by Verawood cutoff) · Issue #434 · openedx/openedx-core Kyle
Nice-to-have (doesn’t block)
Braden/Kyle?
feat: New `history log` api functions [FC-0123] by ChrisChV · Pull Request #501 · openedx/openedx-core - Jenna flagged the history log stuff as nice-to-have but not release blocking.
Dave
Include type prefix in container entity_refs
Dave: New containers get new refs, but don’t migrate the old data
We can do this, and then verify that backups from Verawood will still restore into Ulmo
Kyle?
Punt
Make (container_type,container_code) unique in LP rather than just (container_code,)
Can defer indefinitely
Want to talk about
[WIP] Re-implementing Restore to use Pydantic by ormsbee · Pull Request #554 · openedx/openedx-core
Learning package delete event?
there’s a library delete event, but you can delete a library w/o deleting the learning package
restore package makes a bunch of learning packages which are “orphaned” until hooked up to a library. don’t think we ever wrote code to delete the ones that don’t get promoted into being libraries
we definitely want to be able to delete learning packages
yes to LEARNING_PACKAGE_DELETED event
does that ^ trigger COLLECTION_DELETED and ENTITY_DELETED ? ENTITY_REMOVED_FROM_COLLECTION ?
Dave: we should probably just have LEARNING_PACKAGE_DELETED, not triggering the others
Braden: we’d need to figure out which entities and collections to delete from meili based on LEARNING_PACKAGE_DELETED
Kyle: does COLLECTION_DELETED trigger ENTITY_REMOVED_FROM_COLLECTION ?
Braden: COLLECTION_UPDATED handles all of these
should be fine to just raise LEARNING_PACKAGE_DELETED , and not individual ENTITY_DELETED events, since it’s not undoable, and handling it just means wiping out all associated data
Braden will need to figure out relationship between LEARNING_PACKAGE_DELETED and CONTENT_LIBRARY_DELETED
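A rough sketch of the "just wipe out all associated data" handling discussed above, using an in-memory dict in place of the real search index. The document shape, the `learning_package_id` field on indexed documents, and the handler name are all assumptions for illustration; this does not use the Meilisearch client or the real event classes.

```python
# In-memory stand-in for a search index: doc_id -> document.
INDEX = {
    "lp1:component:a": {"learning_package_id": 1, "kind": "entity"},
    "lp1:component:b": {"learning_package_id": 1, "kind": "entity"},
    "lp1:collection:c": {"learning_package_id": 1, "kind": "collection"},
    "lp2:component:z": {"learning_package_id": 2, "kind": "entity"},
}


def handle_learning_package_deleted(learning_package_id, index):
    """
    Hypothetical LEARNING_PACKAGE_DELETED handler.

    Removes every indexed document that belongs to the package, so no
    per-entity or per-collection delete events are needed.
    Returns the number of documents removed.
    """
    doomed = [
        doc_id
        for doc_id, doc in index.items()
        if doc["learning_package_id"] == learning_package_id
    ]
    for doc_id in doomed:
        del index[doc_id]
    return len(doomed)


removed = handle_learning_package_deleted(1, INDEX)
```

This only works if each indexed document records which learning package it came from, which is the lookup Braden raised above.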
[WIP] Re-implementing Restore to use Pydantic by ormsbee · Pull Request #554 · openedx/openedx-core
This PR is just restore, not backup
Does not break backup_restore python API used by platform
New set of tests
Currently
extract from archive ← schema exists.
validation ← just using ootb pydantic validation. error reporting not working.
load into learning packages ←
Reviewable EOD tomorrow?
Current philosophy is “don’t be permissive”
but it’s more permissive in that you can be missing data/fields, or you can add extra fields
it is not permissive in that if you have a broken entity import, it blows up the whole import
i.e. no partially successful import
would be nice to have a list of what went wrong
regardless, error messages will be linked back to source toml file
Is this a public API?
no
Only difference is that if Willow+ archives add fields, this would be more tolerant
does it allow multiple entities in one toml file?
no
would be reluctant to without declaring a v2
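To illustrate the behavior described above (extra fields tolerated, no partially successful import, errors linked back to the source TOML file), here is a small stand-in written in plain Python rather than Pydantic. The field names in `REQUIRED` and `OPTIONAL`, the file paths, and the `restore` signature are hypothetical; the real PR may structure this quite differently.

```python
REQUIRED = {"ref": str, "title": str}   # hypothetical required fields
OPTIONAL = {"description": str}          # hypothetical; may be absent


def validate_entity(source_file, data):
    """Return a list of error strings, each prefixed with the source file."""
    errors = []
    for name, expected in REQUIRED.items():
        if name not in data:
            errors.append(f"{source_file}: missing required field '{name}'")
        elif not isinstance(data[name], expected):
            errors.append(f"{source_file}: field '{name}' must be {expected.__name__}")
    for name, expected in OPTIONAL.items():
        if name in data and not isinstance(data[name], expected):
            errors.append(f"{source_file}: field '{name}' must be {expected.__name__}")
    # Unknown keys are deliberately ignored, so archives from a newer
    # release that add fields would still validate.
    return errors


def restore(parsed_files):
    """
    parsed_files: {source TOML path: already-parsed dict}.

    Returns (loaded_refs, errors). If anything fails validation, nothing
    is loaded, but every problem found is reported.
    """
    errors = []
    for path, data in parsed_files.items():
        errors.extend(validate_entity(path, data))
    if errors:
        return [], errors
    return [data["ref"] for data in parsed_files.values()], []


good = {"entities/a.toml": {"ref": "a", "title": "A", "future_field": 1}}
bad = {**good, "entities/b.toml": {"title": 5}}

loaded_ok, errors_ok = restore(good)
loaded_bad, errors_bad = restore(bad)
```

Collecting every error before failing is one way to get the "list of what went wrong" mentioned above while still keeping the import all-or-nothing.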
Apr 14, 2026
~1.5 weeks out from Verawood cutoff
@Dave Ormsbee (Axim) Proposed high level structure and abstractions for validation/restore flow:
(Zip Archive | Local Path | GitHub | etc.) → Storage Filesystem (fsspec)
Extract TOML from fs and compile into UnvalidatedCompletePackageInput
Rationale: Allow a way to have input formats that are better tailored to specific use cases, e.g. editing entire sections in the same file.
Contents
Unvalidated Learning Package dict (simple Python types)
Includes all TOML content (entities, versions, containers, collections), but does not include media files like block.xml or static assets.
This is meant to be a minimal transformation of the input TOML.
Structured to try to prevent certain types of logical errors, e.g. refs are used as keys in a dict, so there’s no way to represent duplicate definition of entities.
Errors
Mostly consistency errors, e.g. redefining the same entity multiple times.
Not the actual JSON Schema validation, but just errors when converting the source TOML into the compiled dict.
Resources (where to find media data for later)
UnvalidatedCompletePackageInput → CompletePackageInput + errors
This is done using Pydantic models.
Input and output models will be kept separate (output models are much stricter about requiring certain fields).
JSON Schema is generated from the input model.
Two levels of errors:
Ones that JSON Schema can handle, e.g. missing fields, regex not matching, wrong types, etc.
Deeper ones that JSON Schema can’t deal with, like referential integrity (pointers to versions that don’t exist, containers referencing children that don’t exist, etc.)
Missing resources
Strict mode?
CompletePackageInput → LearningPackage
Jesper: Having documentation of the end-to-end restore pipeline will be good
@Jesper Hodge Did I get that right ^ ?
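The first two stages of the proposed flow can be sketched with plain dicts. This is an assumption-heavy illustration: the document shapes, the `ref` and `children` keys, and both function names are hypothetical, and the schema-level checks that Pydantic would perform are omitted. It shows only the two error categories that fall outside JSON Schema: consistency errors at compile time and referential-integrity errors afterwards.

```python
def compile_package(toml_docs):
    """
    toml_docs: list of (source_path, parsed_dict) pairs.

    Builds the unvalidated package dict keyed by ref, so a duplicate
    definition cannot be represented and is reported as an error instead.
    """
    package = {"entities": {}, "containers": {}}
    errors = []
    for path, doc in toml_docs:
        kind = "containers" if "children" in doc else "entities"
        ref = doc["ref"]
        if ref in package["entities"] or ref in package["containers"]:
            errors.append(f"{path}: '{ref}' is defined more than once")
            continue
        package[kind][ref] = doc
    return package, errors


def check_references(package):
    """Deeper check: every container child must exist in the package."""
    errors = []
    for ref, container in package["containers"].items():
        for child in container["children"]:
            if child not in package["entities"] and child not in package["containers"]:
                errors.append(f"container '{ref}' references missing child '{child}'")
    return errors


docs = [
    ("entities/p1.toml", {"ref": "p1"}),
    ("entities/p1_copy.toml", {"ref": "p1"}),
    ("containers/u1.toml", {"ref": "u1", "children": ["p1", "p2"]}),
]
package, compile_errors = compile_package(docs)
integrity_errors = check_references(package)
```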
Braden: We have many things: tar.gzs, xml, zips, json, we want to have a sqlite format…
Data researchers would probably want to have an ability to import openedx archives with a specific library - do we want this as a separate opendx_data library ? Probably not necessary…
Kyle: could this be modified to be update instead of create?
Dave: trickiest thing is version numbering
imagine that the archive specified v2 of a component, but you also have a v2
Jesper: when importing a version, it should always become the highest (newest) version
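Jesper's suggestion could look roughly like the following, assuming versions are keyed by integer version number. The function name and the dict representation are hypothetical, and this is one possible reading of "always become the highest version", not a settled design.

```python
def import_versions(local_versions, archived_versions):
    """
    local_versions / archived_versions: {version_num: data}.

    Archived versions never overwrite local ones. They are appended after
    the highest local version number, in archive order, so the imported
    content always ends up newest.
    """
    merged = dict(local_versions)
    next_num = max(merged, default=0) + 1
    for _, data in sorted(archived_versions.items()):
        merged[next_num] = data
        next_num += 1
    return merged


local = {1: "intro", 2: "intro, edited locally"}
archive = {2: "intro, edited in archive"}
merged = import_versions(local, archive)
```

In this sketch the archive's v2 becomes local v3, which avoids the collision Dave describes at the cost of version numbers no longer matching the archive.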
Kyle: concerned about the idea of having separate formats for full restore vs. partial import
Braden: stagedcontent
should this new format be used to represent stagedcontent
would be great if we could represent everything in a library as a file, just like we can with OLX today, enabling things like copy-paste and drag-and-drop UIs
Apr 7, 2026
~2.5 weeks out from Verawood cutoff
@Kyle McCormick Whiteboarding - “north star” architecture https://excalidraw.com/#room=9e33ec3e3ebf9175de2b,nAqtkyhx59SUYlEl9a-XiQ
@Kyle McCormick @Braden MacDonald - Confirm we want LEARNING_PACKAGES_* events (Fire events from `openedx_content` so that downstream effects can be handled in platform · Issue #462 · openedx/openedx-core)
@Kyle McCormick @Dave Ormsbee (Axim) Met with MITx physics course author teams, who use OLX heavily.
They have several repos, each holds 1 or more related courses
possible argument for learning packages holding multiple courses / course runs
Automated workflow: merge to master triggers the XML to import into their staging env
Last-minute tweaks may be made in studio, but are wiped out upon next course update from git. Ad hoc process for remembering to make fixes back in XML.
Each section is an XML file, holds structure down to unit or component level
toml does not support multiple levels in one file. seems like it’d be easy to support that, though?
Many units are authored in .tex files and converted into unit XML via latex2edx
does this imply that units should be able to hold assets?
Decision: For Verawood, just use pydantic to validate and document the current format. Worry about pluggability later, as we’re considering not even sticking with TOML (sqlite? more OLX?) long term.
@Braden MacDonald opened a PR to make type annotations for primary keys.
Mar 31, 2026
~3.5 weeks out from Verawood cutoff
@Dave Ormsbee (Axim) : Does enrollment get its own top level app in openedx-core? (In the context of LearningPathway enrollments.)
@Kyle McCormick Sample plugin ideas:
Feanil and I are giving a conference workshop on how to build multiple kinds of plugins into one unified omni-plugin
GitHub - openedx/sample-plugin. Would like to get some openedx-core “plugin” representation in there. We’re thinking,
Course card archival
“Reviewed by ___”
model: ReviewedStatus(TimeStamped)
PE
DraftChangeLogEntry
User
rest api for marking as reviewed
(new?) Sidebar slot
new Filter: EntityPrePublish (or model pre-save signal)
be careful with PublishLog
should it remove things from the publish list, or cancel the whole publish?
just abort it - removing things would be full of footguns
removing things would have to happen at an earlier layer in order to be safer
ambitious: PublishReviewedItems
get_entities_with_unpublished_drafts
no dependencies - just actually reviewed things
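A sketch of the "just abort it" behavior for the sample plugin idea, in plain Python. The exception, the filter function, and the `REVIEWED` set are hypothetical; this does not use the real openedx-filters or publishing APIs. Note that `publish_reviewed_items` chooses what to publish before calling `publish`, rather than having the filter silently drop entries from the publish list.

```python
class PublishAborted(Exception):
    """Raised by the pre-publish filter to cancel the whole publish."""


REVIEWED = {"problem-1", "html-1"}  # refs that have been marked as reviewed


def entity_pre_publish_filter(entity_refs, reviewed):
    """Hypothetical EntityPrePublish filter: all-or-nothing."""
    unreviewed = sorted(ref for ref in entity_refs if ref not in reviewed)
    if unreviewed:
        raise PublishAborted(f"not reviewed: {', '.join(unreviewed)}")
    return entity_refs


def publish(entity_refs, reviewed):
    """Stand-in publish call that runs the filter first."""
    entity_pre_publish_filter(entity_refs, reviewed)
    return {"published": list(entity_refs)}


def publish_reviewed_items(refs_with_unpublished_drafts, reviewed):
    """
    The 'ambitious' variant: select only reviewed items up front, ignoring
    dependencies, then publish that selection as a normal publish.
    """
    chosen = [ref for ref in refs_with_unpublished_drafts if ref in reviewed]
    return publish(chosen, reviewed)


pending = ["problem-1", "video-1", "html-1"]
reviewed_only = publish_reviewed_items(pending, REVIEWED)
```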
Mar 24, 2026
~4.5 weeks out from Verawood cutoff
Mar 17, 2026
@Kyle McCormick → Key Coherency for openedx-core v1.0
@Kyle McCormick I’d like to revisit Braden’s Version branching proposal one more time, and decide if there’s anything we’d like to do before minting v1.0.
Alternate proposal from @Braden MacDonald :
What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.
Example:
Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.
Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.
When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.
As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1” and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”
Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.
Downsides: pretty significant change to the data model, ComponentVersions would not be deleted when the Component is deleted (occasional cleanup process required if you want to delete them), version numbers would be incrementing in jumps based on the highest version used in the learning package, not the highest version used for that specific component.
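The Text1/Text2 example from this proposal can be traced with a toy model. These dataclasses only mirror the names in the notes; they are not the real models, and `edit`, `duplicate`, and `history` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComponentVersion:
    """Points back only to its previous version, not to a Component."""
    label: str
    data: str
    previous: Optional["ComponentVersion"] = None


@dataclass
class Component:
    key: str
    draft: Optional[ComponentVersion] = None
    published: Optional[ComponentVersion] = None


def edit(component, label, data):
    """Create a new version on top of the current draft and point to it."""
    component.draft = ComponentVersion(label, data, previous=component.draft)
    return component.draft


def duplicate(component, new_key):
    """Copy only the pointers; the version objects are shared, not copied."""
    return Component(key=new_key, draft=component.draft, published=component.published)


def history(component):
    """Walk the previous-version chain from the draft, newest first."""
    labels, version = [], component.draft
    while version is not None:
        labels.append(version.label)
        version = version.previous
    return labels


text1 = Component("Text1")
edit(text1, "anonv1", "hello")
edit(text1, "anonv2", "hello world")

text2 = duplicate(text1, "Text2")
shared_before_edit = text2.draft is text1.draft

edit(text2, "anonv3", "hello there")  # Text2 forks; Text1 is unaffected
```

After the last edit, both components still share "anonv2" and "anonv1" in their history, which is the copy-on-write behavior listed under the upsides.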
Mar 11, 2026
@Braden MacDonald - containers
Most (all?) settings we have today for containers are really course policies.
Decision: keep container APIs generic (so you can call either create_container(type=Unit) or create_unit() and it works correctly), keep the empty Unit/UnitVersion models for foreign keys only, assume no content-related settings for now, and include nice “wrapper” APIs for dealing with Units etc. that just call the generic ones.
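As a rough picture of that decision, a typed wrapper can simply delegate to the generic call. The `Container` dataclass, the `container_type` strings, and the function bodies are assumptions for illustration; the notes write the generic call as create_container(type=Unit), and the parameter is renamed here only to avoid shadowing Python's built-in `type`.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple id generator for the sketch


@dataclass
class Container:
    id: int
    container_type: str
    title: str
    children: list = field(default_factory=list)


def create_container(container_type, title):
    """Generic API: works for any container type."""
    return Container(id=next(_ids), container_type=container_type, title=title)


def create_unit(title):
    """Convenience wrapper: same result as the generic call with type 'unit'."""
    return create_container("unit", title)


via_wrapper = create_unit("Week 1")
via_generic = create_container("unit", "Week 1")
```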
@Kyle McCormick - opaque keys
Kyle to post a revised proposal on Opaque Key Simplification · Issue #411 · openedx/opaque-keys
@Dave Ormsbee (Axim) - javier’s media assets proposal
wgu has a digital asset mgmt system
Assets as PEs that are dependencies of Components that they’re used in.
What to do with existing ComponentVersion media?
Mar 6, 2026
~5 weeks out from Verawood cutoff
CCX and the catalog models: Just confirming that every CCX is a CourseRun, rather than every CCX belonging to a CourseRun
we need to keep it as: every CCX is a CourseRun
CBE and Pathways
Unified pathway modelling: Dave’s WIP PR, and how we should proceed with unifying Pathways and CBE workstreams
feat: very WIP stab at unified pathways modeling by ormsbee · Pull Request #480 · openedx/openedx-core
Where to put Pathways enrollment logic: Piotr has a proposal on where to put Pathways enrollment logic and models, I think he’s blocked on this:
(WIP) feat: implement Pathways by Agrendalath · Pull Request #482 · openedx/openedx-core
CBE assessment criteria versioning ADR: Are we good with versioning mastery criteria using SimpleHistory rather than PEs/PEVs?
Cbe assessment criteria versioning adr by mgwozdz-unicon · Pull Request #476 · openedx/openedx-core
Notes
Dave: Keep models in openedx-core, until they need to FK to openedx-platform, then put the models there. Keep logic in openedx-core, until it needs to call openedx-platform, then put it in openedx-platform and push data down to openedx-core.
Containers: Braden asks if we should continue having Unit and Sub/Section tables:
Move ContainerTypes into openedx-core and simplify platform · Issue #412 · openedx/openedx-core
Feb 10, 2026
Context table
@Dave Ormsbee (Axim) maybe something to talk through today?