Open edX Core - Arch Sync Notes
Team: @Kyle McCormick @Dave Ormsbee (Axim) @Braden MacDonald
Epic: https://github.com/openedx/openedx-learning/issues/353
TOC:
- 1 Apr 14, 2026
- 2 Apr 7, 2026
- 3 Mar 31, 2026
- 4 Mar 24, 2026
- 5 Mar 17, 2026
- 6 Mar 11, 2026
- 7 Mar 6, 2026
- 8 Feb 10, 2026
- 9 Jan 20, 2026
- 10 Jan 13, 2026
- 11 Jan 6, 2026
- 12 Dec 16, 2025
- 13 Nov 25, 2025
- 14 Oct 21, 2025
- 15 Oct 7, 2025
- 16 Aug 19, 2025
- 17 Jul 29, 2025
- 18 Jul 22, 2025
- 19 Jun 17, 2025
- 20 Jun 10, 2025
- 20.1.1 Outline (45 min talk)
- 20.2 Outline Strawman:
- 21 Jun 3, 2025
- 21.1 May 15, 2025
- 22 2025-05-15
- 23 2025-05-06
- 24 2025-04-29
- 25 2025-04-02
- 26 2025-03-05
- 27 2025-02-05
- 28 2025-01-09
- 29 2024-12-18
- 30 Old Notes
- 30.1 Talk Proposal
- 30.1.1 Title
- 30.1.2 Description (<500 words)
- 30.1.3 Type
- 30.1.4 Target Audience
- 30.1.5 Proposal
- 30.1.6 Rough Talk Outline
- 30.1.7 Additional Notes
- 30.2 2024-11-20
- 30.3 2024-11-13
Apr 14, 2026
~1.5 weeks out from Verawood cutoff
@Dave Ormsbee (Axim) Proposed high level structure and abstractions for validation/restore flow:
(Zip Archive | Local Path | GitHub | etc.) → Storage Filesystem (fsspec)
Extract TOML from fs and compile into UnvalidatedCompletePackageInput
Rationale: Allow a way to have input formats that are better tailored to specific use cases, e.g. editing entire sections in the same file.
Contents
Unvalidated Learning Package dict (simple Python types)
Includes all TOML content (entities, versions, containers, collections), but does not include media files like block.xml or static assets.
This is meant to be a minimal transformation of the input TOML.
Structured to try to prevent certain types of logical errors, e.g. refs are used as keys in a dict, so there’s no way to represent duplicate definition of entities.
Errors
Mostly consistency errors, e.g. redefining the same entity multiple times.
Not the actual JSON Schema validation, but just errors when converting the source TOML into the compiled dict.
Resources (where to find media data for later)
UnvalidatedCompletePackageInput → CompletePackageInput + errors
This is done using Pydantic models.
Input and output models will be kept separate (output models are much stricter about requiring certain fields).
JSON Schema is generated from the input model.
Two levels of errors:
Ones that JSON Schema can handle, e.g. missing fields, regex not matching, wrong types, etc.
Deeper ones that JSON Schema can’t deal with, like referential integrity (pointers to versions that don’t exist, containers referencing children that don’t exist, etc.)
Missing resources
Strict mode?
CompletePackageInput → LearningPackage
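A minimal sketch of the first two stages described above, assuming the TOML has already been parsed into plain dicts. Plain dataclasses and dicts stand in for the Pydantic models here, and the table names ("entity", "container") and field names ("ref", "children") are hypothetical, since these notes do not spell out the TOML layout.

```python
from dataclasses import dataclass, field


@dataclass
class UnvalidatedPackage:
    """Stage 1 output: plain dicts keyed by ref, plus consistency errors."""
    entities: dict = field(default_factory=dict)    # ref -> raw entity dict
    containers: dict = field(default_factory=dict)  # ref -> raw container dict
    errors: list = field(default_factory=list)


def compile_documents(documents):
    """Merge already-parsed TOML documents into one dict-of-dicts structure."""
    package = UnvalidatedPackage()
    for doc in documents:
        for kind, target in (("entity", package.entities),
                             ("container", package.containers)):
            for item in doc.get(kind, []):
                ref = item["ref"]  # hypothetical field name
                if ref in target:
                    # Refs are dict keys, so a duplicate can only be reported,
                    # never stored twice.
                    package.errors.append(f"duplicate {kind} definition: {ref}")
                    continue
                target[ref] = item
    return package


def check_references(package):
    """Stage 2 'deeper' errors: checks that JSON Schema cannot express."""
    errors = []
    for ref, container in package.containers.items():
        for child in container.get("children", []):
            if child not in package.entities and child not in package.containers:
                errors.append(f"container {ref} references missing child {child}")
    return errors
```

Keeping the two functions separate mirrors the two error levels in the notes: the first reports problems found while compiling the source files, the second reports referential-integrity problems that only show up once the whole package is assembled.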
Apr 7, 2026
~2.5 weeks out from Verawood cutoff
@Kyle McCormick Whiteboarding - “north star” architecture https://excalidraw.com/#room=9e33ec3e3ebf9175de2b,nAqtkyhx59SUYlEl9a-XiQ
@Kyle McCormick @Braden MacDonald - Confirm we want LEARNING_PACKAGES_* events (https://github.com/openedx/openedx-core/issues/462#issuecomment-4193595258)
@Kyle McCormick @Dave Ormsbee (Axim) Met with MITx physics course author teams, who use OLX heavily.
They have several repos, each holds 1 or more related courses
possible argument for learning packages holding multiple courses / course runs
Automated workflow: merge to master triggers the XML to import into their staging env
Last-minute tweaks may be made in studio, but are wiped out upon next course update from git. Ad hoc process for remembering to make fixes back in XML.
Each section is an XML file, holds structure down to unit or component level
The TOML format does not support multiple levels in one file. Seems like it’d be easy to support that, though?
Many units are authored in .tex files and converted into unit XML via latex2edx
does this imply that units should be able to hold assets?
Decision: For Verawood, just use Pydantic to validate and document the current format. Worry about pluggability later, as we’re considering not even sticking with TOML long term (SQLite? more OLX?).
@Braden MacDonald opened a PR to make type annotations for primary keys.
Mar 31, 2026
~3.5 weeks out from Verawood cutoff
@Dave Ormsbee (Axim) : Does enrollment get its own top level app in openedx-core? (In the context of LearningPathway enrollments.)
@Kyle McCormick Sample plugin ideas:
Feanil and I are giving a conference workshop on how to build multiple kinds of plugins into one unified omni-plugin https://github.com/openedx/sample-plugin . Would like to get some openedx-core “plugin” representation in there. We’re thinking,
Course card archival
“Reviewed by ___”
model: ReviewedStatus(TimeStamped)
PE
DraftChangeLogEntry
User
rest api for marking as reviewed
(new?) Sidebar slot
new Filter: EntityPrePublish (or model pre-save signal)
be careful with PublishLog
should it remove things from the publish list, or cancel the whole publish?
just abort it - removing things would be full of footguns
removing things would have to happen at an earlier layer in order to be safer
ambitious: PublishReviewedItems
get_entities_with_unpublished_drafts
no dependencies - just actually reviewed things
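A rough sketch of the "just abort it" decision above, in plain Python rather than the real filters machinery. The names PublishAborted, require_reviewed, and publish are hypothetical stand-ins; the point is only that a pre-publish check cancels the whole publish instead of removing items from it.

```python
class PublishAborted(Exception):
    """Raised by a pre-publish check to cancel the whole publish."""


def require_reviewed(entity_keys, reviewed_keys):
    """Hypothetical EntityPrePublish handler: abort if anything is unreviewed."""
    unreviewed = sorted(set(entity_keys) - set(reviewed_keys))
    if unreviewed:
        raise PublishAborted(f"not reviewed: {', '.join(unreviewed)}")


def publish(entity_keys, pre_publish_checks):
    """Run every check first; publish all entities or none of them."""
    for check in pre_publish_checks:
        check(entity_keys)  # any exception cancels the publish before it starts
    return list(entity_keys)
```

Because the checks never mutate the publish list, a caller always knows that a successful publish contained exactly what was requested.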
Mar 24, 2026
~4.5 weeks out from Verawood cutoff
Mar 17, 2026
@Kyle McCormick → Key Coherency for openedx-core v1.0
@Kyle McCormick I’d like to revisit Braden’s Version branching proposal one more time, and decide if there’s anything we’d like to do before minting v1.0.
Alternate proposal from @Braden MacDonald :
What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.
Example:
Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.
Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.
When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.
As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1” and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”
Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.
Downsides: pretty significant change to the data model; ComponentVersions would not be deleted when the Component is deleted (occasional cleanup process required if you want to delete them); version numbers would increment in jumps based on the highest version used in the learning package, not the highest version used for that specific component.
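The Text1/Text2 example above can be sketched with in-memory objects. This is an illustration of the copy-on-write idea only, not the proposed Django models; the title field and the edit/duplicate/history method names are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComponentVersion:
    """Immutable version that points only at its predecessor."""
    title: str
    previous: Optional["ComponentVersion"] = None


@dataclass
class Component:
    """Keyed pointer holder; owns no version data of its own."""
    key: str
    draft: Optional[ComponentVersion] = None
    published: Optional[ComponentVersion] = None

    def edit(self, title):
        # A new version chains to the current draft; other components that
        # share the old draft are unaffected (copy-on-write).
        self.draft = ComponentVersion(title, previous=self.draft)

    def duplicate(self, new_key):
        # Copy only the pointers, never the versions themselves.
        return Component(new_key, draft=self.draft, published=self.published)

    def history(self):
        # Walk the chain from the draft back to the first version.
        version, titles = self.draft, []
        while version is not None:
            titles.append(version.title)
            version = version.previous
        return titles
```

Duplicating Text1 after two edits gives Text2 the same two-entry history; a later edit to Text2 adds a third entry to Text2 only, which is the "fork" described in the proposal.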
Mar 11, 2026
@Braden MacDonald - containers
Most (all?) settings we have today for containers are really course policies.
Decision: keep container APIs generic (so you can call either create_container(type=Unit) or create_unit() and it works correctly), keep the empty Unit/UnitVersion models for foreign keys only, assume no content-related settings for now, and include nice “wrapper” APIs for dealing with Units etc. that just call the generic ones.
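One way to read this decision, as a hedged sketch: the generic function does all the work and the typed wrapper only fixes the type argument. The Container dataclass and the lowercase type names below are placeholders, not the real models or API signatures.

```python
from dataclasses import dataclass, field

# Placeholder set of known types for this sketch.
KNOWN_CONTAINER_TYPES = {"unit", "subsection", "section"}


@dataclass
class Container:
    key: str
    container_type: str
    children: list = field(default_factory=list)


def create_container(key, container_type):
    """Generic API: works for any known container type."""
    if container_type not in KNOWN_CONTAINER_TYPES:
        raise ValueError(f"unknown container type: {container_type}")
    return Container(key=key, container_type=container_type)


def create_unit(key):
    """Thin wrapper: adds no behaviour beyond fixing the type."""
    return create_container(key, container_type="unit")
```

With this shape, create_unit("u1") and create_container("u1", "unit") return equal objects, so callers can pick whichever reads better.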
@Kyle McCormick - opaque keys
Kyle to post a revised proposal on https://github.com/openedx/opaque-keys/issues/411
@Dave Ormsbee (Axim) - Javier’s media assets proposal
WGU has a digital asset management system
Assets as PEs that are dependencies of Components that they’re used in.
What to do with existing ComponentVersion media?
Mar 6, 2026
~5 weeks out from Verawood cutoff
CCX and the catalog models: Just confirming that every CCX is a CourseRun, rather than every CCX belonging to a CourseRun
we need to keep it as: every CCX is a CourseRun
CBE and Pathways
Unified pathway modelling: Dave’s WIP PR, and how we should proceed with unifying Pathways and CBE workstreams https://github.com/openedx/openedx-core/pull/480/
Where to put Pathways enrollment logic: Piotr has a proposal on where to put Pathways enrollment logic and models, I think he’s blocked on this -- https://github.com/openedx/openedx-core/pull/482#discussion_r2854844921
CBE assessment criteria versioning ADR: Are we good with versioning mastery criteria using SimpleHistory rather than PEs/PEVs? https://github.com/openedx/openedx-core/pull/476/files#diff-d6b4d238e328680c3c9025e60578a038247e5cc16d06e8e647383fae0dcccbce
Notes
Dave: Keep models in openedx-core, until they need to FK to openedx-platform, then put the models there. Keep logic in openedx-core, until it needs to call openedx-platform, then put it in openedx-platform and push data down to openedx-core.
Containers: Braden asks if we should continue having Unit and Sub/Section tables: https://github.com/openedx/openedx-core/issues/412#issuecomment-3988199747
Feb 10, 2026
Context table
context_key: LearningContextKeyField
Usage table
usage_key: UsageKeyField
context: FK(Context)
block_type: str
block_slug: str
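For illustration, the two tables above could look like this in SQLite. The column types, the UNIQUE constraints, and the make_db helper are assumptions added for the sketch; the notes only list the field names and the foreign key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE context (
    id INTEGER PRIMARY KEY,
    context_key TEXT NOT NULL UNIQUE        -- assumed unique
);
CREATE TABLE usage (
    id INTEGER PRIMARY KEY,
    usage_key TEXT NOT NULL UNIQUE,         -- assumed unique
    context_id INTEGER NOT NULL REFERENCES context(id),
    block_type TEXT NOT NULL,
    block_slug TEXT NOT NULL
);
"""


def make_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Storing block_type and block_slug next to the full usage_key is redundant by design in this sketch: it allows filtering by type without parsing the key string.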
Jan 20, 2026
Adding CourseRun to LC, in support of Learning Pathways
Why?
If we want to build pathways in the LC repo, those pathways' steps will need to FK to CourseRuns as part of their completion criteria
Caveats!
do they really mean CourseRun and not Course?
for ASU at least, they’ll have 1-1 mappings between Run<->Course, so it doesn’t matter to them
it won’t always be tied to CourseRun.
e.g. if you’ve already taken course X on instance A then we’ll give you credit for that thing on instance B
we shouldn’t be so prescriptive about what completion requirements are – still good to have FKs in some way though
wouldn’t put CourseRun in authoring - gets its own top-level thing?
Course vs CourseRun has been a half-baked feature
Simplest thing
New catalog app
One model: CourseRun
push to it from CourseOverview
as in, whenever there’s an update to CO, push the information into CourseRun
Course model? Probably not necessary, because today there’s no data in it other than FKs, which we don’t think we need. If we do need that in the future, we could backfill one pretty trivially
proliferation of courserun tables: CourseOverview, SplitModulestoreIndex, now CourseRun
but CourseRun is the really real one now
Capitalization
case sensitive in LC!
FK to organizations_organization table?
openedx-learning and the catalog app
openedx-core?
Would the same people maintain the authoring_api and this new catalog_api?
Braden: we should have some core catalog models for things like organizations, courseruns, maybe courses. But keep them simple, draw a fence around them; anything more bespoke should be done outside of the core
if openedx-learning becomes openedx-core, then we would want
openedx_core.api.authoring -> openedx_core.api.catalog
openedx_core.api.modular_learning as a peer to authoring
pathways
administrative bits
we want to innovate on T&L, but not on the catalog.
history: openedx-learning was the name back when we thought that this would be pushed to from Studio, rather than being the authoring source models
Thinking about dev future
importing from
openedx_platform., openedx_events, openedx_filters, xblock
openedx_api.
openedx-core
openedx_tagging
openedx_catalog
openedx_authoring
openedx_content?
related questions: UserPartitions, do they go in here, or in a new thing?
openedx_keys ← aspirational goal!
(Any other Learning Pathway topics?)
Challenges with type-annotating XBlock
Seems like usage_ids are always UsageKeys except in one place: MemoryIdManager uses strings. But MemoryIdManager is only used in XBlock tests (already fixed on a branch) and in the LearningCore-based runtime with the comment "# We don't really use id_generator until we need to support asides."
Can we delete MemoryIdManager, so that usage_ids are always instances of UsageKey?
The type of def_id is all over the place in edx-platform. Best I came up with is this: DefinitionId: t.TypeAlias = DefinitionKey | UsageKey | ObjectId | LocalId | str
Thoughts?
Note: my understanding is that in a fully LC-world, def_ids are redundant with usage_ids.
Jan 13, 2026
“ContainerType” (https://github.com/openedx/openedx-learning/issues/412)
edx-platform has developed the idea of each container having a single “ContainerType” – currently Unit, Subsection, or Section. https://github.com/openedx/edx-platform/blob/e6deac0cf12226c0b8d744ad17395373cfe0de03/openedx/core/djangoapps/content_libraries/api/container_metadata.py#L42
Do we want to actually support a Container having multiple “types”? E.g. can a Unit also be a ____ ?
If so, how should edx-platform change?
If not, can we codify the single-type restriction somehow?
Can we pull the idea of ContainerTypes into learning core?
Assumptions we could make:
Weaker: Every container has one type
In favor
Stronger: Every container is: Unit, Subsection, Section, (OutlineRoot)
Not in favor
Data model options
Put an actual field on the model for container_type?
we have this for components already. we also have the ComponentType model.
Somewhere we need to store the mapping of classes to OLX tags
How to register?
for components, it’s done thru xblock, and there’s a deterministic mapping between OLX tags and component_type names (blah ↔︎ xblock.v1:blah)
Case study: “Assessment”
What data does the Assessment have?
Assessment
(student scores)
AssessmentVersion
proctoring / timing info
a PubEnt that is the assessment’s content
Or a Container?
3 options:
Assessment → Container
Container → Assessment
?
this got interesting – see recording
Implementation
Similar to ComponentType, create a ContainerType table
Add container_type field and make a migration to backpopulate
Update Django admin (optional but nice)
Rather than testing children stuff on every single concrete builtin container, register a fake container type and run all the core Container tests on that. This is necessary now since we are disallowing “naked” Containers
class FakeContainerSubtype(Container) ← register it as fakecontainer
Also: https://github.com/openedx/openedx-learning/issues/308 , may need adjusting
Remove all the now-unnecessary logic in edx-platform’s modulestore_migrator and content_libraries apps now that container_type is better codified in Learning Core
including select_related queries, like this: https://github.com/openedx/edx-platform/blob/e6deac0cf12226c0b8d744ad17395373cfe0de03/cms/djangoapps/modulestore_migrator/api/read_api.py#L51-L55
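The registration and fake-subtype steps above could be sketched as follows, using plain classes instead of Django models. The CONTAINER_TYPES registry, the type_name attribute, and the register_container_type decorator are hypothetical names; only FakeContainerSubtype and the "fakecontainer" name come from the notes.

```python
CONTAINER_TYPES = {}  # type name -> container class


class Container:
    type_name = None  # subclasses set this; "naked" containers are disallowed

    def __init__(self, children=()):
        if self.type_name not in CONTAINER_TYPES:
            raise TypeError("Container must be created through a registered subtype")
        self.children = list(children)


def register_container_type(cls):
    """Class decorator that records a Container subclass under its type name."""
    if cls.type_name in CONTAINER_TYPES:
        raise ValueError(f"container type already registered: {cls.type_name}")
    CONTAINER_TYPES[cls.type_name] = cls
    return cls


@register_container_type
class FakeContainerSubtype(Container):
    """Test-only type, so core Container tests need no built-in types."""
    type_name = "fakecontainer"
```

Core child-handling tests can then run against FakeContainerSubtype alone, while each built-in type only needs tests for whatever it adds on top.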
Jan 6, 2026
Proposal: https://github.com/openedx/openedx-learning/pull/454
We already require release-to-release migration
Investigate squashed migrations for going across app and bootstrapping
authoring.subapps instead of applets?
Taxonomies as PublishableEntities
Braden: Could we feasibly store it as a blob and not do the full versioning, or keep just the current draft/published versions to allow foreign keys to content?
Sam wants to implement a feature for bulk publishing--does our data model support this?
Dave thinks this is doable. Will look into making that query.
Dec 16, 2025
two layers of pluggability / two axes
granularity of content (course run as a node, unit as a node)
flexibility of completion criteria (all the things, some subset of things)
mary from unicon was talking about different kinds of criteria
v1: a fairly powerful datamodel with ands and ors on nodes
unicon proposal focuses on tags
tags map to competencies
can aggregate tags together
“you completed 3/5 things that represent some concept, which then rolls up into some greater concept”
we could bake everything on that notion
have competencies as the lowest common denominator type thing
it would be good to keep these axes independent of one another
we have to support CBEs, but it should also be agnostic
you should be able to say “complete these 6 courses” without involving competencies
“X is in the pathway”, without encoding how X is part of the pathway’s completion
braden: what is CBE in this context?
dave: evaluate what you know rather than what you do
being able to assess what you understand
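A small sketch of the "ands and ors on nodes" idea, including the "3/5 things" roll-up quoted above. The class and function names (Item, AtLeast, all_of, any_of) are made up for illustration and are not part of any proposal in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Leaf: a single completable thing (course run, unit, ...)."""
    key: str

    def is_met(self, completed):
        return self.key in completed


@dataclass(frozen=True)
class AtLeast:
    """Met when at least `count` children are met."""
    count: int
    children: tuple

    def is_met(self, completed):
        met = sum(child.is_met(completed) for child in self.children)
        return met >= self.count


def all_of(*children):
    """AND node: every child must be met."""
    return AtLeast(len(children), children)


def any_of(*children):
    """OR node: one met child is enough."""
    return AtLeast(1, children)
```

Nothing here refers to competencies or tags, which keeps the completion-criteria axis independent of content granularity: a leaf key can name a course run or a unit equally well.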
Nov 25, 2025
Kyle: Probably able to get FC money to push this along, but haven’t gotten that yet.
Dave: Is it worth considering alternatives to Pydantic for schema validation?
No.
Dave: Thought was that validation mechanism would be converting to JSON.
Multiple layers of validation--e.g. containers and components vs. more specific things that need plugins.
Plugin Examples:
Annotating items with notes/discussion. Could be limited to Studio (author notes).
PublishableEntity → messages
Frontend: Tab entry
Workflow
(see how far we can go with just tags)
Does this replace Asides?
AI Content Generation
Individual XBlocks
VideoBlock
ProblemBlock
Is it okay if we have field data that is redundant with OLX?
Braden: Okay, as long as source of truth is clear.
@Dave Ormsbee (Axim) to look at how to discourage direct model writes.
Braden: Let’s make sure there’s a solid justification for making this, i.e. we could just make a JSON blob of metadata for plugins to play with.
High risk areas / things that need a lot of deliberation:
https://github.com/openedx/openedx-learning/issues/435
Braden: Would be good to identify plugin/application use cases where they could use purely Learning Core as a library and do something useful without having to use edx-platform. e.g. analyze content and produce a report.
Kyle: For Verawood/LC 1.0, focus on edx-platform plugins, not necessarily LC plugins.
Should we have OLX parsing for containers in LC?