Libraries/Learning Core Arch Sync Notes
@Kyle McCormick @Dave Ormsbee (Axim)
- 1 Dec 2, 2025
- 2 Nov 25, 2025
- 3 Oct 21, 2025
- 4 Oct 7, 2025
- 5 Aug 19, 2025
- 6 Jul 29, 2025
- 7 Jul 22, 2025
- 8 Jun 17, 2025
- 9 Jun 10, 2025
- 9.1.1 Outline (45 min talk)
- 9.2 Outline Strawman:
- 10 Jun 3, 2025
- 10.1 May 15, 2025
- 11 2025-05-15
- 12 2025-05-06
- 13 2025-04-29
- 14 2025-04-02
- 15 2025-03-05
- 16 2025-02-05
- 17 2025-01-09
- 18 2024-12-18
- 19 Old Notes
- 19.1 Talk Proposal
- 19.1.1 Title
- 19.1.2 Description (<500 words)
- 19.1.3 Type
- 19.1.4 Target Audience
- 19.1.5 Proposal
- 19.1.6 Rough Talk Outline
- 19.1.7 Additional Notes
- 19.2 2024-11-20
- 19.3 2024-11-13
Dynamic Content Brainstorming · Issue #344 · openedx/openedx-learning
Epic: Learning Core 1.0 · Issue #353 · openedx/openedx-learning
Dec 2, 2025
Nov 25, 2025
Kyle: Probably able to get FC money to push this along, but haven’t gotten that yet.
Dave: Is it worth considering alternatives to Pydantic for schema validation?
No.
Dave: Thought was that validation mechanism would be converting to JSON.
Multiple layers of validation--e.g. containers and components vs. more specific things that need plugins.
Plugin Examples:
Annotating items with notes/discussion. Could be limited to Studio (author notes).
PublishableEntity → messages
Frontend: Tab entry
Workflow
(see how far we can go with just tags)
Does this replace Asides?
AI Content Generation
Individual XBlocks
VideoBlock
ProblemBlock
Is it okay if we have field data that is redundant with OLX?
Braden: Okay, as long as source of truth is clear.
@Dave Ormsbee (Axim) to look at how to discourage direct model writes.
Braden: Let’s make sure there’s a solid justification for making this, i.e. we could just make a JSON blob of metadata for plugins to play with.
High risk areas / things that need a lot of deliberation:
Pluggable Serialization for Learning Core backup/restore · Issue #425 · openedx/openedx-learning
API concept mapping between edx-platform and LC · Issue #435 · openedx/openedx-learning
Braden: Would be good to identify plugin/application use cases where they could use purely Learning Core as a library and do something useful without having to use edx-platform. e.g. analyze content and produce a report.
Kyle: For Verawood/LC 1.0, focus on edx-platform plugins, not necessarily LC plugins.
Opaque Key Simplification? · Issue #411 · openedx/opaque-keys
Should we have OLX parsing for containers in LC?
Braden: Are we going to keep components separate from XBlock?
Kyle: XBlock as a layer over Component, but move as much of the common stuff into Component.
Leave XBlock as an escape hatch
Would be nice if the core “Component” API can handle basic things like display_name, scores (max_score, is_scorable), (tags?), even parsing them from the OLX if needed but never invoking/requiring xblock plugins (for specific xblock types). This should be doable since the outer OLX tag is very consistent with how it handles these mixin attributes, even though the inner content can vary wildly among different xblock types.
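A minimal sketch of the idea above: read the common attributes from the outer OLX tag only, without loading any XBlock plugin. The names CommonComponentFields and parse_common_fields are hypothetical, and treating max_score as an attribute on the outer tag is an assumption made for illustration, not something these notes confirm.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonComponentFields:
    """Hypothetical holder for fields shared by every component type."""
    block_type: str
    display_name: Optional[str]
    max_score: Optional[float]  # assumed attribute name, for illustration only


def parse_common_fields(olx: str) -> CommonComponentFields:
    """Read attributes from the outer OLX tag; the inner content is ignored."""
    root = ET.fromstring(olx)
    raw_score = root.attrib.get("max_score")
    return CommonComponentFields(
        block_type=root.tag,
        display_name=root.attrib.get("display_name"),
        max_score=float(raw_score) if raw_score is not None else None,
    )


# The inner content differs per block type, but the outer tag is read the same way.
problem = parse_common_fields(
    '<problem display_name="Quiz 1" max_score="2"><p>Inner content varies.</p></problem>'
)
video = parse_common_fields('<video display_name="Intro"/>')
```

Anything beyond these shared attributes would still go through XBlock as the escape hatch.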
Oct 21, 2025
python_lib.zip in v2 libs
ideas
ability to load a python_lib.zip in a component in the Advanced section
conclusion: this would cause pain for import/export/backup/restore. Too hacky to support long term, don’t do this.
what is a code library?
a component? no, we want it to belong to other components (problems)
an asset? no, we’d like versioning
maybe something that works with zip archives, which we can store cheaply. (want to avoid the high overhead of having many versioned files given how we store version-name mappings)
backport?
yes
Important note: v1 libraries don’t copy python_lib.zip either. We might be able to get away with creating a new Resource type that works how we want within libraries, but relies on authors to upload the python_lib.zip file to their course. (Or maybe we let them select one python_lib.zip from their course to sync over?) It would let us side-step the “every library problem could have a different version of a python_lib.zip file to store” issue.
Oct 7, 2025
Reviewed and updated
Epic: Learning Core 1.0 · Issue #353 · openedx/openedx-learning
Aug 19, 2025
Efficient Container Hierarchy Traversal · Issue #360 · openedx/openedx-learning
Braden: general agreement on table structure and
Discussion of the justification for outline root being special (fixed location, expectations, ordering, simple select of everything in a course). Breaks assumption that parent entity is n+1.
Special, nullable root field.
Why component vs. entity?
tighter focus
ability to
But higher levels, could be containers/container versions instead. more correct, db can verify
Schema: special fields for the leaf node and the root node, container references for intermediate nodes
(will need to join against xblock field data for inheritance calculation)
Dynamic children
Braden: If we have randomized content, where does it get represented?
Kyle: Where I’ve been leaning lately is that these Selectors exist outside the hierarchy and have metadata that affects how you’d turn an authored hierarchy into a user’s hierarchy.
Braden’s concerns: Are we going to have randomized content blocks with 10K+ children? Will it break this? If people naively look at this API, they might see these children and mistake them. How do we keep API users from misinterpreting this?
Can we make pointers to DAG’d problems, something lightweight?
Followup: talk to product/UX about DAGs in courses.
Jul 29, 2025
Dynamic Children
Do we try to unify the user partition functionality with dynamic child selection, or keep them separate?
Do we build a separate model for more efficiently storing hierarchy for rapid traversal?
(We really didn’t take notes well this session.)
Jul 22, 2025
CCX in learning core
Decision: Move CCX to openedx/ so it can be shared by LMS and CMS. But make sure that LMS can’t write to modulestore via the CCX wrapper.
Implications for permissions requirements?
Future
CCXs are PEs within a LearningPackage, the same LearningPackage of the CourseRun that they are based on
Pearson (along with others) want to have a more flexible level of customization.
MIT’s CCX use case is more restrictive, intentional limitations of customization to scheduling and hiding particular content.
It’s possible that we can implement the more flexible use case while having a list of customizable things, and then the MIT use case can be covered by disabling certain allowed customizations.
Retro: Lessons Learned from the Prototype
Learning Core → ModuleStore Shim
It looks workable.
We should store Course Usage Keys separately from the PublishableEntity. I first thought this was a compromise, but I’ve come around to the idea that it makes perfect sense, since those usage keys are very much an XBlock runtime concern, and the XBlock runtime app can control that mapping and the constraints on it.
The definition doc envelopes are easy enough to generate on the fly.
We do need the PublishLog and proper side-effect tracking in order to provide a real value for subtree_edited_on, because that’s used for caching and other comparisons.
There are structure doc fields that aren’t necessary to fully preserve (e.g. what version initially created this thing), and would only be used for historical comparisons to other structure docs that won’t exist.
We do need to create branch awareness for preview purposes (this is more a reminder to myself).
We can get into a weird state if we let Modulestore try to edit courses that are being shimmed because the structure writes are thrown away.
We should check our structure caching with CCX to make sure it’ll work correctly (it’s a bit broken in split today).
Whatever we do for dynamic children needs to be able to compile out into parent-child relationships for the purposes of the shim.
Maybe we hide the supporting/weird blocks? Some of these look like they’re just bugs, so it’s probably worth figuring out what’s going on and maybe remove their usage in the course editing code.
Import of course data
We should batch search indexing (on the DraftChangeSets?)
Search indexing in general needs a bunch of improvements right now
While using migrator, it’s possible to get into a corrupt data state that is irrecoverable. Example: having a section and adding its container versions without adding related section versions. May be other places where we have app level constraints that aren’t reflected by database constraints.
Many ways for data to become inconsistent
Want clean separation on platform vs. learning core concerns. Not always clear when to put stuff in one place or another.
Django Admin is super-helpful when you build it out.
Modulestore Migrator is a mix of the libraries API and the XBlock API and the Learning Core APIs, and it needs to currently use all the libraries API to make sure we don’t miss upstream library stuff (like indexing). But should make it safe to use the authoring API directly so it bubbles up.
Events need to get into Learning Core. Hard to make sure it’s consistent otherwise.
What is a plugin that you could test with just Learning Core?
Discussions, where you need references to content and configuration that’s in content, but it has its own data and views.
ProblemBlock if there were a minimal xblock runtime
ORA2, sophisticated models
VideoBlock, being able to add VAL-like data
Would be great to have a minimal XBlock runtime that is used by edx-platform as well as xblock-sdk envs.
We don’t have a good story for Asides support.
Formally Deprecate XBlock after we figure out a good plan for dynamic children.
Need to measure data accumulation / pruning needs.
Are there other big unknowns we need to figure out aside from dynamic children?
userpartitions (part of selectors effort?)
Catalog course? (Not in authoring, anyway)
Mostly have static assets figured out?
Other, not-really-XBlock things:
Grading Policy
Scheduling?
Re-runs. e.g., whether we redo the versioning data model to allow for re-runs to reuse more componentversions. Or whether we add something lighter-weight. “branches”?
Keys, uuids, pks
Is there a faster path to support courses in Learning Core keeping the existing Studio UI exactly as-is?
Would it be worth it?
Jun 17, 2025
Kyle:
Migration tool and Django admin for outline roots
Braden
Mostly been finishing up other projects behind schedule
Will work on slide
Dave
Pushing basic structures to LC->Split shim works, but defs still coming from MongoDB right now.
Jun 10, 2025
Dave
Got Piotr started on the rendered JSON
Hack for Learning Core course key mapping: run starts with “LC”
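A rough sketch of what the "run starts with LC" hack could look like as a check on a course-v1 key string. The function name and the example org/course values are made up for illustration; the actual prototype may do this differently.

```python
def is_learning_core_course(course_key: str) -> bool:
    """Return True if the run part of a course-v1 key starts with "LC"."""
    prefix, _, rest = course_key.partition(":")
    if prefix != "course-v1":
        return False
    parts = rest.split("+")  # expected shape: org+course+run
    if len(parts) != 3:
        return False
    return parts[2].startswith("LC")


lc_course = is_learning_core_course("course-v1:ExampleOrg+Demo+LC2025")
split_course = is_learning_core_course("course-v1:ExampleOrg+Demo+2025")
```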
Kyle
Import API:
feat!: modulestore_migrator by kdmccormick · Pull Request #36873 · openedx/edx-platform (focusing on this over dynamic content)
For integration:
Prototype models for Courses and Outline Roots in Learning Core by bradenmacdonald · Pull Request #316 · openedx/openedx-learning
What we submitted: https://sessionize.com/app/speaker/session/799693
Original Text of submission
The new Libraries experience introduced in Sumac stores content using Learning Core–a new, more efficient, and more extensible successor to the MongoDB-backed ModuleStore backend currently used for courses and legacy libraries. Learning Core offers tremendous benefits to operators and developers alike, but we must migrate our course content in order to fully realize those benefits.
We will explain these benefits in detail, propose a migration process, and explore the longer term implications.
Primarily, we want to communicate the benefits for site operators who undertake this migration. Secondarily, we want to touch on some nuances of the migration that developers may be interested in, providing them the knowledge and resources to learn more outside of the talk.
Short-term benefits (pre migration of courses) include a stable plugin API for library content authoring. We hope to include a reference plugin that enhances the library authoring experience in some way– for example, a plugin that displays version history for all library components. We would like to explain how this new plugin API dovetails with other existing Open edX extension points like Events, Filters, Slots, and XBlock.
Medium-term benefits (immediately post migration of courses) include: removal of MongoDB as platform dependency, a stable plugin API for learning components and assets, better Files & Uploads experience for course authors including versioning and searching, better content inspection and querying for administrators, more efficient serving of assets, reduced storage needs, and reduced memory overhead.
Long-term benefits include: stable plugin APIs for authoring units and sequences, user partitioning, and other learner-content interactions; ability to offer enrollable content outside of the traditional 3-level Open edX course hierarchy; massively simplified edx-platform maintenance; and better unit test data for edx-platform developers and plugin developers.
We also want to discuss backwards compatibility. We expect to be fully backwards-compatible with all Studio content and most-if-not-all OLX-authored content, with some caveats where compatibility is at odds with content security. For Open edX plugins which access ModuleStore today, we expect some of them to continue working, and others to break; we will go into more detail on the distinction between those two categories.
Outline (45 min talk)
Outline Brainstorming
Kyle: What would it be like to run LMS without MongoDB?
(Demo)
Here’s why you can’t do that in Teak
Talk about migration.
Braden: A lot of folks have ModuleStore understanding, vaguely know LC. Main point would be migration process, timeline
Kyle: Why is this additive, not just subtractive. Powerful plugin API.
Braden: Example of properly integrating video information into libraries and not the mess we have with VAL today.
Kyle: Would be nice to add some data to content just to show we can do it.
Call to action?
Kyle: Upgrade
Braden: Once it’s stable enough, would love to have a Learning Core course that’s editable from Libraries and a part of the dev experience.
Dave: anything they can do to prepare their content for this?
Selling it
removing mongodb obvi
cost savings?
ztraboo’s post on gridfs - good case study on how much s3 would save operators
one less piece of infra
show off the improved data model?
can we spruce the admin interface up more? → @Kyle McCormick
libraries UI lets you see raw olx
extensibility in the future
Much easier to show/access history
Have one xblock that extends the model
talk about shimming and how that’ll smooth transition
Outline Strawman:
(Dave: A starting point for conversations about our talk outline. I don’t feel like it flows together very well at the moment, but let’s talk about it in the next session.)
What would it mean to run LMS without MongoDB?
What’s in our way?
Motivation
Cost reduction.
Platform simplification.
Transactions (good and bad (celery)).
Granular, extensible data model.
Concrete example here would be great, e.g. video or problemblock information, contrast with VAL.
We can show screenshots of existing Django admin functionality
How will this work?
Goals: Transition quickly while preserving backwards compatibility.
The LMS ModuleStore read-shim/compilation step.
Porting Studio to be able to write Courses in Learning Core.
Files and Uploads and where they’re stored.
Gradual porting of other systems to bypass ModuleStore, e.g. grades, course blocks API, outlines, CSM.
What changes will there be to the authoring experience?
Course-centric editing is not going away, though it may not be exactly the same as the current course experience.
We want to make it easier to bridge different levels/types of content, e.g. small courses,
Timeline? Call to action?
Other ideas:
Break this up by target persona? Students aren’t intended to notice any difference during the transition, but we can separately map out Course Authors, Developers, and Ops folks?
Jun 3, 2025
May 15, 2025
Talking git-ification:
We could add a version_num in a join table with the PE if necessary. Also depends on whether we want to restart history on re-run or not.
Cleanup gets harder because there is no direct fkey to PublishableEntity.
Won’t happen automatically, but it is doable as long as there’s a join table for entity version ↔︎ entities. So look for versions that aren’t referenced (have cascade deletion for the entity).
Where should the CatalogCourse and Course live? Separate provisioning repo? Separate package? Want to be able to provision it before content exists potentially.
Kyle: openedx-learning should stop short of any catalog understanding, separating content from how people find and get access to content.
Dave went over Explicitly modeling publishing dependencies · Issue #317 · openedx/openedx-learning
2025-05-15
@Braden MacDonald opened a PR for OutlineRoot
Why did we want to put CourseRun in edx-platform?
Dave: dependencies we’d need to pull into learning core to have CourseRun there…
cohorts
grading policy
etc
Dave: could see more things moving into learning core at a time
Braden: What about a core CourseRun model in learning core, which edx-platform and other plugins could hang things off of (e.g. days_early_for_beta)
Braden: Learning Packages w.r.t. re-runs
In theory, a LP could have several re-runs
Hypothetically… we have a LP with a course run
And we copy the outline root
But we probably want to share the sections, subsections, etc
But if I want to edit some content in the re-run, then how does that not modify the original course?
Do we need to deepcopy the entire outline immediately upon rerun?
Dave: An LP makes sense when it’s the same set of authors
Kyle: thoughts about the outline
Option 1: Do a re-run, copy the whole outline immediately.
Braden: Right now each component has one pointer to a current Draft and a current Published. Could store for each branch, each representing a different run (draft_run1 pointer, publish_run1 pointer; draft_run2 pointer, published_run2 pointer).
Kyle: Should there be a fkey to PE that is the branch?
instead of version_num being unique, (version_num, branch) is unique
this makes cloning/reruns very cheap
or could hopscotch and mix the history, requires more changes
Question: Do we want to have a separate namespace per course-run? This is not the case if we do branch encoding in the version.
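To make the "(version_num, branch) is unique" idea concrete, here is a small sketch using an in-memory SQLite table. The table and column names are hypothetical stand-ins, not the real Learning Core schema, and it only shows the uniqueness rule, not how a re-run would actually be cloned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entity_version (
        id INTEGER PRIMARY KEY,
        entity_id INTEGER NOT NULL,
        branch TEXT NOT NULL,
        version_num INTEGER NOT NULL,
        -- version_num alone is no longer unique per entity; the branch is part of the key
        UNIQUE (entity_id, branch, version_num)
    )
    """
)

insert_sql = "INSERT INTO entity_version (entity_id, branch, version_num) VALUES (?, ?, ?)"

# The same entity can have a version 1 on each branch (one branch per run).
conn.execute(insert_sql, (1, "run1", 1))
conn.execute(insert_sql, (1, "run2", 1))

# Repeating a version number within the same branch is rejected by the database.
try:
    conn.execute(insert_sql, (1, "run1", 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM entity_version").fetchone()[0]
```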
Braden & Dave: Would be nice to have a working prototype where we can cobble this stuff together:
Kyle: should have a sandbox
Later:
Alternate proposal from @Braden MacDonald :
What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.
Example:
Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.
Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.
When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.
As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1” and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”.
Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.
Downsides: pretty significant change to the data model, ComponentVersions would not be deleted when the Component is deleted (occasional cleanup process required if you want to delete them), version numbers would be incrementing in jumps based on the highest version used in the learning package, not the highest version used for that specific component.
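A toy sketch of the git-like proposal, to make the copy-on-write behavior and the listed downsides easier to discuss. Everything here (the Version and LearningPackage classes, their methods, the package-wide counter) is a hypothetical in-memory stand-in, not the actual models.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Version:
    """Points only at its previous version, never back at a component."""
    num: int
    data: str
    previous: Optional["Version"] = None


class LearningPackage:
    def __init__(self) -> None:
        self._next_num = count(1)  # version numbers are package-wide in this sketch
        self.drafts: Dict[str, Version] = {}  # component key -> draft version
        self.all_versions: List[Version] = []

    def edit(self, key: str, data: str) -> Version:
        version = Version(next(self._next_num), data, self.drafts.get(key))
        self.drafts[key] = version
        self.all_versions.append(version)
        return version

    def duplicate(self, source_key: str, new_key: str) -> None:
        # Cheap copy: the new key shares the same version (and history).
        self.drafts[new_key] = self.drafts[source_key]

    def delete(self, key: str) -> None:
        # Deleting a component leaves its versions behind.
        del self.drafts[key]

    def history(self, key: str) -> List[int]:
        nums = []
        version = self.drafts.get(key)
        while version is not None:
            nums.append(version.num)
            version = version.previous
        return nums

    def unreferenced(self) -> List[int]:
        # Candidates for the occasional cleanup process.
        reachable: Set[int] = set()
        for key in self.drafts:
            reachable.update(self.history(key))
        return [v.num for v in self.all_versions if v.num not in reachable]


lp = LearningPackage()
lp.edit("A2023:text1", "first draft")
lp.edit("A2023:text1", "second draft")
lp.duplicate("A2023:text1", "A2024:text1")  # e.g. a re-run
lp.edit("A2024:text1", "re-run edit")       # forks from the shared history
lp.edit("A2023:text1", "original edit")     # version number jumps past the fork
history_2023 = lp.history("A2023:text1")
history_2024 = lp.history("A2024:text1")
lp.delete("A2024:text1")
orphans = lp.unreferenced()
```

In this sketch the two keys share versions 1 and 2, diverge on edit, and the version left behind by the deleted key only shows up when a cleanup pass looks for unreferenced versions.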
2025-05-06
Dave: update
implicit compile step for representing components and saving xblock fields
LC runtime doesn’t support parents and children
tie these realized fields to PEVersion rather than ComponentVersion. That way, anything with field data can use it
split modulestore does allow another “backend” than mongo, dave is trying that
one row per course per published version of structure document
when we publish again, row gets replaced
(rather than pruning old structure docs)
do we need a draft version of the structure doc?
yes, for preview, but not necessarily for POC
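A small sketch of the "row gets replaced on publish" behavior, using an in-memory SQLite table. The table name, columns, and the course key value are hypothetical; the real backend Dave is trying may store this differently.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE published_structure ("
    "course_key TEXT PRIMARY KEY, structure_json TEXT NOT NULL)"
)


def publish_structure(course_key: str, structure: dict) -> None:
    """Store the published structure doc, replacing any earlier row for the course."""
    conn.execute(
        "INSERT OR REPLACE INTO published_structure (course_key, structure_json) "
        "VALUES (?, ?)",
        (course_key, json.dumps(structure)),
    )


publish_structure("course-v1:ExampleOrg+Demo+LC2025", {"blocks": ["chapter1"]})
publish_structure("course-v1:ExampleOrg+Demo+LC2025", {"blocks": ["chapter1", "chapter2"]})

rows = conn.execute(
    "SELECT course_key, structure_json FROM published_structure"
).fetchall()
latest_blocks = json.loads(rows[0][1])["blocks"]
```

Because the primary key is the course, there is never an old structure doc to prune. A draft row for preview would need its own key or table.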
Kyle: Data modelling
class Selector(PubEnt)
class SelectorVersion(PubEntVer)
partition: Partition
variants (reverse relation from Variant.selector_version)
class SplitTestVersion(SelectorVersion)
class ItemBankVersion(SelectorVersion)
count: int
class GradeGateVersion(SelectorVersion)
a hypothetical custom selector
selects a child based on the user’s current overall grade
class Variant(Model) [ alt name: class Selection(Model) ? ]
entity_list: EntityList
selector_version: SelectorVersion
group: Group|null
variant is valid iff all non-null of [selector_version, group] both match the query
otherwise, need to re-invoke selection process to determine Variant
old matching Variant factors in when determining a new Variant
new Variant may need to be generated if it does not exist
Partition
Group
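The validity rule for a Variant in the data model above can be sketched with plain dataclasses. This is only an illustration: the classes are simplified stand-ins for the proposed models, the field shapes are assumptions, and selector_version is made nullable here because the rule speaks of "all non-null" fields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Group:
    name: str


@dataclass(frozen=True)
class SelectorVersion:
    selector_key: str
    version_num: int


@dataclass(frozen=True)
class Variant:
    entity_list: Tuple[str, ...]  # keys of the selected children
    selector_version: Optional[SelectorVersion]
    group: Optional[Group]


def variant_is_valid(
    variant: Variant,
    selector_version: SelectorVersion,
    group: Optional[Group],
) -> bool:
    """A Variant is valid iff each of its non-null fields matches the query."""
    if variant.selector_version is not None and variant.selector_version != selector_version:
        return False
    if variant.group is not None and variant.group != group:
        return False
    return True


sv1 = SelectorVersion("split_test_1", 1)
sv2 = SelectorVersion("split_test_1", 2)
group_a, group_b = Group("A"), Group("B")

pinned = Variant(("unit_a",), sv1, group_a)
any_group = Variant(("unit_x",), sv1, None)

still_valid = variant_is_valid(pinned, sv1, group_a)
stale_after_edit = variant_is_valid(pinned, sv2, group_a)
wrong_group = variant_is_valid(pinned, sv1, group_b)
group_agnostic = variant_is_valid(any_group, sv1, group_b)
```

When the check fails, the selection process would be re-invoked, with the old Variant available as an input to choosing the new one.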
2025-04-29
What’s the most minimal MVP we can do for getting course content into learning core
Learning core backend for SplitModuleStore
class ComponentVersionXBlockData(Model):
    cv = OneToOneField(ComponentVersion)
    content_fields = JSONField()
    settings_fields = JSONField()
Generating this would need us to instantiate the xblock runtime
Up front as a part of migration?
Braden: Would this be a temp thing?
Dave: Field data split is long term thing.
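A rough sketch of the field data split that would populate content_fields and settings_fields. In practice the scope of each field comes from the XBlock class, which is why the runtime needs to be instantiated; here a hand-written set of content field names stands in for that lookup. The function name and the example field values are assumptions for illustration.

```python
from typing import Any, Dict, Set, Tuple


def split_field_data(
    fields: Dict[str, Any],
    content_field_names: Set[str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a flat dict of field values into (content_fields, settings_fields)."""
    content = {name: value for name, value in fields.items() if name in content_field_names}
    settings = {name: value for name, value in fields.items() if name not in content_field_names}
    return content, settings


# Illustrative values only; the real scopes are declared on each XBlock's fields.
problem_fields = {
    "data": "<problem>...</problem>",
    "display_name": "Quiz 1",
    "max_attempts": 3,
}
content_fields, settings_fields = split_field_data(problem_fields, {"data"})
```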
Braden: How do we handle containers?
Kyle: Simple/dumb data hanging off the containers
Braden: Switch on a per-course basis?
Yes, course waffle flags maybe?
But it would be bad to do this on a per-user basis lol
Braden: Is the split shim readonly or read/write?