Libraries/Learning Core Arch Sync Notes

@Kyle McCormick @Dave Ormsbee (Axim)

 

Dec 2, 2025

Nov 25, 2025

  • Kyle: Probably able to get FC money to push this along, but haven’t gotten that yet.

  • Dave: Is it worth considering alternatives to Pydantic for schema validation?

    • No.

  • Dave: Thought was that validation mechanism would be converting to JSON.

  • Multiple layers of validation--e.g. containers and components vs. more specific things that need plugins.

  • Plugin Examples:

    • Annotating items with notes/discussion. Could be limited to Studio (author notes).

      • PublishableEntity → messages

      • Frontend: Tab entry

    • Workflow

      • (see how far we can go with just tags)

    • Does this replace Asides?

    • AI Content Generation

    • Individual XBlocks

      • VideoBlock

      • ProblemBlock

  • Is it okay if we have field data that is redundant with OLX?

    • Braden: Okay, as long as source of truth is clear.

  • @Dave Ormsbee (Axim) to look at how to discourage direct model writes.

  • Braden: Let’s make sure there’s a solid justification for making this, i.e. we could just make a JSON blob of metadata for plugins to play with.

  • High risk areas / things that need a lot of deliberation:

  • Should we have OLX parsing for containers in LC?

  • Braden: Are we going to keep components separate from XBlock?

    • Kyle: XBlock as a layer over Component, but move as much of the common stuff into Component.

      • Leave XBlock as an escape hatch

    • Would be nice if the core “Component” API can handle basic things like display_name, scores (max_score, is_scorable), (tags?), even parsing them from the OLX if needed but never invoking/requiring xblock plugins (for specific xblock types). This should be doable since the outer OLX tag is very consistent with how it handles these mixin attributes, even though the inner content can vary wildly among different xblock types.
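A minimal sketch of the last point above: reading common fields from the outer OLX tag without involving any block-specific plugin. The names `CommonFields` and `read_common_fields` and the sample OLX are hypothetical, and only `display_name` is singled out. How scores or tags would be derived is left open here, as it was in the discussion.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommonFields:
    """Hypothetical container for values read from the outer OLX tag."""
    block_type: str
    display_name: Optional[str]
    other_attributes: Dict[str, str] = field(default_factory=dict)


def read_common_fields(olx: str) -> CommonFields:
    """Look only at the outer tag and its attributes.

    The inner content is parsed as generic XML but is never handed to
    block-specific code, so no XBlock plugin is required.
    """
    root = ET.fromstring(olx.strip())
    attributes = dict(root.attrib)
    display_name = attributes.pop("display_name", None)
    return CommonFields(
        block_type=root.tag,
        display_name=display_name,
        other_attributes=attributes,
    )


sample_olx = """
<problem display_name="Checkpoint 1" max_attempts="3">
  <multiplechoiceresponse>
    <choicegroup>
      <choice correct="true">Yes</choice>
      <choice correct="false">No</choice>
    </choicegroup>
  </multiplechoiceresponse>
</problem>
"""
fields = read_common_fields(sample_olx)
```

The same function works for any block type, because it never inspects what sits inside the outer tag.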

Oct 21, 2025

  • python_lib.zip in v2 libs

    • ideas

      • ability to load a python_lib.zip in a component in the Advanced section

      • conclusion: this would cause pain for import/export/backup/restore. Too hacky to support long term, don’t do this.

    • what is a code library?

      • a component? no, we want it to belong to other components (problems)

      • an asset? no, we’d like versioning

      • maybe something that works with zip archives, which we can store cheaply. (want to avoid the high overhead of having many versioned files given how we store version-name mappings)

    • backport?

      • yes

    • Important note: v1 libraries don’t copy python_lib.zip either. We might be able to get away with creating a new Resource type that works how we want within libraries, but relies on authors to upload the python_lib.zip file to their course. (Or maybe we let them select one python_lib.zip from their course to sync over?) It would let us side-step the “every library problem could have a different version of a python_lib.zip file to store” issue.

Oct 7, 2025

Aug 19, 2025

  • Efficient Container Hierarchy Traversal · Issue #360 · openedx/openedx-learning

    • Braden: general agreement on table structure and

    • Discussion of the justification for outline root being special (fixed location, expectations, ordering, simple select of everything in a course). Breaks assumption that parent entity is n+1.

    • Special, nullable root field.

    • Why component vs. entity?

      • tighter focus

      • ability to

    • But higher levels, could be containers/container versions instead. more correct, db can verify

      • Schema: special fields for the leaf node and the root node, container references for intermediate nodes

      • (will need to join against xblock field data for inheritance calculation)

    • Dynamic children

      • Braden: If we have randomized content, where does it get represented?

        • Kyle: Where I’ve been leaning lately is that these Selectors exist outside the hierarchy and have metadata that affects how you’d turn an authored hierarchy into a user’s hierarchy.

      • Braden’s concerns: Are we going to have randomized content blocks with 10K+ children? Will it break this? If people naively look at this API, they might see these children and mistake them. How do we keep API users from misinterpreting this?

    • Can we make pointers to DAG’d problems, something lightweight?

    • Followup: talk to product/UX about DAGs in courses.
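One possible reading of the schema idea above (special root and leaf fields, container references in between), sketched with SQLite from the Python standard library. The table name `hierarchy_row`, its columns, and the sample IDs are all assumptions for illustration. This is not the table structure from the issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hierarchy_row (
        root_id      INTEGER,           -- special, nullable root field
        container_id INTEGER NOT NULL,  -- one intermediate container on the path
        depth        INTEGER NOT NULL,  -- 1 = direct parent of the leaf
        leaf_id      INTEGER NOT NULL   -- special leaf field (a component)
    );
""")

# Course outline root 1: section 10 > subsection 20 > unit 30 > components 100, 101.
# Unit 31 holds component 102 and has no outline root (root_id is NULL).
rows = [
    (1, 30, 1, 100), (1, 20, 2, 100), (1, 10, 3, 100),
    (1, 30, 1, 101), (1, 20, 2, 101), (1, 10, 3, 101),
    (None, 31, 1, 102),
]
conn.executemany("INSERT INTO hierarchy_row VALUES (?, ?, ?, ?)", rows)


def leaves_under_root(conn, root_id):
    """The "simple select of everything in a course"."""
    cursor = conn.execute(
        "SELECT DISTINCT leaf_id FROM hierarchy_row "
        "WHERE root_id = ? ORDER BY leaf_id",
        (root_id,),
    )
    return [row[0] for row in cursor]


def ancestors_of(conn, leaf_id):
    """Containers above a leaf, nearest first."""
    cursor = conn.execute(
        "SELECT container_id FROM hierarchy_row "
        "WHERE leaf_id = ? ORDER BY depth",
        (leaf_id,),
    )
    return [row[0] for row in cursor]
```

Both lookups are single indexed-style selects with no recursion, which is the property the discussion was after. Dynamic children are deliberately absent from this sketch.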

Jul 29, 2025

  • Dynamic Children

    • Do we try to unify the user partition functionality with dynamic child selection, or keep them separate?

    • Do we build a separate model for more efficiently storing hierarchy for rapid traversal?

    • (We really didn’t take notes well this session.)

Jul 22, 2025

CCX in learning core

  • Related to fix: course outline not found issue for ccx courses by Anas12091101 · Pull Request #36128 · openedx/edx-platform?

    • Decision: Move CCX to openedx/ so it can be shared by LMS and CMS. But make sure that LMS can’t write to modulestore via the CCX wrapper.

  • Implications for permissions requirements?

  • Future

    • CCXs are PEs within a LearningPackage, the same LearningPackage of the CourseRun that they are based on

    • Pearson (along with others) want to have a more flexible level of customization.

    • MIT’s CCX use case is more restrictive, intentional limitations of customization to scheduling and hiding particular content.

    • It’s possible that we can implement the more flexible use case while having a list of customizable things, and then the MIT use case can be covered by disabling certain allowed customizations.
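The "list of customizable things" idea can be sketched as an allowlist. Everything named here is hypothetical: `Customization`, `CCXPolicy`, and in particular the `EDIT_CONTENT` and `REORDER_CONTENT` entries, which only stand in for whatever the more flexible use case ends up needing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Customization(Enum):
    SCHEDULE = "schedule"
    HIDE_CONTENT = "hide_content"
    EDIT_CONTENT = "edit_content"        # hypothetical "flexible" customization
    REORDER_CONTENT = "reorder_content"  # hypothetical "flexible" customization


@dataclass(frozen=True)
class CCXPolicy:
    """The set of customizations a CCX is allowed to make."""
    allowed: FrozenSet[Customization]

    def permits(self, customization: Customization) -> bool:
        return customization in self.allowed

    def without(self, *disabled: Customization) -> "CCXPolicy":
        """Return a stricter policy with some customizations disabled."""
        return CCXPolicy(self.allowed - frozenset(disabled))


# Flexible use case: everything on the list is allowed.
flexible = CCXPolicy(frozenset(Customization))

# Restrictive use case: start from the same list and disable entries,
# leaving only scheduling and hiding content.
restrictive = flexible.without(
    Customization.EDIT_CONTENT, Customization.REORDER_CONTENT
)
```

The restrictive policy is derived from the flexible one rather than defined separately, so both use cases share one implementation.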

Retro: Lessons Learned from the Prototype

  • Learning Core → ModuleStore Shim

    • It looks workable.

    • We should store Course Usage Keys separately from the PublishableEntity. I first thought this was a compromise, but I’ve come around to the idea that it makes perfect sense, since those usage keys are very much an XBlock runtime concern, and the XBlock runtime app can control that mapping and the constraints on it.

    • The definition doc envelopes are easy enough to generate on the fly.

    • We do need the PublishLog and proper side-effect tracking in order to provide a real value for subtree_edited_on, because that’s used for caching and other comparisons.

    • There are structure doc fields that aren’t necessary to fully preserve (e.g. what version initially created this thing), and would only be used for historical comparisons to other structure docs that won’t exist.

    • We do need to create branch awareness for preview purposes (this is more a reminder to myself).

    • We can get into a weird state if we let Modulestore try to edit courses that are being shimmed because the structure writes are thrown away.

    • We should check our structure caching with CCX to make sure it’ll work correctly (it’s a bit broken in split today).

    • Whatever we do for dynamic children needs to be able to compile out into parent-child relationships for the purposes of the shim.

    • Maybe we hide the supporting/weird blocks? Some of these look like they’re just bugs, so it’s probably worth figuring out what’s going on and maybe remove their usage in the course editing code.

  • Import of course data

    • We should batch search indexing (on the DraftChangeSets?)

    • Search indexing in general needs a bunch of improvements right now

  • While using migrator, it’s possible to get into a corrupt data state that is irrecoverable. Example: having a section and adding its container versions without adding related section versions. May be other places where we have app level constraints that aren’t reflected by database constraints.

    • Many ways for data to become inconsistent

  • Want clean separation on platform vs. learning core concerns. Not always clear when to put stuff in one place or another.

  • Django Admin is super-helpful when you build it out.

  • Modulestore Migrator is a mix of the libraries API and the XBlock API and the Learning Core APIs, and it needs to currently use all the libraries API to make sure we don’t miss upstream library stuff (like indexing). But should make it safe to use the authoring API directly so it bubbles up.

    • Events need to get into Learning Core. Hard to make sure it’s consistent otherwise.

  • What is a plugin that you could test with just Learning Core?

    • Discussions, where you need references to content and configuration that’s in content, but it has its own data and views.

    • ProblemBlock if there were a minimal xblock runtime

    • ORA2, sophisticated models

    • VideoBlock, being able to add VAL-like data

    • Would be great to have a minimal XBlock runtime that is used by edx-platform as well as xblock-sdk envs.

  • We don’t have a good story for Asides support.

  • Formally Deprecate XBlock after we figure out a good plan for dynamic children.

  • Need to measure data accumulation / pruning needs.

Are there other big unknowns we need to figure out aside from dynamic children?

  • userpartitions (part of selectors effort?)

  • Catalog course? (Not in authoring, anyway)

  • Mostly have static assets figured out?

  • Other, not-really-XBlock things:

    • Grading Policy

    • Scheduling?

  • Re-runs. e.g., whether we redo the versioning data model to allow for re-runs to reuse more componentversions. Or whether we add something lighter-weight. “branches”?

  • Keys, uuids, pks

Is there a faster path to support courses in Learning Core keeping the existing Studio UI exactly as-is?

  • Would it be worth it?

Jun 17, 2025

  • Kyle:

    • Migration tool and Django admin for outline roots

  • Braden

    • Mostly been finishing up other projects behind schedule

    • Will work on slide

  • Dave

    • Pushing basic structures to LC->Split shim works, but defs still coming from MongoDB right now.

Jun 10, 2025

Original Text of submission

The new Libraries experience introduced in Sumac stores content using Learning Core–a new, more efficient, and more extensible successor to the MongoDB-backed ModuleStore backend currently used for courses and legacy libraries. Learning Core offers tremendous benefits to operators and developers alike, but we must migrate our course content in order to fully realize those benefits.

We will explain these benefits in detail, propose a migration process, and explore the longer term implications.

Primarily, we want to communicate the benefits for site operators who undertake this migration. Secondarily, we want to touch on some nuances of the migration that developers may be interested in, providing them the knowledge and resources to learn more outside of the talk.

Short-term benefits (pre migration of courses) include a stable plugin API for library content authoring. We hope to include a reference plugin that enhances the library authoring experience in some way– for example, a plugin that displays version history for all library components. We would like to explain how this new plugin API dovetails with other existing Open edX extension points like Events, Filters, Slots, and XBlock.

Medium-term benefits (immediately post migration of courses) include: removal of MongoDB as platform dependency, a stable plugin API for learning components and assets, better Files & Uploads experience for course authors including versioning and searching, better content inspection and querying for administrators, more efficient serving of assets, reduced storage needs, and reduced memory overhead.

Long-term benefits include: stable plugin APIs for authoring units and sequences, user partitioning, and other learner-content interactions; ability to offer enrollable content outside of the traditional 3-level Open edX course hierarchy; massively simplified edx-platform maintenance; and better unit test data for edx-platform developers and plugin developers.

We also want to discuss backwards compatibility. We expect to be fully backwards-compatible with all Studio content and most-if-not-all OLX-authored content, with some caveats where compatibility is at odds with content security. For Open edX plugins which access ModuleStore today, we expect some of them to continue working, and others to break; we will go into more detail on the distinction between those two categories.

Outline (45 min talk)

Outline Brainstorming

  • Kyle: What would it be like to run LMS without MongoDB?

    • (Demo)

    • Here’s why you can’t do that in Teak

    • Talk about migration.

  • Braden: A lot of folks have ModuleStore understanding, vaguely know LC. Main point would be migration process, timeline

  • Kyle: Why is this additive, not just subtractive. Powerful plugin API.

  • Braden: Example of properly integrating video information into libraries and not the mess we have with VAL today.

  • Kyle: Would be nice to add some data to content just to show we can do it.

  • Call to action?

    • Kyle: Upgrade

    • Braden: Once it’s stable enough, would love to have a Learning Core course that’s editable from Libraries and a part of the dev experience.

    • Dave: anything they can do to prepare their content for this?

  • Selling it

    • removing mongodb obvi

      • cost savings?

        • ztraboo’s post on gridfs - good case study on how much s3 would save operators

      • one less piece of infra

    • show off the improved data model?

      • can we spruce the admin interface up more? → @Kyle McCormick

      • libraries UI lets you see raw olx

    • extensibility in the future

      • Much easier to show/access history

      • Have one xblock that extends the model

    • talk about shimming and how that’ll smooth transition

Outline Strawman:

(Dave: A starting point for conversations about our talk outline. I don’t feel like it flows together very well at the moment, but let’s talk about it in the next session.)

  1. What would it mean to run LMS without MongoDB?

    1. What’s in our way?

  2. Motivation

    1. Cost reduction.

    2. Platform simplification.

    3. Transactions (good and bad (celery)).

    4. Granular, extensible data model.

      1. Concrete example here would be great, e.g. video or problemblock information, contrast with VAL.

      2. We can show screenshots of existing Django admin functionality

  3. How will this work?

    1. Goals: Transition quickly while preserving backwards compatibility.

    2. The LMS ModuleStore read-shim/compilation step.

    3. Porting Studio to be able to write Courses in Learning Core.

    4. Files and Uploads and where they’re stored.

    5. Gradual porting of other systems to bypass ModuleStore, e.g. grades, course blocks API, outlines, CSM.

  4. What changes will there be to the authoring experience?

    1. Course-centric editing is not going away, though it may not be exactly the same as the current course experience.

    2. We want to make it easier to bridge different levels/types of content, e.g. small courses,

  5. Timeline? Call to action?

Other ideas:

  • Break this up by target persona? Students aren’t intended to notice any difference during the transition, but we can separately map out Course Authors, Developers, and Ops folks?

 

Jun 3, 2025

 

May 15, 2025

  • Talking git-ification:

    • We could add a version_num in a join table with the PE if necessary. Also depends on whether we want to restart history on re-run or not.

    • Cleanup gets harder because no direct fkey to PublishableEntity

      • Won’t happen automatically, but it’s workable as long as there’s a join table for entity version ↔︎ entities: look for versions that aren’t referenced (and have cascade deletion for the entity).

  • Where should the CatalogCourse and Course live? Separate provisioning repo? Separate package? Want to be able to provision it before content exists potentially.

    • Kyle: openedx-learning should stop short of any catalog understanding, separating content from how people find and get access to content.

  • Dave went over Explicitly modeling publishing dependencies · Issue #317 · openedx/openedx-learning
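The cleanup idea from the git-ification bullets above (join table, cascade deletion for the entity, then look for unreferenced versions) could be sketched like this with SQLite. The table names `entity`, `entity_version`, and `entity_version_link` and the sample IDs are assumptions, not the Learning Core schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE entity_version (id INTEGER PRIMARY KEY);

    -- Join table: versions carry no direct foreign key to an entity.
    CREATE TABLE entity_version_link (
        entity_id  INTEGER NOT NULL REFERENCES entity(id) ON DELETE CASCADE,
        version_id INTEGER NOT NULL REFERENCES entity_version(id),
        PRIMARY KEY (entity_id, version_id)
    );

    INSERT INTO entity (id) VALUES (1), (2);
    INSERT INTO entity_version (id) VALUES (10), (11), (12);
    -- Version 11 is shared by both entities.
    INSERT INTO entity_version_link (entity_id, version_id)
        VALUES (1, 10), (1, 11), (2, 11), (2, 12);
""")


def unreferenced_versions(conn):
    """Versions that no entity points at any more."""
    cursor = conn.execute(
        "SELECT id FROM entity_version "
        "WHERE id NOT IN (SELECT version_id FROM entity_version_link) "
        "ORDER BY id"
    )
    return [row[0] for row in cursor]


# Deleting the entity cascades to its link rows, but not to the versions.
conn.execute("DELETE FROM entity WHERE id = 1")
orphans = unreferenced_versions(conn)

# The occasional cleanup pass then removes the orphans explicitly.
conn.executemany("DELETE FROM entity_version WHERE id = ?", [(v,) for v in orphans])
```

Version 11 survives because entity 2 still references it. Only version 10 is left without a referrer.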

2025-05-15

  • @Braden MacDonald opened a PR for OutlineRoot

    • Prototype models for Courses and Outline Roots in Learning Core by bradenmacdonald · Pull Request #316 · openedx/openedx-learning

    • Why did we want to put CourseRun in edx-platform?

      • Dave: dependencies we’d need to pull into learning core to have CourseRun there…

        • cohorts

        • grading policy

        • etc

      • Dave: could see more things moving into learning core at a time

      • Braden: What about a core CourseRun model in learning core, which edx-platform and other plugins could hang things off of (e.g. days_early_for_beta)

    • Braden: Learning Packages w.r.t. re-runs

      • In theory, a LP could have several re-runs

        • Hypothetically… we have a LP with a course run

        • And we copy the outline root

        • But we probably want to share the sections, subsections, etc

        • But if I want to edit some content in the re-run, then how does that not modify the original course?

        • Do we need to deepcopy the entire outline immediately upon rerun?

      • Dave: An LP makes sense when it’s the same set of authors

    • Kyle: thoughts about the outline

      • Option 1: Do a re-run, copy the whole outline immediately.

        • Braden: Right now each component has one pointer to a current Draft and a current Published. Could store for each branch, each representing a different run (draft_run1 pointer, publish_run1 pointer; draft_run2 pointer, published_run2 pointer).

        • Kyle: Should there be a fkey to PE that is branch.

          • instead of version_num being unique, (version_num, branch) is unique

            • this makes cloning/reruns very cheap

          • or could hopscotch and mix the history, requires more changes

    • Question: Do we want to have a separate namespace per course-run? This is not the case if we do branch encoding in the version.

    • Braden & Dave: Would be nice to have a working prototype where we can cobble this stuff together:

      • Kyle: should have a sandbox

 

Later:

  • Alternate proposal from @Braden MacDonald :

    • What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.

      • Example:

        • Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.

        • Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.

    • When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.

    • As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1” and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”.

    • Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.

    • Downsides: pretty significant change to the data model, ComponentVersions would not be deleted when the Component is deleted (occasional cleanup process required if you want to delete them), version numbers would be incrementing in jumps based on the highest version used in the learning package, not the highest version used for that specific component.
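A small in-memory sketch of the git-like proposal, following the Text1/Text2 example above. It uses plain Python objects rather than Django models. The `LearningPackage` class, its methods, and the `history` helper are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ComponentVersion:
    """Points only at its previous version, never back at a Component."""
    version_id: str
    data: str
    previous: Optional["ComponentVersion"] = None


@dataclass
class Component:
    key: str
    draft: Optional[ComponentVersion] = None
    published: Optional[ComponentVersion] = None


class LearningPackage:
    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self._next_version = 1  # package-wide counter, so numbers jump per component

    def _new_version(self, data: str, previous: Optional[ComponentVersion]) -> ComponentVersion:
        version = ComponentVersion(f"anonv{self._next_version}", data, previous)
        self._next_version += 1
        return version

    def create(self, key: str, data: str) -> Component:
        component = Component(key, draft=self._new_version(data, None))
        self.components[key] = component
        return component

    def edit(self, key: str, data: str) -> ComponentVersion:
        component = self.components[key]
        component.draft = self._new_version(data, component.draft)
        return component.draft

    def duplicate(self, source_key: str, new_key: str) -> Component:
        """Copy only the pointers; the version objects are shared."""
        source = self.components[source_key]
        copy = Component(new_key, draft=source.draft, published=source.published)
        self.components[new_key] = copy
        return copy


def history(version: Optional[ComponentVersion]) -> List[str]:
    """Walk the previous-version chain, newest first."""
    ids = []
    while version is not None:
        ids.append(version.version_id)
        version = version.previous
    return ids


lp = LearningPackage()
lp.create("Text1", "hello")          # anonv1
lp.edit("Text1", "hello world")      # anonv2
lp.duplicate("Text1", "Text2")       # Text2 shares anonv2
lp.edit("Text2", "hello there")      # anonv3: Text2 forks, Text1 is untouched
```

After the last edit, Text2's history is anonv3 → anonv2 → anonv1 while Text1 still stops at anonv2. Duplicating touched no version data, which is where the speed and copy-on-write behaviour come from. The package-wide counter also shows the "jumping version numbers" downside.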

2025-05-06

  • Dave: update

    • implicit compile step for representing components and saving xblock fields

    • LC runtime doesn’t support parents and children

    • tie these realized fields to PEVersion rather than ComponentVersion. That way, anything with field data can use it

    • split modulestore does allow another “backend” than mongo, dave is trying that

    • one row per course per published version of structure document

      • when we publish again, row gets replaced

      • (rather than pruning old structure docs)

      • do we need a draft version of the structure doc?

        • yes, for preview, but not necessarily for POC

  • Kyle: Data modelling

    • class Selector(PubEnt)

    • class SelectorVersion(PubEntVer)

      • partition: Partition

      • variants (reverse relation from Variant.selector_version)

    • class SplitTestVersion(SelectorVersion)

    • class ItemBankVersion(SelectorVersion)

      • count: int

    • class GradeGateVersion(SelectorVersion)

      • a hypothetical custom selector

      • selects a child based on the user’s current overall grade

    • class Variant(Model) [ alt name: class Selection(Model) ? ]

      • entity_list: EntityList

      • selector_version: SelectorVersion

      • group: Group|null

      • variant is valid iff all non-null of [selector_version, group] both match the query

        • otherwise, need to re-invoke selection process to determine Variant

        • old matching Variant factors in when determining a new Variant

        • new Variant may need to be generated if it does not exist

    • Partition

    • Group
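The Variant validity rule above ("valid iff all non-null of [selector_version, group] match the query") can be sketched as follows. Plain values stand in for the model references, both fields are treated as nullable as the rule implies, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    entity_list: Tuple[str, ...]     # stand-in for an EntityList
    selector_version: Optional[int]  # stand-in for a SelectorVersion reference
    group: Optional[str]             # stand-in for a Group; None matches any group


def is_valid(variant: Variant, selector_version: int, group: Optional[str]) -> bool:
    """A variant is valid iff every non-null field matches the query."""
    if variant.selector_version is not None and variant.selector_version != selector_version:
        return False
    if variant.group is not None and variant.group != group:
        return False
    return True


def find_valid_variant(
    variants: List[Variant], selector_version: int, group: Optional[str]
) -> Optional[Variant]:
    """Return the first valid variant, or None.

    None means the caller has to re-invoke the selection process,
    which may generate a new Variant.
    """
    for variant in variants:
        if is_valid(variant, selector_version, group):
            return variant
    return None


variants = [
    Variant(("problem1", "problem2"), selector_version=3, group="A"),
    Variant(("problem3",), selector_version=3, group=None),
]
```

How an old matching Variant feeds into choosing a new one is not modelled here. That part of the design was still open in the notes.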

2025-04-29

  • What’s the most minimal MVP we can do for getting course content into learning core

    • Learning core backend for SplitModuleStore

    • class ComponentVersionXBlockData(Model):

      • cv = OneToOneField(ComponentVersion)

      • content_fields = JSONField()

      • settings_fields = JSONField()

    • Generating this would need us to instantiate the xblock runtime

      • Up front as a part of migration?

    • Braden: Would this be a temp thing?

      • Dave: Field data split is long term thing.

    • Braden: How do we handle containers?

      • Kyle: Simple/dumb data hanging off the containers

    • Braden: Switch on a per-course basis?

      • Yes, course waffle flags maybe?

        • But it would be bad to do this on a per-user basis lol

    • Braden: Is the split shim readonly or read/write?