Libraries/Learning Core Arch Sync Notes

@Kyle McCormick @Dave Ormsbee (Axim)

 

Jan 13, 2026

  • “ContainerType” (https://github.com/openedx/openedx-learning/issues/412)

    • edx-platform has developed the idea of each container having a single “ContainerType” – currently Unit, Subsection, or Section. https://github.com/openedx/edx-platform/blob/e6deac0cf12226c0b8d744ad17395373cfe0de03/openedx/core/djangoapps/content_libraries/api/container_metadata.py#L42

    • Do we want to actually support a Container having multiple “types”? E.g. can a Unit also be a ____ ?

      • If so, how should edx-platform change?

      • If not, can we codify the single-type restriction somehow?

    • Can we pull the idea of ContainerTypes into learning core?

    • Assumptions we could make:

      • Weaker: Every container has one type

        • In favor

      • Stronger: Every container is: Unit, Subsection, Section, (OutlineRoot)

        • Not in favor

    • Data model options

      • Put an actual field on the model for container_type?

        • we have this for components already. we also have the ComponentType model.

      • Somewhere we need to store the mapping of classes to OLX tags

      • How to register?

        • for components, it’s done through XBlock, and there’s a deterministic mapping between OLX tags and component_type names (blah ↔︎ xblock.v1:blah)

    • Case study: “Assessment”

      • What data does the Assessment have?

        • Assessment

          • (student scores)

        • AssessmentVersion

          • proctoring / timing info

          • a PubEnt that is the assessment’s content

            • Or a Container?

      • 3 options:

        • Assessment → Container

        • Container → Assessment

        • ?

      • this got interesting – see recording

    • Implementation
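
A minimal sketch of the weaker assumption from the “ContainerType” discussion above (every container has exactly one type), together with a registry that stores the mapping of types to OLX tags. Everything here is hypothetical and not an existing Learning Core or edx-platform API: the ContainerType, ContainerTypeRegistry, and Container names are made up for illustration, as is registering by a plain method call. The tags vertical, sequential, and chapter are the existing OLX tags for Unit, Subsection, and Section; component_type_name only restates the blah ↔︎ xblock.v1:blah naming noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerType:
    """Hypothetical record pairing a container type name with its OLX tag."""
    name: str      # e.g. "unit"
    olx_tag: str   # e.g. "vertical"


class ContainerTypeRegistry:
    """Hypothetical registry: each name and each OLX tag maps to one type."""

    def __init__(self):
        self._by_name = {}
        self._by_tag = {}

    def register(self, container_type):
        # Reject duplicates so the name <-> tag mapping stays one-to-one.
        if container_type.name in self._by_name:
            raise ValueError(f"duplicate container type: {container_type.name}")
        if container_type.olx_tag in self._by_tag:
            raise ValueError(f"duplicate OLX tag: {container_type.olx_tag}")
        self._by_name[container_type.name] = container_type
        self._by_tag[container_type.olx_tag] = container_type

    def for_name(self, name):
        return self._by_name[name]

    def for_tag(self, olx_tag):
        return self._by_tag[olx_tag]


@dataclass
class Container:
    """A single container_type field means a container has exactly one type."""
    title: str
    container_type: ContainerType


def component_type_name(olx_tag):
    """Restates the deterministic component mapping: blah -> xblock.v1:blah."""
    return f"xblock.v1:{olx_tag}"


# The registry is open-ended: nothing limits it to these three types.
registry = ContainerTypeRegistry()
registry.register(ContainerType("unit", "vertical"))
registry.register(ContainerType("subsection", "sequential"))
registry.register(ContainerType("section", "chapter"))

week_1 = Container("Week 1", registry.for_tag("chapter"))
```

Because the registry accepts any new type, this matches the weaker assumption without baking in the stronger “Unit, Subsection, Section, (OutlineRoot)” list.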

Jan 6, 2026

  • Proposal: https://github.com/openedx/openedx-learning/pull/454

    • We already require release-to-release migration

    • Investigate squashed migrations for going across app and bootstrapping authoring.

    • subapps instead of applets?

  • Taxonomies as PublishableEntities

    • Braden: Could we feasibly store it as a blob and not do the full versioning, or keep just the current draft/published versions to allow foreign keys to content?

  • Sam wants to implement a feature for bulk publishing--does our data model support this?

    • Dave thinks this is doable. Will look into making that query.

Dec 16, 2025

  • https://openedx.atlassian.net/wiki/spaces/OEPM/pages/5381390339

  • two layers of pluggability / two axes

    • granularity of content (course run as a node, unit as a node)

    • flexibility of completion criteria (all the things, some subset of things)

      • mary from unicon was talking about different kinds of criteria

      • v1: a fairly powerful datamodel with ands and ors on nodes

      • unicon proposal focuses on tags

        • tags map to competencies

        • can aggregate tags together

        • “you completed 3/5 things that represent some concept, which then rolls up into some greater concept”

          • we could bake everything on that notion

          • have competencies as the lowest common denominator type thing

    • it would be good to keep these axes independent of one another

  • we have to support CBEs, but it should also be agnostic

    • you should be able to say “complete these 6 courses” without involving competencies

    • “X is in the pathway”, without encoding how X is part of the pathway’s completion

  • braden: what is CBE in this context?

    • dave: evaluate what you know rather than what you do

    • being able to assess what you understand
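
A rough sketch of how the two axes discussed above could stay independent: nodes are opaque keys at any granularity, and completion criteria are a separate tree of “at least N of these” rules, which covers ands, ors, and the “3/5 things” case. The names Node, AtLeast, all_of, any_of, and is_complete are hypothetical and are not taken from the v1 datamodel or the Unicon proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Node:
    """A piece of content at any granularity (course run, unit, ...)."""
    key: str


@dataclass(frozen=True)
class AtLeast:
    """Satisfied when at least `required` of `children` are satisfied."""
    required: int
    children: Tuple["Criterion", ...]


Criterion = Union[Node, AtLeast]


def all_of(*children):
    """An "and": every child must be satisfied."""
    return AtLeast(len(children), children)


def any_of(*children):
    """An "or": one satisfied child is enough."""
    return AtLeast(1, children)


def is_complete(criterion, completed: FrozenSet[str]) -> bool:
    """Evaluate a criterion tree against the set of completed node keys."""
    if isinstance(criterion, Node):
        return criterion.key in completed
    satisfied = sum(1 for child in criterion.children if is_complete(child, completed))
    return satisfied >= criterion.required


# "You completed 3/5 things that represent some concept, which then rolls up
# into some greater concept" (here: the concept plus one required course).
concept = AtLeast(3, tuple(Node(f"unit-{i}") for i in range(1, 6)))
pathway = all_of(concept, Node("course-6"))

done = frozenset({"unit-1", "unit-2", "unit-4", "course-6"})
pathway_complete = is_complete(pathway, done)
```

The same structure can express “complete these 6 courses” as all_of over six Node values, with no competencies involved.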

 

 

Nov 25, 2025

  • Kyle: Probably able to get FC money to push this along, but haven’t gotten that yet.

  • Dave: Is it worth considering alternatives to Pydantic for schema validation?

    • No.

  • Dave: Thought was that validation mechanism would be converting to JSON.

  • Multiple layers of validation--e.g. containers and components vs. more specific things that need plugins.

  • Plugin Examples:

    • Annotating items with notes/discussion. Could be limited to Studio (author notes).

      • PublishableEntity → messages

      • Frontend: Tab entry

    • Workflow

      • (see how far we can go with just tags)

    • Does this replace Asides?

    • AI Content Generation

    • Individual XBlocks

      • VideoBlock

      • ProblemBlock

  • Is it okay if we have field data that is redundant with OLX?

    • Braden: Okay, as long as source of truth is clear.

  • @Dave Ormsbee (Axim) to look at how to discourage direct model writes.

  • Braden: Let’s make sure there’s a solid justification for making this, i.e. we could just make a JSON blob of metadata for plugins to play with.

  • High risk areas / things that need a lot of deliberation:

  • Should we have OLX parsing for containers in LC?

  • Braden: Are we going to keep components separate from XBlock?

    • Kyle: XBlock as a layer over Component, but move as much of the common stuff into Component.

      • Leave XBlock as an escape hatch

    • Would be nice if the core “Component” API can handle basic things like display_name, scores (max_score, is_scorable), (tags?), even parsing them from the OLX if needed but never invoking/requiring xblock plugins (for specific xblock types). This should be doable since the outer OLX tag is very consistent with how it handles these mixin attributes, even though the inner content can vary wildly among different xblock types.
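
A small sketch of the last point above: reading common fields off the outer OLX tag without loading any XBlock plugin. The CommonFields and parse_common_fields names are hypothetical, and using the weight attribute as a stand-in for score information is an assumption made for illustration; only display_name is named in the notes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommonFields:
    """Hypothetical container for fields readable from the outer OLX tag."""
    olx_tag: str
    display_name: Optional[str]
    weight: Optional[float]  # assumption: stands in for score-related data
    other_attributes: Dict[str, str] = field(default_factory=dict)


def parse_common_fields(olx: str) -> CommonFields:
    """Read attributes of the outer tag only; inner content is not interpreted.

    Note: a real implementation would want a hardened XML parser for
    untrusted OLX; the standard library parser is used here for brevity.
    """
    root = ET.fromstring(olx)
    attrs = dict(root.attrib)
    display_name = attrs.pop("display_name", None)
    raw_weight = attrs.pop("weight", None)
    weight = float(raw_weight) if raw_weight not in (None, "") else None
    return CommonFields(root.tag, display_name, weight, attrs)


problem_olx = (
    '<problem display_name="Quiz 1" weight="2.0" max_attempts="3">'
    "<multiplechoiceresponse/>"
    "</problem>"
)
fields = parse_common_fields(problem_olx)
```

The inner multiplechoiceresponse element is never inspected, which is the point: the block-specific content can vary wildly while the outer attributes stay consistent.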

Oct 21, 2025

  • python_lib.zip in v2 libs

    • ideas

      • ability to load a python_lib.zip in a component in the Advanced section

      • conclusion: this would cause pain for import/export/backup/restore. Too hacky to support long term, don’t do this.

    • what is a code library?

      • a component? no, we want it to belong to other components (problems)

      • an asset? no, we’d like versioning

      • maybe something that works with zip archives, which we can store cheaply. (want to avoid the high overhead of having many versioned files given how we store version-name mappings)

    • backport?

      • yes

    • Important note: v1 libraries don’t copy python_lib.zip either. We might be able to get away with creating a new Resource type that works how we want within libraries, but relies on authors to upload the python_lib.zip file to their course. (Or maybe we let them select one python_lib.zip from their course to sync over?) It would let us side-step the “every library problem could have a different version of a python_lib.zip file to store” issue.

Oct 7, 2025

Aug 19, 2025

  • Efficient Container Hierarchy Traversal · Issue #360 · openedx/openedx-learning

    • Braden: general agreement on table structure and

    • Discussion of the justification for outline root being special (fixed location, expectations, ordering, simple select of everything in a course). Breaks assumption that parent entity is n+1.

    • Special, nullable root field.

    • Why component vs. entity?

      • tighter focus

      • ability to

    • But higher levels could be containers/container versions instead: more correct, and the DB can verify it.

      • Schema: special fields for the leaf node and the root node, container references for intermediate nodes

      • (will need to join against xblock field data for inheritance calculation)

    • Dynamic children

      • Braden: If we have randomized content, where does it get represented?

        • Kyle: Where I’ve been leaning lately is that these Selectors exist outside the hierarchy and have metadata that affects how you’d turn an authored hierarchy into a user’s hierarchy.

      • Braden’s concerns: Are we going to have randomized content blocks with 10K+ children? Will it break this? If people naively look at this API, they might see these children and misread them. How do we keep API users from misinterpreting this?

    • Can we make pointers to DAG’d problems, something lightweight?

    • Followup: talk to product/UX about DAGs in courses.

Jul 29, 2025

  • Dynamic Children

    • Do we try to unify the user partition functionality with dynamic child selection, or keep them separate?

    • Do we build a separate model for more efficiently storing hierarchy for rapid traversal?

    • (We really didn’t take notes well this session.)

Jul 22, 2025

CCX in learning core

  • Related to “fix: course outline not found issue for ccx courses” by Anas12091101 · Pull Request #36128 · openedx/edx-platform?

    • Decision: Move CCX to openedx/ so it can be shared by LMS and CMS. But make sure that LMS can’t write to modulestore via the CCX wrapper.

  • Implications for permissions requirements?

  • Future

    • CCXs are PEs within a LearningPackage, the same LearningPackage of the CourseRun that they are based on

    • Pearson (along with others) want to have a more flexible level of customization.

    • MIT’s CCX use case is more restrictive, intentional limitations of customization to scheduling and hiding particular content.

    • It’s possible that we can implement the more flexible use case while having a list of customizable things, and then the MIT use case can be covered by disabling certain allowed customizations.

Retro: Lessons Learned from the Prototype

  • Learning Core → ModuleStore Shim

    • It looks workable.

    • We should store Course Usage Keys separately from the PublishableEntity. I first thought this was a compromise, but I’ve come around to the idea that it makes perfect sense, since those usage keys are very much an XBlock runtime concern, and the XBlock runtime app can control that mapping and the constraints on it.

    • The definition doc envelopes are easy enough to generate on the fly.

    • We do need the PublishLog and proper side-effect tracking in order to provide a real value for subtree_edited_on, because that’s used for caching and other comparisons.

    • There are structure doc fields that aren’t necessary to fully preserve (e.g. what version initially created this thing), and would only be used for historical comparisons to other structure docs that won’t exist.

    • We do need to create branch awareness for preview purposes (this is more a reminder to myself).

    • We can get into a weird state if we let Modulestore try to edit courses that are being shimmed because the structure writes are thrown away.

    • We should check our structure caching with CCX to make sure it’ll work correctly (it’s a bit broken in split today).

    • Whatever we do for dynamic children needs to be able to compile out into parent-child relationships for the purposes of the shim.

    • Maybe we hide the supporting/weird blocks? Some of these look like they’re just bugs, so it’s probably worth figuring out what’s going on and maybe remove their usage in the course editing code.

  • Import of course data

    • We should batch search indexing (on the DraftChangeSets?)

    • Search indexing in general needs a bunch of improvements right now

  • While using migrator, it’s possible to get into a corrupt data state that is irrecoverable. Example: having a section and adding its container versions without adding related section versions. May be other places where we have app level constraints that aren’t reflected by database constraints.

    • Many ways for data to become inconsistent

  • Want clean separation on platform vs. learning core concerns. Not always clear when to put stuff in one place or another.

  • Django Admin is super-helpful when you build it out.

  • Modulestore Migrator is a mix of the libraries API, the XBlock API, and the Learning Core APIs, and it currently needs to go through the libraries API for everything to make sure we don’t miss upstream library stuff (like indexing). But we should make it safe to use the authoring API directly so that it bubbles up.

    • Events need to get into Learning Core. Hard to make sure it’s consistent otherwise.

  • What is a plugin that you could test with just Learning Core?

    • Discussions, where you need references to content and configuration that’s in content, but it has its own data and views.

    • ProblemBlock if there were a minimal xblock runtime

    • ORA2, sophisticated models

    • VideoBlock, being able to add VAL-like data

    • Would be great to have a minimal XBlock runtime that is used by edx-platform as well as xblock-sdk envs.

  • We don’t have a good story for Asides support.

  • Formally Deprecate XBlock after we figure out a good plan for dynamic children.

  • Need to measure data accumulation / pruning needs.

Are there other big unknowns we need to figure out aside from dynamic children?

  • userpartitions (part of selectors effort?)

  • Catalog course? (Not in authoring, anyway)

  • Mostly have static assets figured out?

  • Other, not-really-XBlock things:

    • Grading Policy

    • Scheduling?

  • Re-runs. e.g., whether we redo the versioning data model to allow for re-runs to reuse more componentversions. Or whether we add something lighter-weight. “branches”?

  • Keys, uuids, pks

Is there a faster path to support courses in Learning Core keeping the existing Studio UI exactly as-is?

  • Would it be worth it?

Jun 17, 2025

  • Kyle:

    • Migration tool and Django admin for outline roots

  • Braden

    • Mostly been finishing up other projects behind schedule

    • Will work on slide

  • Dave

    • Pushing basic structures to LC->Split shim works, but defs still coming from MongoDB right now.

Jun 10, 2025

Original Text of submission

The new Libraries experience introduced in Sumac stores content using Learning Core–a new, more efficient, and more extensible successor to the MongoDB-backed ModuleStore backend currently used for courses and legacy libraries. Learning Core offers tremendous benefits to operators and developers alike, but we must migrate our course content in order to fully realize those benefits.

We will explain these benefits in detail, propose a migration process, and explore the longer term implications.

Primarily, we want to communicate the benefits for site operators who undertake this migration. Secondarily, we want to touch on some nuances of the migration that developers may be interested in, providing them the knowledge and resources to learn more outside of the talk.

Short-term benefits (pre migration of courses) include a stable plugin API for library content authoring. We hope to include a reference plugin that enhances the library authoring experience in some way– for example, a plugin that displays version history for all library components. We would like to explain how this new plugin API dovetails with other existing Open edX extension points like Events, Filters, Slots, and XBlock.

Medium-term benefits (immediately post migration of courses) include: removal of MongoDB as platform dependency, a stable plugin API for learning components and assets, better Files & Uploads experience for course authors including versioning and searching, better content inspection and querying for administrators, more efficient serving of assets, reduced storage needs, and reduced memory overhead.

Long-term benefits include: stable plugin APIs for authoring units and sequences, user partitioning, and other learner-content interactions; ability to offer enrollable content outside of the traditional 3-level Open edX course hierarchy; massively simplified edx-platform maintenance; and better unit test data for edx-platform developers and plugin developers.

We also want to discuss backwards compatibility. We expect to be fully backwards-compatible with all Studio content and most-if-not-all OLX-authored content, with some caveats where compatibility is at odds with content security. For Open edX plugins which access ModuleStore today, we expect some of them to continue working, and others to break; we will go into more detail on the distinction between those two categories.

Outline (45 min talk)

Outline Brainstorming

  • Kyle: What would it be like to run LMS without MongoDB?

    • (Demo)

    • Here’s why you can’t do that in Teak

    • Talk about migration.

  • Braden: A lot of folks have ModuleStore understanding, vaguely know LC. Main point would be migration process, timeline

  • Kyle: Why is this additive, not just subtractive. Powerful plugin API.

  • Braden: Example of properly integrating video information into libraries and not the mess we have with VAL today.

  • Kyle: Would be nice to add some data to content just to show we can do it.

  • Call to action?

    • Kyle: Upgrade

    • Braden: Once it’s stable enough, would love to have a Learning Core course that’s editable from Libraries and a part of the dev experience.

    • Dave: anything they can do to prepare their content for this?

  • Selling it

    • removing mongodb obvi

      • cost savings?

        • ztraboo’s post on gridfs - good case study on how much s3 would save operators

      • one less piece of infra

    • show off the improved data model?

      • can we spruce the admin interface up more? → @Kyle McCormick

      • libraries UI lets you see raw olx

    • extensibility in the future

      • Much easier to show/access history

      • Have one xblock that extends the model

    • talk about shimming and how that’ll smooth transition

Outline Strawman:

(Dave: A starting point for conversations about our talk outline. I don’t feel like it flows together very well at the moment, but let’s talk about it in the next session.)

  1. What would it mean to run LMS without MongoDB?

    1. What’s in our way?

  2. Motivation

    1. Cost reduction.

    2. Platform simplification.

    3. Transactions (good and bad (celery)).

    4. Granular, extensible data model.

      1. Concrete example here would be great, e.g. video or problemblock information, contrast with VAL.

      2. We can show screenshots of existing Django admin functionality

  3. How will this work?

    1. Goals: Transition quickly while preserving backwards compatibility.

    2. The LMS ModuleStore read-shim/compilation step.

    3. Porting Studio to be able to write Courses in Learning Core.

    4. Files and Uploads and where they’re stored.

    5. Gradual porting of other systems to bypass ModuleStore, e.g. grades, course blocks API, outlines, CSM.

  4. What changes will there be to the authoring experience?

    1. Course-centric editing is not going away, though it may not be exactly the same as the current course experience.

    2. We want to make it easier to bridge different levels/types of content, e.g. small courses,

  5. Timeline? Call to action?

Other ideas:

  • Break this up by target persona? Students aren’t intended to notice any difference during the transition, but we can separately map out Course Authors, Developers, and Ops folks?

 

Jun 3, 2025

 

May 15, 2025

  • Talking git-ification:

    • We could add a version_num in a join table with the PE if necessary. Also depends on whether we want to restart history on re-run or not.

    • Cleanup gets harder because no direct fkey to PublishableEntity

      • Won’t happen automatically, but it’s doable as long as there’s a join table for entity version ↔︎ entities: look for versions that aren’t referenced (and have cascade deletion for the entity)

  • Where should the CatalogCourse and Course live? Separate provisioning repo? Separate package? Want to be able to provision it before content exists potentially.

    • Kyle: openedx-learning should stop short of any catalog understanding, separating content from how people find and get access to content.

  • Dave went over Explicitly modeling publishing dependencies · Issue #317 · openedx/openedx-learning
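
A sketch of the cleanup idea from the git-ification discussion above, assuming versions no longer carry a direct foreign key to their entity and are tied to entities only through a join table that holds version_num. The table and column names (entity, entity_version, entity_version_link) and the find_unreferenced_versions helper are hypothetical and do not match the real openedx-learning schema. Deleting an entity cascades to its join rows, and a query then finds versions that nothing references.

```python
import sqlite3


def find_unreferenced_versions(conn):
    """Return ids of versions with no row in the join table."""
    rows = conn.execute(
        """
        SELECT v.id
        FROM entity_version AS v
        LEFT JOIN entity_version_link AS l ON l.version_id = v.id
        WHERE l.version_id IS NULL
        ORDER BY v.id
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
conn.executescript(
    """
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE entity_version (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE entity_version_link (
        entity_id INTEGER NOT NULL REFERENCES entity(id) ON DELETE CASCADE,
        version_id INTEGER NOT NULL REFERENCES entity_version(id),
        version_num INTEGER NOT NULL,
        PRIMARY KEY (entity_id, version_num)
    );
    """
)
conn.executemany("INSERT INTO entity (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO entity_version (id, title) VALUES (?, ?)",
    [(10, "v1"), (11, "shared"), (12, "re-run only")],
)
# Version 11 is shared by both entities, as a re-run might reuse it.
conn.executemany(
    "INSERT INTO entity_version_link (entity_id, version_id, version_num) VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 11, 2), (2, 11, 1), (2, 12, 2)],
)

before = find_unreferenced_versions(conn)
conn.execute("DELETE FROM entity WHERE id = 2")  # cascades to entity 2's links
after = find_unreferenced_versions(conn)
```

Version 11 survives the cleanup because entity 1 still references it, which is the case that makes cleanup harder than with a direct foreign key.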

2025-05-15