Open edX Core - Arch Sync Notes

Team: @Kyle McCormick @Dave Ormsbee (Axim) @Braden MacDonald

Epic: https://github.com/openedx/openedx-learning/issues/353

Apr 14, 2026

  • ~1.5 weeks out from Verawood cutoff

  • @Dave Ormsbee (Axim) Proposed high level structure and abstractions for validation/restore flow:

    • (Zip Archive | Local Path | GitHub | etc.) → Storage Filesystem (fsspec)

    • Extract TOML from fs and compile into UnvalidatedCompletePackageInput

      • Rationale: Allow a way to have input formats that are better tailored to specific use cases, e.g. editing entire sections in the same file.

      • Contents

        • Unvalidated Learning Package dict (simple Python types)

          • Includes all TOML content (entities, versions, containers, collections), but does not include media files like block.xml or static assets.

          • This is meant to be a minimal transformation of the input TOML.

          • Structured to try to prevent certain types of logical errors, e.g. refs are used as keys in a dict, so there’s no way to represent duplicate definition of entities.

        • Errors

          • Mostly consistency errors, e.g. redefining the same entity multiple times.

          • Not the actual JSON Schema validation, but just errors when converting the source TOML into the compiled dict.

        • Resources (where to find media data for later)

    • UnvalidatedCompletePackageInput → CompletePackageInput + errors

      • This is done using Pydantic models.

      • Input and output models will be kept separate (output models are much stricter about requiring certain fields).

      • JSON Schema is generated from the input model.

      • Two levels of errors:

        • Ones that JSON Schema can handle, e.g. missing fields, regex not matching, wrong types, etc.

        • Deeper ones that JSON Schema can’t deal with, like referential integrity (pointers to versions that don’t exist, containers referencing children that don’t exist, etc.)

        • Missing resources

      • Strict mode?

    • CompletePackageInput → LearningPackage
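
A minimal sketch of the first two stages above, assuming parsed TOML documents arrive as plain dicts (for example from tomllib.loads). It uses stdlib dataclasses as a dependency-free stand-in for the Pydantic models mentioned in the notes; the class name UnvalidatedPackageInput, the "entities" and "children" keys, and both function names are hypothetical, not the actual format.

```python
from dataclasses import dataclass, field


@dataclass
class UnvalidatedPackageInput:
    """Stand-in for UnvalidatedCompletePackageInput (field names are assumptions)."""
    entities: dict[str, dict] = field(default_factory=dict)  # keyed by ref
    errors: list[str] = field(default_factory=list)


def compile_documents(documents: list[tuple[str, dict]]) -> UnvalidatedPackageInput:
    """Merge (path, parsed_dict) pairs into one dict keyed by entity ref."""
    compiled = UnvalidatedPackageInput()
    for path, doc in documents:
        for ref, entity in doc.get("entities", {}).items():
            if ref in compiled.entities:
                # Consistency error: the same entity is defined more than once.
                compiled.errors.append(f"{path}: entity {ref!r} is already defined")
                continue
            compiled.entities[ref] = entity
    return compiled


def check_referential_integrity(compiled: UnvalidatedPackageInput) -> list[str]:
    """Deeper checks that a JSON Schema cannot express on its own."""
    errors = []
    for ref, entity in compiled.entities.items():
        for child_ref in entity.get("children", []):
            if child_ref not in compiled.entities:
                errors.append(f"{ref!r} references missing child {child_ref!r}")
    return errors


# Hypothetical input: three files, one of which redefines text1.
documents = [
    ("unit1.toml", {"entities": {"unit1": {"children": ["text1", "video1"]}}}),
    ("text1.toml", {"entities": {"text1": {}}}),
    ("dupe.toml", {"entities": {"text1": {}}}),
]
compiled = compile_documents(documents)
integrity_errors = check_referential_integrity(compiled)
```

Because refs are dict keys, the compiled structure cannot hold two definitions of text1; the duplicate surfaces only as an entry in compiled.errors, while the dangling video1 reference is caught by the later integrity pass.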

Apr 7, 2026

  • ~2.5 weeks out from Verawood cutoff

  • @Kyle McCormick Whiteboarding - “north star” architecture https://excalidraw.com/#room=9e33ec3e3ebf9175de2b,nAqtkyhx59SUYlEl9a-XiQ

  • @Kyle McCormick @Braden MacDonald - Confirm we want LEARNING_PACKAGES_* events (https://github.com/openedx/openedx-core/issues/462#issuecomment-4193595258 )

  • @Kyle McCormick @Dave Ormsbee (Axim) Met with MITx physics course author teams, who use OLX heavily.

    • They have several repos, each holds 1 or more related courses

      • possible argument for learning packages holding multiple courses / course runs

    • Automated workflow: merge to master triggers the XML to import into their staging env

      • Last-minute tweaks may be made in studio, but are wiped out upon next course update from git. Ad hoc process for remembering to make fixes back in XML.

    • Each section is an XML file, holds structure down to unit or component level

      • Our current TOML format does not support multiple levels in one file. Seems like it’d be easy to support that, though?

    • Many units are authored in .tex files and converted into unit XML via latex2edx

      • does this imply that units should be able to hold assets?

  • Decision: For Verawood, just use Pydantic to validate and document the current format. Worry about pluggability later, as we’re considering not even sticking with TOML (SQLite? more OLX?) long term.

  • @Braden MacDonald opened a PR to make type annotations for primary keys.

Mar 31, 2026

  • ~3.5 weeks out from Verawood cutoff

  • @Dave Ormsbee (Axim) : Does enrollment get its own top level app in openedx-core? (In the context of LearningPathway enrollments.)

  • @Kyle McCormick Sample plugin ideas:

    • Feanil and I are giving a conference workshop on how to build multiple kinds of plugins into one unified omni-plugin https://github.com/openedx/sample-plugin . Would like to get some openedx-core “plugin” representation in there. We’re thinking,

      • Course card archival

      • “Reviewed by ___”

        • model: ReviewedStatus(TimeStamped)

          • PE

          • DraftChangeLogEntry

          • User

        • rest api for marking as reviewed

        • (new?) Sidebar slot

        • new Filter: EntityPrePublish (or model pre-save signal)

          • be careful with PublishLog

          • should it remove things from the publish list, or cancel the whole publish?

            • just abort it - removing things would be full of footguns

            • removing things would have to happen at an earlier layer in order to be safer

        • ambitious: PublishReviewedItems

          • get_entities_with_unpublished_drafts

            • no dependencies - just actually reviewed things
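
A rough sketch of the "just abort it" behavior discussed above, assuming entities are identified by integer IDs and review status is available as a set. PublishAborted, entity_pre_publish, and publish_reviewed_items are hypothetical names for illustration, not existing openedx-filters or openedx-core APIs.

```python
class PublishAborted(Exception):
    """Hypothetical exception raised to cancel an entire publish."""


def entity_pre_publish(entity_ids: list[int], reviewed_ids: set[int]) -> list[int]:
    """Hypothetical pre-publish filter step.

    It never removes entities from the publish list: it either passes the
    list through unchanged or aborts the whole publish.
    """
    unreviewed = [eid for eid in entity_ids if eid not in reviewed_ids]
    if unreviewed:
        raise PublishAborted(f"unreviewed entities: {unreviewed}")
    return entity_ids


def publish_reviewed_items(draft_ids: list[int], reviewed_ids: set[int]) -> list[int]:
    """Pick only reviewed drafts at an earlier layer, before the filter runs."""
    return [eid for eid in draft_ids if eid in reviewed_ids]


reviewed = {1, 3}
drafts = [1, 2, 3]  # e.g. the result of get_entities_with_unpublished_drafts

# Selecting reviewed items first means the filter has nothing to object to.
to_publish = publish_reviewed_items(drafts, reviewed)
passed = entity_pre_publish(to_publish, reviewed)

# Publishing everything hits the unreviewed entity 2 and aborts entirely.
try:
    entity_pre_publish(drafts, reviewed)
    aborted = False
except PublishAborted:
    aborted = True
```

Keeping the selection step separate from the filter reflects the point that removing items is only safe at an earlier layer than the publish itself.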

Mar 24, 2026

  • ~4.5 weeks out from Verawood cutoff

Mar 17, 2026

  • @Kyle McCormick Key Coherency for openedx-core v1.0

  • @Kyle McCormick I’d like to revisit Braden’s Version branching proposal one more time, and decide if there’s anything we’d like to do before minting v1.0.

    • Alternate proposal from @Braden MacDonald :

      • What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.

        • Example:

          • Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.

          • Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.

      • When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.

      • As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1” and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”.

      • Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.

      • Downsides: pretty significant change to the data model; ComponentVersions would not be deleted when the Component is deleted (occasional cleanup process required if you want to delete them); version numbers would increment in jumps, based on the highest version used in the learning package rather than the highest version used for that specific component.
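
The alternate proposal can be illustrated with a small in-memory model. This is a sketch under assumptions, not the real Django models: Version, Component, and LearningPackage here are simplified stand-ins, and the edit/duplicate/history helpers and the example keys are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """A *Version row that points to its predecessor, not to a Component."""
    num: int
    data: str
    previous: Version | None = None


@dataclass
class Component:
    key: str
    draft: Version | None = None
    published: Version | None = None


class LearningPackage:
    """Tracks components and the highest version number used in the package."""

    def __init__(self) -> None:
        self.components: dict[str, Component] = {}
        self.max_version_num = 0

    def create(self, key: str, data: str) -> Component:
        self.components[key] = Component(key)
        self.edit(key, data)
        return self.components[key]

    def edit(self, key: str, data: str) -> Version:
        component = self.components[key]
        self.max_version_num += 1  # package-wide counter, so numbers can jump
        component.draft = Version(self.max_version_num, data, previous=component.draft)
        return component.draft

    def duplicate(self, key: str, new_key: str) -> Component:
        # Copy only the pointers; the Version rows themselves are shared.
        source = self.components[key]
        copy = Component(new_key, draft=source.draft, published=source.published)
        self.components[new_key] = copy
        return copy


def history(component: Component) -> list[int]:
    """Walk the previous-version chain from the draft back to the first version."""
    nums = []
    version = component.draft
    while version is not None:
        nums.append(version.num)
        version = version.previous
    return nums


package = LearningPackage()
text1 = package.create("A2023:text1", "first draft")    # version 1
package.edit("A2023:text1", "second draft")              # version 2
text2 = package.duplicate("A2023:text1", "A2024:text1")
shared = text1.draft is text2.draft                      # same Version object
package.edit("A2024:text1", "2024 tweak")                # version 3, forks text2
package.edit("A2023:text1", "2023 tweak")                # version 4, forks text1
```

After the two final edits, history(text1) is [4, 2, 1] and history(text2) is [3, 2, 1]: both keep the shared history, and text1 jumps from 2 to 4, which is the version-number downside noted above.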

Mar 11, 2026

  • @Braden MacDonald - containers

    • Most (all?) settings we have today for containers are really course policies.

    • Decision: keep container APIs generic (so you can call either create_container(type=Unit) or create_unit() and it works correctly), keep the empty Unit/UnitVersion models for foreign keys only, assume no content-related settings for now, and include nice “wrapper” APIs for dealing with Units etc. that just call the generic ones.

  • @Kyle McCormick - opaque keys

  • @Dave Ormsbee (Axim) - javier’s media assets proposal

    • wgu has a digital asset mgmt system

    • Assets as PEs that are dependencies of Components that they’re used in.

    • What to do with existing ComponentVersion media?
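
The containers decision above (generic APIs plus thin wrappers, with empty subtype models) could look roughly like this. It is a sketch with plain dataclasses rather than Django models; the signatures of create_container and create_unit and the key/title fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Container:
    key: str
    title: str


class Unit(Container):
    """Empty subtype: it exists only so other models can point at a Unit."""


def create_container(key: str, title: str, container_type: type[Container] = Container) -> Container:
    """Generic API: works for any container type."""
    return container_type(key=key, title=title)


def create_unit(key: str, title: str) -> Unit:
    """Convenience wrapper that just calls the generic API."""
    unit = create_container(key, title, container_type=Unit)
    assert isinstance(unit, Unit)
    return unit


via_generic = create_container("unit-1", "Intro", container_type=Unit)
via_wrapper = create_unit("unit-1", "Intro")
```

Both call paths produce an equivalent Unit, which is the property the decision asks for: callers can use whichever API is more convenient.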

Mar 6, 2026

  • ~5 weeks out from Verawood cutoff

Feb 10, 2026

  • Context table

    • context_key: LearningContextKeyField

  • Usage table

    • usage_key: UsageKeyField

    • context: FK(Context)

    • block_type: str

    • block_slug: str
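
To make the two tables above concrete, here is a sketch using the stdlib sqlite3 module instead of Django models. The UNIQUE constraints, the integer primary keys, and the example key strings are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE context (
        id INTEGER PRIMARY KEY,
        context_key TEXT NOT NULL UNIQUE
    );
    CREATE TABLE usage (
        id INTEGER PRIMARY KEY,
        usage_key TEXT NOT NULL UNIQUE,
        context_id INTEGER NOT NULL REFERENCES context(id),
        block_type TEXT NOT NULL,
        block_slug TEXT NOT NULL
    );
    """
)

insert_usage = (
    "INSERT INTO usage (usage_key, context_id, block_type, block_slug) "
    "VALUES (?, ?, ?, ?)"
)

# Illustrative key strings only.
context_id = conn.execute(
    "INSERT INTO context (context_key) VALUES (?)", ("lib:Org:Lib1",)
).lastrowid
conn.execute(insert_usage, ("lb:Org:Lib1:problem:p1", context_id, "problem", "p1"))

rows = conn.execute(
    "SELECT c.context_key, u.block_type, u.block_slug "
    "FROM usage u JOIN context c ON c.id = u.context_id"
).fetchall()

# A usage row pointing at a nonexistent context is rejected by the FK.
try:
    conn.execute(insert_usage, ("lb:Org:Lib1:html:h1", 999, "html", "h1"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Storing block_type and block_slug alongside the full usage_key is redundant by design here, since it allows filtering by type without parsing keys.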

Jan 20, 2026

  • Adding CourseRun to LC, in support of Learning Pathways

    • Why?

      • If we want to build pathways in the LC repo, those pathways' steps will need to FK to CourseRuns as part of their completion criteria

      • Caveats!

        • do they really mean CourseRun and not Course?

          • for ASU at least, they’ll have 1-1 mappings between Run<->Course, so it doesn’t matter to them

        • it won’t always be tied to CourseRun.

        • e.g. if you’ve already taken course X on instance A then we’ll give you credit for that on instance B

      • we shouldn’t be so prescriptive about what completion requirements are – still good to have FKs in some way though

    • wouldn’t put CourseRun in authoring - gets its own top-level thing?

    • Course vs CourseRun has been a half-baked feature

    • Simplest thing

      • New catalog app

      • One model: CourseRun

        • push to it from CourseOverview

          • as in, whenever there’s an update to CO, push the information into CourseRun

      • Course model? probably not necessary, because today, there’s no data in it other than FKs, which we don’t think we need. If we do need that in the future, we could backfill one pretty trivially

      • proliferation of courserun tables: CourseOverview, SplitModulestoreIndex, now CourseRun

        • but CourseRun is the really real one now

      • Capitalization

        • case sensitive in LC!

      • FK to organizations_organization table?

  • openedx-learning and the catalog app

    • openedx-core?

    • Would the same people maintain the authoring_api and this new catalog_api?

    • braden: we should have some core catalog models for things like organizations, courseruns, maybe courses. but keep them simple, draw a fence around them, anything more bespoke should be done outside of the core

    • if openedx-learning becomes openedx-core, then we would want

      • openedx_core.api.authoring -> openedx_core.api.catalog

    • openedx_core.api.modular_learning as a peer to authoring

      • pathways

      • administrative bits

    • we want to innovate on T&L, but not on the catalog.

    • history: openedx-learning was the name back when we thought that this would be pushed to from Studio, rather than the authoring source models

  • Thinking about dev future

    • importing from openedx_platform., openedx_events, openedx_filters, xblock

    • openedx_api.

  • openedx-core

    • openedx_tagging

    • openedx_catalog

    • openedx_authoring

      • openedx_content?

        • related questions: UserPartitions, do they go in here, or in a new thing?

    • openedx_keys ← aspirational goal!

  • (Any other Learning Pathway topics?)

  • Challenges with type-annotating XBlock

    • Seems like usage_ids are always UsageKeys except in one place: MemoryIdManager uses strings. But MemoryIdManager is only used in XBlock tests (already fixed on a branch) and in the LearningCore-based runtime with the comment # We don't really use id_generator until we need to support asides.

      • Can we delete MemoryIdManager, so that usage_ids are always instances of UsageKey?

    • The type of def_id is all over the place in edx-platform. Best I came up with is this: DefinitionId: t.TypeAlias = DefinitionKey | UsageKey | ObjectId | LocalId | str.

      • Thoughts?

      • Note: my understanding is that in a fully LC-world, def_ids are redundant with usage_ids.

Jan 13, 2026

  • “ContainerType” (https://github.com/openedx/openedx-learning/issues/412)

    • edx-platform has developed the idea of each container having a single “ContainerType” – currently Unit, Subsection, or Section. https://github.com/openedx/edx-platform/blob/e6deac0cf12226c0b8d744ad17395373cfe0de03/openedx/core/djangoapps/content_libraries/api/container_metadata.py#L42

    • Do we want to actually support a Container having multiple “types”? E.g. can a Unit also be a ____ ?

      • If so, how should edx-platform change?

      • If not, can we codify the single-type restriction somehow?

    • Can we pull the idea of ContainerTypes into learning core?

    • Assumptions we could make:

      • Weaker: Every container has one type

        • In favor

      • Stronger: Every container is: Unit, Subsection, Section, (OutlineRoot)

        • Not in favor

    • Data model options

      • Put an actual field on the model for container_type?

        • we have this for components already. we also have the ComponentType model.

      • Somewhere we need to store the mapping of classes to OLX tags

      • How to register?

        • for components, it’s done thru xblock, and there’s a deterministic mapping between OLX tags and component_type names (blah ↔︎ xblock.v1:blah)

    • Case study: “Assessment”

      • What data does the Assessment have?

        • Assessment

          • (student scores)

        • AssessmentVersion

          • proctoring / timing info

          • a PubEnt that is the assessment’s content

            • Or a Container?

      • 3 options:

        • Assessment → Container

        • Container → Assessment

        • ?

      • this got interesting – see recording

    • Implementation

Jan 6, 2026

  • Proposal: https://github.com/openedx/openedx-learning/pull/454

    • We already require release-to-release migration

    • Investigate squashed migrations for going across app and bootstrapping authoring.

    • subapps instead of applets?

  • Taxonomies as PublishableEntities

    • Braden: Could we feasibly store it as a blob and not do the full versioning, or keep just the current draft/published versions to allow foreign keys to content?

  • Sam wants to implement a feature for bulk publishing--does our data model support this?

    • Dave thinks this is doable. Will look into making that query.

Dec 16, 2025

  • A Shared Architecture for Modular Learning

  • two layers of pluggability / two axes

    • granularity of content (course run as a node, unit as a node)

    • flexibility of completion criteria (all the things, some subset of things)

      • mary from unicon was talking about different kinds of criteria

      • v1: a fairly powerful datamodel with ands and ors on nodes

      • unicon proposal focuses on tags

        • tags map to competencies

        • can aggregate tags together

        • “you completed 3/5 things that represent some concept, which then rolls up into some greater concept”

          • we could bake everything on that notion

          • have competencies as the lowest common denominator type thing

    • it would be good to keep these axes independent of one another

  • we have to support CBEs, but it should also be agnostic

    • you should be able to say “complete these 6 courses” without involving competencies

    • “X is in the pathway”, without encoding how X is part of the pathway’s completion

  • braden: what is CBE in this context?

    • dave: evaluate what you know rather than what you do

    • being able to assess what you understand
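
The "ands and ors on nodes" idea and the "3/5 things" example can be sketched as a small criteria tree. This is an illustration under assumptions, not the v1 datamodel: Step, AtLeast, all_of, and any_of are hypothetical names, and node IDs are plain strings so the criteria stay independent of content granularity (a node could be a course run or a unit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Leaf criterion: a single node (course run, unit, ...) is complete."""
    node_id: str

    def is_met(self, completed: set[str]) -> bool:
        return self.node_id in completed


@dataclass(frozen=True)
class AtLeast:
    """Met when at least `required` of the child criteria are met."""
    required: int
    children: tuple

    def is_met(self, completed: set[str]) -> bool:
        return sum(child.is_met(completed) for child in self.children) >= self.required


def all_of(*children) -> AtLeast:
    """AND: every child must be met."""
    return AtLeast(len(children), children)


def any_of(*children) -> AtLeast:
    """OR: one met child is enough."""
    return AtLeast(1, children)


# "Complete course A, and 3 of these 5 units."
pathway = all_of(
    Step("course-A"),
    AtLeast(3, tuple(Step(f"unit-{i}") for i in range(1, 6))),
)

done = pathway.is_met({"course-A", "unit-1", "unit-2", "unit-4"})
not_done = pathway.is_met({"course-A", "unit-1", "unit-2"})
```

Expressing AND and OR as special cases of "at least N" keeps the tree to two node kinds, and nothing in it mentions competencies, so a pathway such as "complete these 6 courses" needs no competency model at all.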

 

 

Nov 25, 2025

  • Kyle: Probably able to get FC money to push this along, but haven’t gotten that yet.

  • Dave: Is it worth considering alternatives to Pydantic for schema validation?

    • No.

  • Dave: Thought was that validation mechanism would be converting to JSON.

  • Multiple layers of validation--e.g. containers and components vs. more specific things that need plugins.

  • Plugin Examples:

    • Annotating items with notes/discussion. Could be limited to Studio (author notes).

      • PublishableEntity → messages

      • Frontend: Tab entry

    • Workflow

      • (see how far we can go with just tags)

    • Does this replace Asides?

    • AI Content Generation

    • Individual XBlocks

      • VideoBlock

      • ProblemBlock

  • Is it okay if we have field data that is redundant with OLX?

    • Braden: Okay, as long as source of truth is clear.

  • @Dave Ormsbee (Axim) to look at how to discourage direct model writes.

  • Braden: Let’s make sure there’s a solid justification for making this, i.e. we could just make a JSON blob of metadata for plugins to play with.

  • High risk areas / things that need a lot of deliberation:

  • Should we have OLX parsing for containers in LC?