Open edX Core - Arch Sync Notes

Team: @Kyle McCormick @Dave Ormsbee (Axim) @Braden MacDonald

Epic: https://github.com/openedx/openedx-learning/issues/353

May 5, 2026

  • Next steps between now and verawood.1

  • [Dave] Generally we should be taking a look at performance

  • What’s the release date?

    • Jun 23, 2026

    • Performance and docs stuff will mostly land post-conference

  • [Dave] Do we want to cut a 1.x branch? I’m concerned about the unintended breakage that can ripple out to platform, particularly if we do anything that might require data migration. Even going from 0.47 to 0.48 to 1.0 caused unexpected problems.

    • [Kyle] I think this is a great idea

    • [Kyle] Created verawood-backports, to be versioned 1.0.2, 1.0.3, etc.

      • main will start with 1.1.0 / 2.0.0 and continue from there

  • [Dave] Can we use some of Braden’s remaining hours to explore Courses-on-Core work?

    • [Kyle] Yes 100%

    • Braden will use remaining time on this

  • Kyle/Dave will talk about resourcing for Courses-In-Core

  • [Dave] I’ve been doing some docs work with Claude in the background

    • So nice

    • rst formatting in docstrings

    • Currently in draft, big

    • Docstrings need updates - will do separate

    • Will do a similar one on the API side later

  • timeline

    • Kyle mostly focused on conference for next two weeks

    • Braden away during conference and until June 9th ish

    • Kyle away after conference until June 9th ish, then will be focused on docs and performance

Apr 28, 2026

Apr 21, 2026

Apr 14, 2026

  • ~1.5 weeks out from Verawood cutoff

  • @Dave Ormsbee (Axim) Proposed high level structure and abstractions for validation/restore flow:

    • (Zip Archive | Local Path | GitHub | etc.) → Storage Filesystem (fsspec)

    • Extract TOML from fs and compile into UnvalidatedCompletePackageInput

      • Rationale: Allow a way to have input formats that are better tailored to specific use cases, e.g. editing entire sections in the same file.

      • Contents

        • Unvalidated Learning Package dict (simple Python types)

          • Includes all TOML content (entities, versions, containers, collections), but does not include media files like block.xml or static assets.

          • This is meant to be a minimal transformation of the input TOML.

          • Structured to try to prevent certain types of logical errors, e.g. refs are used as keys in a dict, so there’s no way to represent duplicate definition of entities.

        • Errors

          • Mostly consistency errors, e.g. redefining the same entity multiple times.

          • Not the actual JSON Schema validation, but just errors when converting the source TOML into the compiled dict.

        • Resources (where to find media data for later)

    • UnvalidatedCompletePackageInput → CompletePackageInput + errors

      • This is done using Pydantic models.

      • Input and output models will be kept separate (output models are much stricter about requiring certain fields).

      • JSON Schema is generated from the input model.

      • Two levels of errors:

        • Ones that JSON Schema can handle, e.g. missing fields, regex not matching, wrong types, etc.

        • Deeper ones that JSON Schema can’t deal with, like referential integrity (pointers to versions that don’t exist, containers referencing children that don’t exist, etc.)

        • Missing resources

      • Strict mode?

    • CompletePackageInput → LearningPackage

    • Jesper: Having documentation of the end-to-end restore pipeline will be good

      • @Jesper Hodge Did I get that right ^ ?

    • Braden: We have many things: tar.gzs, xml, zips, json, we want to have a sqlite format…

      • Data researchers would probably want the ability to import Open edX archives with a specific library. Do we want this as a separate openedx_data library? Probably not necessary…

    • Kyle: could this be modified to be update instead of create?

      • Dave: trickiest thing is version numbering

        • imagine that the archive specified v2 of a component, but you also have a v2

        • Jesper: when importing a version, it should always become the highest (newest) version

      • Kyle: concerned about the idea of having separate formats for full restore vs. partial import

    • Braden: stagedcontent

      • should this new format be used to represent stagedcontent

      • would be great if we could represent everything in a library as a file, just like we can with OLX today, enabling things like copy-paste and drag-and-drop UIs

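
The compile and validate steps proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the notes call for Pydantic models and TOML input, while this sketch swaps in plain dataclasses and already-parsed dicts to stay dependency-free. The names UnvalidatedPackage, compile_documents, and validate, and the "title"/"children" fields, are hypothetical and not taken from openedx-core.

```python
from dataclasses import dataclass, field


@dataclass
class UnvalidatedPackage:
    """Hypothetical stand-in for UnvalidatedCompletePackageInput."""
    entities: dict = field(default_factory=dict)  # ref -> raw entity dict
    errors: list = field(default_factory=list)    # compile-time consistency errors


def compile_documents(documents):
    """Merge parsed input documents into one dict keyed by entity ref."""
    package = UnvalidatedPackage()
    for doc in documents:
        for ref, entity in doc.get("entities", {}).items():
            if ref in package.entities:
                # Refs are dict keys, so a duplicate can only be reported, never stored.
                package.errors.append(f"duplicate definition of entity {ref!r}")
            else:
                package.entities[ref] = entity
    return package


def validate(package):
    """Return compile errors plus shape errors plus referential-integrity errors."""
    errors = list(package.errors)
    # Level 1: checks of the kind JSON Schema can express (required fields, types).
    for ref, entity in package.entities.items():
        if not isinstance(entity.get("title"), str):
            errors.append(f"{ref}: 'title' is missing or not a string")
        if not isinstance(entity.get("children", []), list):
            errors.append(f"{ref}: 'children' is not a list")
    # Level 2: referential integrity, which needs the whole package at once.
    for ref, entity in package.entities.items():
        children = entity.get("children", [])
        if isinstance(children, list):
            for child in children:
                if child not in package.entities:
                    errors.append(f"{ref}: child {child!r} is not defined")
    return errors


# Each dict stands in for one parsed input file (assumed layout).
documents = [
    {"entities": {
        "unit1": {"title": "Unit 1", "children": ["text1", "video9"]},
        "text1": {"title": "Intro"},
    }},
    {"entities": {"text1": {"title": "Intro (again)"}}},
]
package = compile_documents(documents)
errors = validate(package)
```

Here the second definition of text1 is reported at compile time, and the dangling child video9 is only caught by the second validation level.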

Apr 7, 2026

Mar 31, 2026

  • ~3.5 weeks out from Verawood cutoff

  • @Dave Ormsbee (Axim): Does enrollment get its own top level app in openedx-core? (In the context of LearningPathway enrollments.)

  • @Kyle McCormick Sample plugin ideas:

    • Feanil and I are giving a conference workshop on how to build multiple kinds of plugins into one unified omni-plugin (openedx/sample-plugin on GitHub). Would like to get some openedx-core “plugin” representation in there. We’re thinking:

      • Course card archival

      • “Reviewed by ___”

        • model: ReviewedStatus(TimeStamped)

          • PE

          • DraftChangeLogEntry

          • User

        • REST API for marking as reviewed

        • (new?) Sidebar slot

        • new Filter: EntityPrePublish (or model pre-save signal)

          • be careful with PublishLog

          • should it remove things from the publish list, or cancel the whole publish?

            • just abort it - removing things would be full of footguns

            • removing things would have to happen at an earlier layer in order to be safer

        • ambitious: PublishReviewedItems

          • get_entities_with_unpublished_drafts

            • no dependencies - just actually reviewed things
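
The “Reviewed by ___” idea above, with a pre-publish check that aborts the whole publish rather than removing items, might look roughly like this. It is a framework-free sketch: ReviewedStatus is modelled as a dataclass rather than a Django model, and PublishAborted, entity_pre_publish, publish, and all field names are hypothetical, not the openedx-filters or openedx-core API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class PublishAborted(Exception):
    """Raised by a pre-publish check to cancel the whole publish."""


@dataclass(frozen=True)
class ReviewedStatus:
    # Hypothetical fields mirroring the notes: entity, draft change, user, timestamp.
    entity_key: str
    draft_change_id: int
    reviewed_by: str
    created: datetime


def entity_pre_publish(draft_changes, reviews):
    """Abort unless every (entity_key, draft_change_id) pair has a review."""
    reviewed = {(r.entity_key, r.draft_change_id) for r in reviews}
    missing = [key for key, change_id in draft_changes
               if (key, change_id) not in reviewed]
    if missing:
        # Cancel everything instead of silently dropping the unreviewed items.
        raise PublishAborted(f"unreviewed drafts: {', '.join(sorted(missing))}")


def publish(draft_changes, reviews):
    """Run the check, then publish all entities or none of them."""
    entity_pre_publish(draft_changes, reviews)
    return [key for key, _ in draft_changes]


reviews = [ReviewedStatus("text1", 7, "reviewer", datetime.now(timezone.utc))]
published = publish([("text1", 7)], reviews)
```

Tying the review to a specific draft change means a later edit to text1 would need a fresh review before it could be published.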

Mar 24, 2026

  • ~4.5 weeks out from Verawood cutoff

Mar 17, 2026

  • @Kyle McCormick Key Coherency for openedx-core v1.0

  • @Kyle McCormick I’d like to revisit Braden’s Version branching proposal one more time, and decide if there’s anything we’d like to do before minting v1.0.

    • Alternate proposal from @Braden MacDonald :

      • What if we try for a more git-like model where our *Version models (e.g. ComponentVersion, PublishableEntityVersion) no longer point back to their unversioned model (Component, PublishableEntity), but instead just point to their previous *Version. Then, we can have multiple instances of a Component (with different keys) pointing at the same data (same *Version). They would have the same history, but if you modify one it would “fork” the versions, and diverge from the other.

        • Example:

          • Component Text1 exists, pointing to ComponentVersion “anonv1”. You edit it, creating ComponentVersion “anonv2”, previous version “anonv1”. Component Text1’s Draft pointer points to “anonv2” and published points to nothing.

          • Now you duplicate it (or re-run the course or make a CCX variant, or whatever). Component Text2 now exists (different key, same learning package) and its Draft pointer also points to “anonv2”. Now both Text1 and Text2 share the same data and same version history, but they will diverge if you edit them.

      • When we create a new course run from an existing course run, we would have to duplicate all the Components and Containers, but leave the *Version models alone. This makes it a much more efficient operation, because most of the data is in the *Version models, and it also means the full history is preserved.

      • As for keys, let’s say our CourseRun model has a “key_prefix” (or branch). Then if we have a library, course A run 2023, and course A run 2024 all in the same learning package, the library’s keys can be unprefixed, and the two course runs can have prefixes like “A2023:” and “A2024:” so that there are no collisions among the library and course runs - each has its own namespace without affecting the data model. (Of course you can just as easily make a namespace string field / db column on PublishableEntity if you want this to be more formal.) Example: opaque key “course-v2:org+A+2023:block:problem:p1” maps to PublishableEntity key “A2023:problemp1”, and the exact same component in the other run with opaque key “course-v2:org+A+2024:block:problem:p1” maps to “A2024:problemp1”.

      • Upsides of this approach: very fast, has nice copy-on-write semantics, separate namespaces per run, much simpler/cleaner history.

      • Downsides: pretty significant change to the data model; ComponentVersions would not be deleted when the Component is deleted (an occasional cleanup process is required if you want to delete them); version numbers would increment in jumps, based on the highest version used in the learning package rather than the highest version used for that specific component.
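
The copy-on-write behaviour in the alternate proposal above can be illustrated with a small in-memory sketch. It is not the openedx-core data model: Version and Component here are hypothetical dataclasses, and the keys “A2023:text1” and “A2024:text1” are made-up examples of the prefix idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    # A version knows only its predecessor, not which component owns it.
    id: str
    data: str
    previous: Optional["Version"] = None


@dataclass
class Component:
    key: str
    draft: Optional[Version] = None

    def edit(self, version_id, data):
        """Create a new version on top of the current draft and move the pointer."""
        self.draft = Version(version_id, data, previous=self.draft)

    def duplicate(self, new_key):
        """Copy only the pointer; the version chain is shared until the next edit."""
        return Component(new_key, draft=self.draft)

    def history(self):
        """Version ids from newest to oldest."""
        ids, version = [], self.draft
        while version is not None:
            ids.append(version.id)
            version = version.previous
        return ids


text1 = Component("A2023:text1")
text1.edit("anonv1", "Hello")
text1.edit("anonv2", "Hello, world")

# Duplicating (e.g. for a new course run) shares the existing versions.
text2 = text1.duplicate("A2024:text1")
shared_before_edit = text2.draft is text1.draft

# Editing the copy forks its history; the original is untouched.
text2.edit("anonv3", "Hello, 2024")
```

After the last edit, text1 still ends at “anonv2” while text2 has “anonv3” on top of the same shared history.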

Mar 11, 2026

  • @Braden MacDonald - containers

    • Most (all?) settings we have today for containers are really course policies.

    • Decision: keep container APIs generic (so you can call either create_container(type=Unit) or create_unit() and it works correctly), keep the empty Unit/UnitVersion models for foreign keys only, assume no content-related settings for now, and include nice “wrapper” APIs for dealing with Units etc. that just call the generic ones.

  • @Kyle McCormick - opaque keys

  • @Dave Ormsbee (Axim) - Javier’s media assets proposal

    • WGU has a digital asset management system

    • Assets as PEs that are dependencies of Components that they’re used in.

    • What to do with existing ComponentVersion media?
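
For the container API decision recorded above (generic calls plus thin per-type wrappers), a minimal sketch of the shape could be the following. The signatures are assumptions for illustration only and do not match the real openedx-core functions; ContainerType and Container are hypothetical stand-ins for the actual models.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContainerType(Enum):
    # Hypothetical enum; the notes pass the Unit model itself as the type.
    UNIT = "unit"
    SUBSECTION = "subsection"
    SECTION = "section"


@dataclass
class Container:
    key: str
    type: ContainerType
    title: str
    children: list = field(default_factory=list)


def create_container(key, type, title):
    """Generic entry point: works for any container type."""
    return Container(key=key, type=type, title=title)


def create_unit(key, title):
    """Thin wrapper with no unit-specific logic; it only fixes the type."""
    return create_container(key, ContainerType.UNIT, title)


via_wrapper = create_unit("unit1", "Unit 1")
via_generic = create_container("unit1", ContainerType.UNIT, "Unit 1")
```

Because the wrapper adds nothing beyond the type, both calls produce equal containers, which is the property the decision asks for.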

Mar 6, 2026

  • ~5 weeks out from Verawood cutoff

Feb 10, 2026

  • Context table

Kyle McCormick
July 22, 2025

@Dave Ormsbee (Axim) maybe something to talk through today?

Dave Ormsbee (Axim)
July 22, 2025

Sure thing.

Dave Ormsbee (Axim)
July 22, 2025

I added a few other things as possible agenda items. I do want to do a brief retro to make sure we capture notes to ourselves before we forget any more of the prototyping, but I don’t think that’ll take up our full time.