Migrating Courses to Learning Core

Early notes and thoughts by @Kyle McCormick @Dave Ormsbee (Axim)

2024-11-20

  • Sumac

    • Learning Core is getting rolled out to prod for the first time

    • LC Components, Collections, Assets for Libraries v2

  • Teak

    • LC Units, Sections, and Subsections for Libraries v2

    • Migration path for Libraries v1->v2

  • Ulmo

    • Remove Libraries v1

    • Standalone assets (files and uploads) for Libraries v2

    • Standalone assets (files and uploads) for Courses

      • LearningPackage for each course

      • LearningCoreContentstore

      • Slow, major migration (~1TB for 2U?)

      • Rollout questions… all-at-once, or mixed mode?

      • Benefits of asset conversion would be:

        • Cheaper storage

        • Faster/easier querying (particularly if we hook it up to a search backend).

        • Caveat: may cause higher latency if we maintain the same URLs

  • Verawood

    • LC Components for Libraries

      • Components with children?

        • Some in edx-platform

          • a/b test, randomize, etc

          • Each of these is a Selector

            • Selected content is a Variant (some list of PubEnts)

            • selector.get_or_create_variant_for_user(args) -> Variant

            • Is this a generalization of learning_sequences?

            • Kinda but not really. LSeqs is built as a pipeline of processors, each of which can hide or remove content. The intersection of the resulting sets is the user's outline.

        • Some outside? Do we support? Or deprecate customization of get_children ?

          • Do not break callers of get_children

          • Do deprecate the ability to customize it, though

          • End result is that get_children is the responsibility of the runtime / edx-platform:

          • class XBlock:
                def get_children(...):
                    yield from self.runtime.get_children(self.usage_key)

      • Ideas

        • Just migrate leaf components

          • get_item(leaf_block) → LC
            get_item(parent_block) → splitmongo

        • Implement Unit and below at LC level

        • Thing to watch out for: how to juggle the two different runtimes used for field data persistence.

  • Misc

    • Eventually

      • Leaf blocks can define views with arbitrary python

      • Parent blocks (containers?) are declarative, an external system looks at the rules they declare to determine the course tree
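
To make the Selector / Variant idea from the Verawood notes concrete, here is a minimal sketch. Only the names Selector, Variant, and get_or_create_variant_for_user come from the notes; the RandomizeSelector class, its fields, the in-memory dict standing in for persistence, and the hash-based choice are all assumptions made for illustration, not a proposed design.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """The content selected for one user: an ordered list of PubEnt keys."""
    entity_keys: tuple[str, ...]


@dataclass
class RandomizeSelector:
    """Hypothetical Selector that picks `count` of its children per user."""
    children: list[str]
    count: int
    # Assumption: stand-in for a database table mapping user_id -> Variant.
    _variants: dict[int, Variant] = field(default_factory=dict)

    def get_or_create_variant_for_user(self, user_id: int) -> Variant:
        # Reuse the stored Variant so a user keeps seeing the same content.
        if user_id in self._variants:
            return self._variants[user_id]

        # Otherwise rank children by a per-user hash and keep the first `count`.
        def rank(key: str) -> str:
            return hashlib.sha256(f"{user_id}:{key}".encode()).hexdigest()

        chosen = sorted(self.children, key=rank)[: self.count]
        variant = Variant(entity_keys=tuple(chosen))
        self._variants[user_id] = variant
        return variant
```

An a/b test Selector would presumably share the same method and differ only in how it chooses; callers would only ever see the returned Variant.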

2024-11-13

Background assumption:

  • Remaining shims?

    • Some layer of basic shimming

      • LearningCoreModuleStore - thinnish shim layer

        • 80/20

  • At least one overlap release

    • Progressive (course-at-a-time) cutover

  • Long term: No Mongo

    • Remove Mongo first? (via SplitDjangoModuleStore)

      • Issues: Latency of S3 is much less predictable than Mongo

      • Issues: Length of data migration

        • Because v1 content libraries query structure documents at different versions, we’d need to move a ton of structure docs over

          • But, v1 content libraries will be gone by Ulmo

      • Issue: We are still reading CourseBlocks (the root ones) from Old Mongo

    • Current state

      • Active Versions is read from and written to both Mongo and MySQL

        • MySQL is backfilled

        • Pruning would need to be ported over

          • Latency is a worry here. We could do more caching, but it would increase the memory footprint.

      • Structure docs are in Mongo

      • Definition docs are in Mongo

    • Or remove Mongo along with ModuleStore removal?

  • Standalone items:

    • Files and uploads

      • We can emulate these API promises fairly easily.

    • Vertical or horizontal migration?

      • Vertical: Top-down, one entire course at a time

      • Horizontal: One layer at a time across all courses: all components first, then all units, etc. up the tree

        • Example: Components become backed by Learning Core. modulestore still exists, but get_item delegates to LC when it’s a component.

  • What can be broken? Talk to Jenna. For example:

    • inheriting defaults

    • FBE

    • No breakage > Intentional breakage with DEPR > unintentional breakage

  • Multiple course runs in the same learning package?

    • This would be ideal.

  • Prototype components in Learning Core

  • Transactions.

    • Mongo commits everything immediately

    • MySQL commits it at the end of the request

  • CourseOverviews, Block Transformers, Learning Sequences

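
The horizontal-migration example above (modulestore still exists, but get_item delegates to LC when it's a component) could be sketched roughly as follows. Everything here is hypothetical: the HorizontalModuleStore and DictBackend classes, the simplified UsageKey, and the set of leaf block types are illustrative stand-ins, not existing edx-platform or Learning Core APIs.

```python
from dataclasses import dataclass

# Assumption: block types treated as leaf components for this sketch.
LEAF_BLOCK_TYPES = {"html", "problem", "video"}


@dataclass(frozen=True)
class UsageKey:
    """Simplified stand-in for a real usage key."""
    block_type: str
    block_id: str


class DictBackend:
    """Toy storage backend; stands in for Learning Core or split Mongo."""

    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks

    def get_item(self, usage_key):
        # Return the backend name alongside the block to show the routing.
        return (self.name, self.blocks[usage_key])


class HorizontalModuleStore:
    """Routes leaf components to Learning Core, everything else to split Mongo."""

    def __init__(self, learning_core, split_mongo):
        self.learning_core = learning_core
        self.split_mongo = split_mongo

    def get_item(self, usage_key):
        if usage_key.block_type in LEAF_BLOCK_TYPES:
            return self.learning_core.get_item(usage_key)
        return self.split_mongo.get_item(usage_key)
```

Callers keep using get_item unchanged; only the routing rule would need to grow as units and higher containers move over.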