Migrating Courses to Learning Core
Early notes and thoughts by @Kyle McCormick @Dave Ormsbee (Axim)
2024-11-20
Sumac
Learning Core is getting rolled out to prod for the first time
LC Components, Collections, Assets for Libraries v2
Teak
LC Units, Sections, and Subsections for Libraries v2
Migration path for Libraries v1->v2
Ulmo
Remove Libraries v1
Standalone assets (files and uploads) for Libraries v2
Standalone assets (files and uploads) for Courses
LearningPackage for each course
LearningCoreContentstore
Slow, major migration (~1TB for 2U?)
Rollout questions… all-at-once, or mixed mode?
Benefits of asset conversion would be:
Cheaper storage
Faster/easier querying (particularly if we hook it up to a search backend).
May cause higher latency if we maintain the same URLs
Verawood
LC Components for Libraries
Components with children?
Some in edx-platform
a/b test, randomize, etc
Each of these is a Selector
Selected content is a Variant (some list of PubEnts)
selector.get_or_create_variant_for_user(args) -> Variant
Is this a generalization of learning_sequences?
Kinda but not really. LSeqs is built as a pipeline of processors, each of which can hide or remove content. The intersection of the resulting sets is the user's outline.
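A minimal sketch of the Selector / Variant idea above, for discussion only. Only the method name get_or_create_variant_for_user and the terms Selector and Variant come from these notes; the RandomizeSelector class, its fields, the user_id argument, and the in-memory storage are all hypothetical stand-ins.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """The selected content: an ordered list of publishable entity keys."""
    entity_keys: tuple[str, ...]


@dataclass
class RandomizeSelector:
    """Hypothetical Selector that picks `count` candidates per user."""
    selector_key: str
    candidate_keys: list[str]
    count: int
    # Assumption: variants live in memory here; a real one would be persisted.
    _variants: dict[int, Variant] = field(default_factory=dict)

    def get_or_create_variant_for_user(self, user_id: int) -> Variant:
        if user_id not in self._variants:
            # Rank candidates by a per-user hash so the choice is stable
            # and differs between users.
            def rank(key: str) -> str:
                seed = f"{self.selector_key}:{user_id}:{key}"
                return hashlib.sha256(seed.encode()).hexdigest()

            chosen = sorted(self.candidate_keys, key=rank)[: self.count]
            self._variants[user_id] = Variant(tuple(chosen))
        return self._variants[user_id]
```

Calling the method twice for the same user returns the same Variant, which is the "get or create" behavior the signature implies.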
Some outside? Do we support? Or deprecate customization of get_children?
Do not break callers of get_children
Do deprecate the ability to customize it, though
End result is that get_children is the responsibility of the runtime / edx-platform:
class XBlock:
    def get_children(...):
        yield from self.runtime.get_children(self.usage_key)
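A runnable version of that delegation, as a sketch only. SketchRuntime and SketchBlock are hypothetical stand-ins (not the real XBlock or runtime classes), and the dict-backed tree is an assumption made to keep the example self-contained.

```python
class SketchRuntime:
    """Hypothetical runtime that owns the parent -> children mapping."""

    def __init__(self, children_by_usage_key):
        self._children = children_by_usage_key

    def get_children(self, usage_key):
        # The runtime, not the block, decides what the children are.
        yield from self._children.get(usage_key, [])


class SketchBlock:
    """Hypothetical block whose get_children cannot be customized."""

    def __init__(self, runtime, usage_key):
        self.runtime = runtime
        self.usage_key = usage_key

    def get_children(self):
        # Existing callers keep working; the answer comes from the runtime.
        yield from self.runtime.get_children(self.usage_key)
```

Callers of get_children see no change, while block authors lose the ability to override how children are computed.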
Ideas
Just migrate leaf components
get_item(leaf_block) → LC
get_item(parent_block) → splitmongo
Implement Unit and below at LC level
Thing to watch out for: how to juggle the two different runtimes used for field data persistence.
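The "just migrate leaf components" idea could look roughly like the following dispatch. This is a sketch under assumptions: DispatchingStore, FakeBackend, the LEAF_BLOCK_TYPES set, and the (block_type, block_id) arguments are all hypothetical and do not mirror the real modulestore API.

```python
# Assumption: an illustrative set of leaf block types, not an official list.
LEAF_BLOCK_TYPES = {"problem", "html", "video"}


class FakeBackend:
    """Hypothetical backend that records which store served the request."""

    def __init__(self, name):
        self.name = name

    def get_item(self, block_type, block_id):
        return (self.name, block_type, block_id)


class DispatchingStore:
    """Routes leaf blocks to Learning Core and parents to split mongo."""

    def __init__(self, learning_core, split_mongo):
        self.learning_core = learning_core
        self.split_mongo = split_mongo

    def get_item(self, block_type, block_id):
        if block_type in LEAF_BLOCK_TYPES:
            backend = self.learning_core
        else:
            backend = self.split_mongo
        return backend.get_item(block_type, block_id)
```

The two backends would still each bring their own field data persistence, which is the juggling problem noted above.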
Misc
Eventually
Leaf blocks can define views with arbitrary python
Parent blocks (containers?) are declarative, an external system looks at the rules they declare to determine the course tree
2024-11-13
Libraries prototype: https://github.com/openedx/edx-platform/pull/35758
Background assumption:
Remaining shims?
Some layer of basic shimming
LearningCoreModuleStore - thinnish shim layer
80/20
At least one overlap release
Progressive (course-at-a-time) cutover
Long term: No Mongo
Remove Mongo first? (via SplitDjangoModuleStore)
Issue: S3 latency is much less predictable than Mongo's
Issue: Length of data migration
Because V1 content libraries query the structure document at different versions, we’d need to move a ton of structure docs over
But, v1 content libraries will be gone by Ulmo
Issue: We are still reading CourseBlocks (the root ones) from Old Mongo
Current state
Active Versions is read from and written to both Mongo and MySQL
MySQL is backfilled
Pruning would need to be ported over
Latency is a worry here. We could do more caching, but it would increase the memory footprint.
Structure docs are in Mongo
Definition docs are in Mongo
Or remove Mongo along with ModuleStore removal?
Standalone items:
Files and uploads
We can emulate these API promises fairly easily.
Vertical or horizontal migration?
Vertical: Top-down, one entire course at a time
Horizontal: All components at once, then all units at once, etc., up the tree
Example: Components become backed by Learning Core. modulestore still exists, but get_item delegates to LC when it’s a component.
What can be broken? Talk to Jenna. For example:
inheriting defaults
FBE
No breakage > Intentional breakage with DEPR > unintentional breakage
Multiple course runs in the same learning package?
This would be ideal.
Prototype components in Learning Core
Transactions.
Mongo commits everything immediately
MySQL commits it at the end of the request
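A toy model of why that difference matters when one request writes to both stores. Nothing here talks to a real database: ImmediateStore, RequestScopedStore, handle_request, and the keys written are hypothetical, and the failure is simulated.

```python
class ImmediateStore:
    """Models a store where every write is persisted right away."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class RequestScopedStore:
    """Models a store whose writes persist only at the end of the request."""

    def __init__(self):
        self.data = {}
        self._pending = {}

    def write(self, key, value):
        self._pending[key] = value

    def commit(self):
        self.data.update(self._pending)
        self._pending.clear()

    def rollback(self):
        self._pending.clear()


def handle_request(immediate, scoped, fail):
    """Write to both stores; on failure only the scoped store rolls back."""
    try:
        immediate.write("structure", "v2")
        scoped.write("active_version", "v2")
        if fail:
            raise RuntimeError("simulated mid-request failure")
        scoped.commit()
    except RuntimeError:
        scoped.rollback()
```

After a failed request the immediate store keeps its write while the request-scoped store has none, so the two can disagree.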
CourseOverviews, Block Transformers, Learning Sequences