Split Modulestore (Draft Versioning Modulestore)

Split Mongo supports an arbitrary number of branches, though only "draft" and "published" are currently used. Course version history is preserved as edits are made, allowing rollback to a previous course version. Course locators for Split courses are the new style supported by opaque_keys. For example, here's a serialized Split course locator:

course-v1:SQU+SQU101+2014_T1+branch@published+version@8c056ceea2f35a1d705bd4c13d79c15b495a0f53
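
To make the parts of the serialized locator explicit, here is a minimal sketch that splits it into its fields. The function name parseSplitCourseLocator is hypothetical, and this is not how opaque_keys parses keys. It only assumes the org+course+run layout with optional branch@ and version@ parts, as seen in the example above.

```javascript
// Illustrative only: real key parsing is done by the opaque_keys library.
function parseSplitCourseLocator(serialized) {
    var prefix = "course-v1:";
    if (serialized.indexOf(prefix) !== 0) {
        throw new Error("Not a course-v1 locator: " + serialized);
    }
    var parts = serialized.slice(prefix.length).split("+");
    if (parts.length < 3) {
        throw new Error("Expected org+course+run in: " + serialized);
    }
    var locator = {
        org: parts[0],
        course: parts[1],
        run: parts[2],
        branch: null,   // stays null when the locator names no branch
        version: null   // stays null when the locator names no version
    };
    // Any remaining parts are assumed to look like "name@value".
    parts.slice(3).forEach(function (part) {
        var at = part.indexOf("@");
        if (at === -1) {
            throw new Error("Unexpected locator part: " + part);
        }
        var name = part.slice(0, at);
        var value = part.slice(at + 1);
        if (name === "branch") {
            locator.branch = value;
        } else if (name === "version") {
            locator.version = value;
        }
    });
    return locator;
}
```

Called on the locator above, the function would return org "SQU", course "SQU101", run "2014_T1", branch "published" and the version string after "version@".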

References - Old and New

Old

https://edx-wiki.atlassian.net/wiki/pages/viewpage.action?spaceKey=ENG&title=Mongostore+Data+Structure

https://github.com/edx/edx-platform/wiki/Split:-the-versioning,-structure-saving-DAO

https://github.com/edx/edx-platform/wiki/Split-mongo-architecture-and-rollout-options

New

http://edx.readthedocs.org/projects/edx-developer-guide/en/latest/modulestores/split-mongo.html

https://github.com/edx/edx-platform/blob/master/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py#L2-54

Course Format in MongoDB

There are three Split Mongo collections in MongoDB:

  • modulestore.active_versions
  • modulestore.structures
  • modulestore.definitions

active_versions

  • Each course has a single active_versions document.
    • That document points to the _id of both the draft structure and the published structure.

structures

  • The entire course structure is stored in a single structures document.
    • Each time a structure changes, a new structure is created by cloning and changing the previous structure.
  • The version of a course branch is the _id of the course branch's most recent structures document.
  • Each course has a current draft structure and a current published structure.
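
The clone-and-change behavior described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the modulestore's implementation: saveNewVersion is a hypothetical helper, plain strings stand in for MongoDB ObjectIds, and the blocks field is only a placeholder for whatever a structure document holds. The versions, previous_version and edited_on names are taken from the scripts later on this page.

```javascript
// In-memory sketch of copy-on-write structure versioning (hypothetical helper).
// structures:  object mapping structure _id -> structure document
// courseIndex: the course's active_versions document
// branchKey:   e.g. "draft-branch" or "published-branch"
// newId:       _id for the new structure (a string here, an ObjectId in MongoDB)
// applyEdit:   function that mutates the cloned structure
function saveNewVersion(structures, courseIndex, branchKey, newId, applyEdit) {
    var previousId = courseIndex.versions[branchKey];
    var previous = structures[previousId];
    if (!previous) {
        throw new Error("No structure found for " + branchKey);
    }
    // Clone so the previous version stays untouched and can be rolled back to.
    var next = JSON.parse(JSON.stringify(previous));
    applyEdit(next);
    next._id = newId;
    next.previous_version = previousId;
    next.edited_on = new Date().toISOString();
    structures[newId] = next;
    // The branch's version is the _id of its most recent structure.
    courseIndex.versions[branchKey] = newId;
    return next;
}
```

Because each new structure records its previous_version, the history of a branch forms a chain that can be followed back from the id stored in active_versions.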

definitions

  • A course structure does not contain the course content. The content is kept in the definitions collection.

The Code

Query Patterns

To retrieve the entire published course structure, the modulestore queries a single document from active_versions and another single document from structures (by _id).
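
As a rough sketch of that two-lookup pattern, the hypothetical helper below takes a database handle and a course key. It assumes the field names used by the scripts later on this page (org, course, run, and versions["published-branch"]) and is not the modulestore's own code.

```javascript
// Hypothetical helper: fetch a course's published structure in two lookups.
// db:        a database handle exposing modulestore.active_versions and
//            modulestore.structures (the mongo shell's db object fits this shape)
// courseKey: e.g. { "org": "...", "course": "...", "run": "..." }
function getPublishedStructure(db, courseKey) {
    // Lookup 1: the course's single active_versions document.
    var courseIndex = db.modulestore.active_versions.findOne(courseKey);
    if (courseIndex == null) {
        return null;  // unknown course
    }
    var publishedId = courseIndex.versions["published-branch"];
    if (!publishedId) {
        return null;  // no published branch recorded for this course
    }
    // Lookup 2: the structures document, fetched by _id.
    return db.modulestore.structures.findOne({ "_id": publishedId });
}
```

In the mongo shell this could be called as getPublishedStructure(db, { "org": "HarvardX", "course": "SPU27x", "run": "2015_Q2" }).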

Other Relevant Pages/Slides/Docs/Code

How to Recover from a Broken Split Course

A project was started to help inspect the history of a Split course and to roll a course back to an earlier version. It is available here:

https://github.com/macdiesel/edx-split-utils

How To Find the Size of All Active Structures

Run this command in the MongoDB shell, connected to the replica DB:

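db.modulestore.active_versions.find().forEach(
    function(obj) {
        var published_id = obj["versions"]["published-branch"];
        // Skip courses that have no published branch.
        if (!published_id) { return; }
        var struct_obj = db.modulestore.structures.findOne({ "_id": published_id });
        // Skip dangling references to structures that no longer exist.
        if (struct_obj == null) { return; }
        // Object.bsonsize() reports the document's BSON size in bytes.
        var curr = Object.bsonsize(struct_obj);
        print(curr + " :: " + obj["org"] + "/" + obj["course"] + "/" + obj["run"]);
    }
)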

How To Walk Structure History Backwards From the Current Active Version 

rs.slaveOk();
var course_id = { "org": "HarvardX", "course": "SPU27x", "run": "2015_Q2"};
var course_idx = db.modulestore.active_versions.findOne(course_id);
print("Course Index:");
printjson(course_idx);

/* This walk follows the draft branch, so read the draft structure's _id. */
var draft_struct_id = course_idx["versions"]["draft-branch"];

/* Walk the draft structure version tree backwards. */
var curr_struct = db.modulestore.structures.findOne( { "_id": draft_struct_id }, {_id: 1, edited_on: 1, previous_version: 1} );
print("Current draft structure: " + curr_struct._id);
print("Edited on: " + curr_struct.edited_on);

var generation = 1;
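/* Stop when there is no previous version or the referenced document is missing. */
while ( curr_struct != null && curr_struct.previous_version != null ) {
    curr_struct = db.modulestore.structures.findOne( { "_id": curr_struct.previous_version }, {_id: 1, edited_on: 1, previous_version: 1} );
    if ( curr_struct == null ) { break; }
    print("Previous: " + curr_struct._id + " - Edited on: " + curr_struct.edited_on + " - generation: " + generation);
    generation++;
}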