Compositor Architecture Proposal

This is a rough proposal for the high-level functionality that will be built on Blockstore to implement things like courses, content libraries, and learnable units embedded outside of their course context.

"Compositor" is a placeholder name for the system(s) which take the raw content data in Blockstore and combine it with context-specific data (e.g. course run settings) and user-specific data/state (e.g. adaptive learning engine outputs) to create the final learnable object that the learner will see.

Goals for compositor

The goals of this compositor design are:

  • To provide the features required for the LMS to deliver courses that are stored in Blockstore instead of modulestore
  • To enable adaptive learning integrations that can have much more control over each user's course experience
  • To combine CCX courses and regular courses into a consistent course authoring workflow, with at least as many features as CCX offers today.

Course/Content Authoring→Learning Flow Diagram

This diagram shows the two transforms that are applied to content, to bring it from the write-optimized Blockstore to the dynamic learning experience:

Design Details

Here are some of the features of this system design in more detail:

Courses are authored in two phases

All courses are initially authored as a "course template"; this would be similar to how courses are authored in Studio today, except that a course template cannot be directly used by learners, and the course content would be saved into Blockstore instead of the modulestore. Before a course can be used by learners, at least one "course run" must be created from the template.

A course template includes:

  • Name of the course
  • The full course outline (list of chapters, subsections, and units in the course)
  • All the course content (units consisting of XBlocks, plus any files/images/textbooks etc.)
  • Course authoring team (plus author bios, if those should appear on the course about page)
  • Default relative due dates (no absolute dates, but things like "By default, the section 'Week 2' is due two weeks after course start date")
  • Content groups
  • License (either a Creative Commons license or "All Rights Reserved")
  • "Field Rules" that affect XBlock behavior
    • This is a list of rules that override the default or the value of any XBlock Scope.settings/Scope.content field, e.g. "set max_attempts default to 3 for all 'problem' XBlocks in this course"

A course run consists of the following data, which is not included in the course template:

  • Name of the course run
  • Start date, end date, due dates
  • Cohort definitions/settings
  • Grading policy (the course template can include a default policy, which the run may or may not override)
  • Instructors and TAs (plus instructor bios and TA bios, if those should appear on the course about page)
  • Enrollment restrictions
  • Customizations to the course outline and content (add/remove chapters, subsections, and units)
  • "Field Rule Overrides" that affect XBlock behavior
    • e.g. "set max_attempts default to  for all 'problem' XBlocks in Introduction chapter"
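
The layering of template-level "Field Rules" and run-level "Field Rule Overrides" could be sketched roughly as follows. This is a minimal illustration, not a proposed API: the FieldRule class, the resolve_default function, and the "last matching rule wins" precedence are all assumptions, and the override value 5 is an arbitrary placeholder (this proposal does not specify one).

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule: set the default of `field` for blocks of `block_type`."""
    block_type: str
    field: str
    default: Any
    chapter: Optional[str] = None  # None means the rule applies course-wide


def resolve_default(block_type: str, field: str, chapter: str,
                    template_rules: Iterable[FieldRule],
                    run_overrides: Iterable[FieldRule],
                    builtin_default: Any = None) -> Any:
    """Return the effective default for a field.

    Assumed precedence: built-in default, then template rules, then run
    overrides; within each list, a later matching rule replaces an earlier one.
    """
    value = builtin_default
    for rule in list(template_rules) + list(run_overrides):
        if rule.block_type != block_type or rule.field != field:
            continue
        if rule.chapter is not None and rule.chapter != chapter:
            continue
        value = rule.default
    return value


# Template rule taken from the example above; the override value is a placeholder.
template_rules = [FieldRule("problem", "max_attempts", 3)]
run_overrides = [FieldRule("problem", "max_attempts", 5, chapter="Introduction")]

in_week_2 = resolve_default("problem", "max_attempts", "Week 2",
                            template_rules, run_overrides)
in_intro = resolve_default("problem", "max_attempts", "Introduction",
                           template_rules, run_overrides)
```

Under these assumptions, a problem in any other chapter keeps the template default, while problems in the "Introduction" chapter pick up the run-level override.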

Most courses will have one template with one run, but many use cases (that use the CCX feature today) will see multiple runs per course template.

Rationale: Splitting course authoring into two phases separates the "course author" role from the "course instructor" role more clearly. For example, today there is no way to grant a user permission to edit a course in Studio without also giving them access to view data about all students who are enrolled in the course. This design also removes the need to treat "CCX" courses as a special case: all courses are treated much like today's "CCX" courses, which makes the architecture more consistent and removes a number of special cases that currently exist to support CCX.

Outline is a first-class concept

In the current Open edX platform, a course consists of a directed acyclic graph of XBlocks, with a root "course" node, which has several "chapter" children, and so on. The "course outline" is defined by the XBlock graph. 

In this new design, the course outline is no longer an XBlock graph at all; rather it is a simple rooted tree data structure that optionally defines the sections, subsections, and units in the course, along with their metadata such as permissions, visibility and due dates. The outline is stored in a JSON file in the course's Blockstore Bundle.

Rationale: Separating the outline from any XBlock runtime allows for much more efficient understanding and querying of the course outline, and provides better separation of concerns. Applying transformations to the outline (add/remove sections for certain users/runs only, shuffle them around adaptively, etc.) becomes much easier. The system can also support new types of micro learning experiences where, for example, the "course outline" consists of a single unit that is meant to be embedded into a blog post or another course. Finally, this design will explicitly disallow units that have multiple parents, a situation that causes some issues in the current LMS.
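
To make the outline idea concrete, here is a rough sketch of what an outline JSON file and a "no unit has multiple parents" check might look like. The key names ("title", "children", "unit_id") and the validate_outline helper are assumptions for illustration only; this proposal does not define the file's schema.

```python
import json

# Hypothetical outline file contents; the schema is an assumption of this sketch.
OUTLINE_JSON = """
{
  "title": "Example Course",
  "children": [
    {"title": "Week 1", "children": [
      {"title": "Lesson 1", "children": [
        {"title": "Unit A", "unit_id": "unit-a"},
        {"title": "Unit B", "unit_id": "unit-b"}
      ]}
    ]}
  ]
}
"""


def iter_unit_ids(node):
    """Yield unit IDs in outline order using a depth-first traversal."""
    if "unit_id" in node:
        yield node["unit_id"]
    for child in node.get("children", []):
        yield from iter_unit_ids(child)


def validate_outline(outline):
    """Reject outlines in which the same unit is referenced more than once.

    Because the outline is a plain tree, a repeated unit ID is the only way
    a unit could end up with more than one parent.
    """
    seen = set()
    for unit_id in iter_unit_ids(outline):
        if unit_id in seen:
            raise ValueError(f"unit {unit_id!r} is referenced more than once")
        seen.add(unit_id)
    return sorted(seen)


outline = json.loads(OUTLINE_JSON)
unit_ids = validate_outline(outline)
```

Note that nothing here needs an XBlock runtime: the outline can be loaded, queried, and validated with ordinary JSON tooling.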

Transforms apply -izations

Building on the success of the "block transformers" in the current LMS, all learnable content will go through two distinct "transforms" which will each apply customizations and optimizations to the learnable content.

Courses, content, etc. will be stored in the Blockstore in a write-optimized form: course outlines will be JSON files, XBlock content will be stored as OLX (XML), and other data (instructor bios, lists of XBlock field overrides, etc.) will be stored in JSON format. Transform 1 will ingest the write-optimized content from a course template, then merge it with any applicable run-specific data (e.g. start date, content customizations), and produce a read-optimized course bundle (the "compiled course"). The compiled course may be in some format other than JSON/XML/etc. (e.g. Python data structures, or protobuf?), and will consist of the course outline and the course content, with all applicable overrides (from the course template + course run settings) applied to each.

The LMS will only ever read the compiled course, and will never read the data that precedes transform 1.
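
As one illustration of what Transform 1 might do, the sketch below merges a template outline carrying relative due dates with a run's start date and explicit due-date overrides, producing a separate "compiled" structure. The function name compile_course and the keys "due_after_days", "due", "start_date", and "due_dates" are hypothetical; the real compiled format is still an open question in this proposal.

```python
import copy
from datetime import date, timedelta


def compile_course(template, run):
    """Hypothetical Transform 1: template + run settings -> compiled outline.

    Relative due dates from the template are resolved against the run's
    start date; a run-level absolute due date, if present, takes precedence.
    """
    compiled = copy.deepcopy(template["outline"])  # never mutate the source data
    run_due_dates = run.get("due_dates", {})
    for section in compiled.get("children", []):
        offset_days = section.pop("due_after_days", None)
        if section["title"] in run_due_dates:
            section["due"] = run_due_dates[section["title"]]
        elif offset_days is not None:
            section["due"] = run["start_date"] + timedelta(days=offset_days)
    return compiled


template = {
    "outline": {
        "title": "Example Course",
        "children": [
            {"title": "Week 1", "due_after_days": 7},
            {"title": "Week 2", "due_after_days": 14},
        ],
    }
}
run = {
    "start_date": date(2024, 1, 1),
    "due_dates": {"Week 2": date(2024, 1, 20)},  # run-specific override
}

compiled = compile_course(template, run)
```

Because the output depends only on the template and the run settings, it can be recomputed on publish and cached in between, which matches the caching behavior described for Transform 1.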

While the compiled course which is output from Transform 1 can be cached until a change to the course is published, Transform 2 will be a more real-time transformation which applies user-specific changes to the compiled course. This may include checking user permissions and modifying the course accordingly, applying individual due dates, loading and assigning randomized content from a compiled content library, showing/hiding content based on content groups / cohorts, and even modifying content visibility based on the recommendations of an adaptive learning engine. The output of Transform 2 will be in the same format as the output of Transform 1.
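
A small sketch of one possible Transform 2 step follows: hiding parts of a compiled outline based on the user's content groups. The "visible_to_groups" key and the personalize function are assumptions of this example. The point it illustrates is that the output has the same shape as the input, so further user-specific steps (individual due dates, adaptive recommendations) could be chained in the same way.

```python
def personalize(node, user_groups):
    """Hypothetical Transform 2 step: drop nodes the user's groups cannot see.

    Returns a new node of the same shape as the input, or None if the node
    itself is hidden from this user. The input is left unmodified.
    """
    allowed = node.get("visible_to_groups")  # absent or None means visible to everyone
    if allowed is not None and not set(allowed) & set(user_groups):
        return None
    result = {key: value for key, value in node.items() if key != "children"}
    if "children" in node:
        kept = (personalize(child, user_groups) for child in node["children"])
        result["children"] = [child for child in kept if child is not None]
    return result


compiled_outline = {
    "title": "Example Course",
    "children": [
        {"title": "Week 1", "children": [
            {"title": "Unit A", "unit_id": "unit-a"},
            {"title": "Unit B (honors)", "unit_id": "unit-b",
             "visible_to_groups": ["honors"]},
        ]},
    ],
}

regular_view = personalize(compiled_outline, user_groups=["audit"])
honors_view = personalize(compiled_outline, user_groups=["honors"])
```

In this example, a learner outside the "honors" group sees only "Unit A", while a learner in that group sees both units.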

The XBlock Runtime is isolated

The XBlock runtime will be modified to work when the course/section/subsections are no longer XBlocks, so the root of any given XBlock tree that it loads is always a "Unit".

The XBlock runtime and the LMS will save state (XBlock Scope.user_state fields, ORA2 data, grades, and completion) into the "learning record store", where it will be grouped by "learning context ID" instead of course ID. This enables XBlock Units to be used outside of course contexts (e.g. in blog posts or microcourses) while still being able to take advantage of all the LMS features. This also provides a straightforward migration path for the existing codebase, because any "course ID" currently used in the system can double as a "learning context ID".
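
The grouping by learning context ID could look something like the in-memory sketch below. The LearningRecordStore class, its method names, and the example IDs are all hypothetical; a real store would be a persistent service, and this only illustrates how the same unit can hold independent state in different contexts.

```python
class LearningRecordStore:
    """Hypothetical in-memory stand-in for the learning record store."""

    def __init__(self):
        # Keyed by (learning_context_id, user_id, block_id) -> {field: value}
        self._state = {}

    def set_user_state(self, context_id, user_id, block_id, field, value):
        key = (context_id, user_id, block_id)
        self._state.setdefault(key, {})[field] = value

    def get_user_state(self, context_id, user_id, block_id, field, default=None):
        key = (context_id, user_id, block_id)
        return self._state.get(key, {}).get(field, default)


store = LearningRecordStore()

# An existing course ID doubles as a learning context ID (example value is made up).
course_context = "course-v1:ExampleX+Demo+2024"
# A non-course context, e.g. a unit embedded in a blog post (made-up ID format).
blog_context = "blog:example-post"

store.set_user_state(course_context, "user-1", "unit-a/problem-1", "attempts", 2)
store.set_user_state(blog_context, "user-1", "unit-a/problem-1", "attempts", 1)

course_attempts = store.get_user_state(course_context, "user-1", "unit-a/problem-1", "attempts")
blog_attempts = store.get_user_state(blog_context, "user-1", "unit-a/problem-1", "attempts")
```

Here the same user and the same block keep separate attempt counts in the course and in the blog post, because the context ID is part of the key.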

In the long term, the goal will be to move the XBlock runtime into a separate process, so it no longer runs as part of the LMS process.