Blockstore Design
This is the most current Blockstore design document, but many details are still being refined in discussions on the Issues page of the Blockstore repo. Topics still under debate include:
- How granular should a Bundle be in different use cases (e.g., a single problem, an entire unit, or the outline of an entire course)?
- Exactly which files go where inside a Bundle?
- What does import/export look like for courses and content libraries?
This is the design document for Blockstore, a system for authoring, discovering, and reusing educational content. Development is being funded by Harvard LabXchange and the Amgen Foundation, with significant in-kind contributions from edX.
Abstract
All lesson content in the Open edX platform is currently stored in the modulestore, which requires that all content be organized into “courses,” each a directed acyclic graph (DAG) of XBlocks/XModules, or into “libraries,” which are implemented the same way as courses but have a shallower graph and support a limited set of content types.
This proposal outlines a design for a new service that stores content for the Open edX platform, called “Blockstore.” Blockstore is meant to be a lower-level service than the modulestore, and it is designed around the concept of storing small, reusable pieces of content, rather than large, fixed content structures such as courses. In other systems and academic contexts, these are often called “learning objects,” and Blockstore is thus a type of Learning Object Repository (LOR). For Open edX, Blockstore is designed to facilitate a much greater level of content re-use than is currently possible, enable new adaptive learning features, and enable delivery of learning content in new ways (not just large traditional courses).
Motivation
At its heart, edx-platform's current modulestore works with large, static course structures. Various dynamic courseware features such as A/B tests, cohorts, and randomized problem banks work around this by copying every piece of content that might be displayed to any user, then using access checks to show each user only a subset of it. When you use a randomized problem bank in a sequence, the system actually copies the entire content library into that sequence.
This poses a number of problems:
- It creates very large data structures, degrading courseware performance. Many common courseware interactions noticeably slow down as the amount of content in a course increases.
- The underlying structure is static, so the ordering of elements is fixed, making adaptive learning sequences extremely cumbersome to implement. Course teams have heroically worked around this with LTI hacks, using Open edX as both an LTI provider and an LTI consumer in chained LTI launches (a sequence contains one unit that acts as an LTI consumer of an adaptive engine's interface, which in turn acts as an LTI consumer of individual problems in the original course).
- Course content is largely duplicated for every run, which makes it cumbersome to manage across multiple runs, especially when those runs are on different Open edX instances, as is the case for some partners.
- Trying to work around these limitations and maintain performance has significantly complicated the codebase and slowed feature development. Content Libraries are far less powerful than they were intended to be because of the large infrastructure changes that would have been required to execute the original vision.
General Themes / Concepts
The high-level ideas that ground this proposal are:
- Blockstore stores data in Content Bundles, which are logical groupings of files that Blockstore knows little about.
Blockstore doesn't understand much about the things inside it. Its core has no special data structure for Sequences, Units, or anything else. OLX content, smaller assets like images, and larger assets like videos are all stored as files in Content Bundles, using conventions and groupings that make sense to the client application. A separate plugin layer will be able to listen for and act on particular types of Content Bundles.
- Blockstore is a lower-level storage abstraction that XBlocks (and other clients) build upon.
We will compose Blockstore primitives in various ways to store content, but there is no 1:1 mapping between Blockstore concepts and existing platform concepts. For instance, a Collection is not equivalent to a Course or a Library; a Collection might in fact store multiple Course Runs and multiple Library equivalents. A Content Bundle might be used to store a Sequence, an individual Problem, or the outline of a Course Run. The concrete primitives that Blockstore offers are versioned storage and the ability to access files in other Bundles using Links. This gives us a lot of flexibility, but requires us to be disciplined about how we use it.
- Blockstore represents author intent and grouping. It favors author-friendliness even if that makes certain bookkeeping harder.
A Content Bundle in Blockstore is something an author wants to edit, version, import, and export as a single unit, so a Bundle can be a single problem or an entire sequence. Content stored in Blockstore is not read-optimized, and it is not the data structure that learners ultimately interact with. The definitions of a mostly static learning sequence and a learning sequence with an adaptive component might look completely different when stored in Blockstore, even if the learner eventually experiences them in a similar way. The import/export form of a Content Bundle should be as author-friendly as possible: assets are grouped with the content that uses them, and as few Blockstore concepts as possible should leak into how the content is written.
- Versioned content is the core of Blockstore, and plugin extensibility focuses on annotating that content.
Things that create, transform, update, or execute content live outside Blockstore. Plugins know when content has changed, but they don't modify content. Plugins maintain their own data and APIs, and plugin data can change independently of the content's own lifecycle. This means that exporting the same version of a Content Bundle will always yield the same authored content, but may yield different plugin metadata (for example, newly added tags). Versions are also meaningful: not every edit to every file creates a new version. In that sense, a version is like a "commit."
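The themes above can be made concrete with a small model. The following is a minimal sketch, assuming a purely in-memory store; the class and method names (`Blockstore`, `Bundle`, `BundleVersion`, `Link`, `write_draft`, `add_link`, `commit`, `register_plugin`, `read_linked_file`, `export`), the bundle types, and the file paths are all hypothetical and are not Blockstore's actual API or file layout. It illustrates four ideas: files are stored as opaque bytes; edits accumulate in a draft until an explicit commit creates a version; a Link lets one Bundle read files from a specific version of another Bundle; and plugins are notified of new versions but keep their own data, so exporting a version is unaffected by later plugin changes.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical in-memory sketch; every name is an assumption, not Blockstore's API.


@dataclass(frozen=True)
class Link:  # points at one specific version of another Bundle
    bundle_uuid: str
    version: int


@dataclass(frozen=True)
class BundleVersion:  # read-only snapshot, comparable to a commit
    number: int
    files: Mapping[str, bytes]  # opaque bytes; Blockstore does not parse them
    links: Mapping[str, Link]


@dataclass
class Bundle:
    uuid: str
    bundle_type: str  # hypothetical label that plugins filter on
    versions: List[BundleVersion] = field(default_factory=list)
    draft_files: Dict[str, bytes] = field(default_factory=dict)
    draft_links: Dict[str, Link] = field(default_factory=dict)


Plugin = Callable[[Bundle, BundleVersion], None]


class Blockstore:
    def __init__(self) -> None:
        self.bundles: Dict[str, Bundle] = {}
        self.plugins: List[Tuple[str, Plugin]] = []

    def create_bundle(self, uuid: str, bundle_type: str) -> None:
        self.bundles[uuid] = Bundle(uuid, bundle_type)

    def register_plugin(self, bundle_type: str, plugin: Plugin) -> None:
        self.plugins.append((bundle_type, plugin))

    def write_draft(self, uuid: str, path: str, data: bytes) -> None:
        # Edits accumulate in a draft; they do not create a new version.
        self.bundles[uuid].draft_files[path] = data

    def add_link(self, uuid: str, name: str, target_uuid: str, version: int) -> None:
        self.bundles[uuid].draft_links[name] = Link(target_uuid, version)

    def commit(self, uuid: str) -> BundleVersion:
        bundle = self.bundles[uuid]
        version = BundleVersion(
            number=len(bundle.versions) + 1,
            files=MappingProxyType(dict(bundle.draft_files)),
            links=MappingProxyType(dict(bundle.draft_links)),
        )
        bundle.versions.append(version)
        # Notify matching plugins; they keep derived data in their own storage.
        for bundle_type, plugin in self.plugins:
            if bundle_type == bundle.bundle_type:
                plugin(bundle, version)
        return version

    def read_linked_file(self, uuid: str, version: int, link: str, path: str) -> bytes:
        # Follow a Link to read a file stored in another Bundle's pinned version.
        target = self.bundles[uuid].versions[version - 1].links[link]
        return self.bundles[target.bundle_uuid].versions[target.version - 1].files[path]

    def export(self, uuid: str, version: int) -> Dict[str, bytes]:
        # Authored content only; plugin metadata is never part of the export.
        return dict(self.bundles[uuid].versions[version - 1].files)


# Usage; bundle IDs, types, and file paths are made up for this example.
tags: Dict[Tuple[str, int], List[str]] = {}  # a tagging plugin's own storage


def tag_new_version(bundle: Bundle, version: BundleVersion) -> None:
    tags[(bundle.uuid, version.number)] = ["needs-review"]


store = Blockstore()
store.register_plugin("problem", tag_new_version)
store.create_bundle("img-lib", "assets")
store.write_draft("img-lib", "static/diagram.png", b"PNG...")
store.commit("img-lib")
store.create_bundle("prob-1", "problem")
store.write_draft("prob-1", "problem.xml", b"<problem/>")
store.write_draft("prob-1", "problem.xml", b"<problem>v2</problem>")  # same draft
store.add_link("prob-1", "images", "img-lib", 1)
v1 = store.commit("prob-1")
before = store.export("prob-1", 1)
tags[("prob-1", 1)].append("biology")  # plugin data changes; no new version
print(v1.number)  # 1
print(store.read_linked_file("prob-1", 1, "images", "static/diagram.png"))  # b'PNG...'
print(store.export("prob-1", 1) == before)  # True
```

Pinning each Link to a fixed version number is a choice made for this sketch only; whether real Links track a fixed version or follow the latest one is not specified here. Likewise, clearing or carrying forward the draft after a commit is left simple (the draft is carried forward, much like a working tree).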