The proposals:
- Harvard LabXchange Blockstore Proposal (Original)
- File-Oriented Blockstore Proposal (File)
- Blockstore Implementation Proposal (Database)
The Blockstore proposals are fairly long, so this doc summarizes some of the principal differences between them.
Collections
- All proposals center ownership, permissions, and licensing around the concept of a Collection.
- Each piece of content belongs to exactly one Collection.
- Examples:
- single course (possibly multiple runs)
- problem bank
- library of videos created by a video team
Differences: Collection Versioning
Original and Database Proposals
- Collections point to versioned content, and are also versioned as a whole.
File Proposal
- Collections point to versioned content, but the Collection itself is not versioned.
Content Primitives
- Identified by UUID.
- Versioned numerically (1, 2, 3, etc.)
- Tagging metadata is stored outside of the core Blockstore.
Differences: OLX vs. Assets
Original and Database Proposals
- Separate models for Unit (i.e. a Studio Unit) and Files/Assets (e.g. images, PDFs, video files)
- Unit OLX content is stored in the database.
- Files/Assets live in an object store like S3, and are pointed to by rows in the database.
- All metadata about Units and Assets is stored in the database.
- Assets used by a Unit are tied together in the database using Links.
- Advantages
- Access to OLX has better latency guarantees, particularly for multi-gets.
- Transactions make it easier to guarantee atomic operations involving many Units/Links/etc.
- Able to track usage at a fine granularity (e.g. what are all the places this exact version of this image is used?) without requiring external indexing like Elasticsearch.
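The fine-grained usage tracking described above can be illustrated with a small sketch. It assumes Links live in a relational table that records exact versions on both ends; the `link` table, its column names, and the `units_using` helper are hypothetical and do not come from any of the proposals.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (
        unit_uuid     TEXT NOT NULL,
        unit_version  INTEGER NOT NULL,
        asset_uuid    TEXT NOT NULL,
        asset_version INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO link VALUES (?, ?, ?, ?)",
    [
        ("unit-a", 1, "image-x", 1),
        ("unit-a", 2, "image-x", 2),
        ("unit-b", 1, "image-x", 2),
        ("unit-c", 4, "image-y", 1),
    ],
)


def units_using(conn, asset_uuid, asset_version):
    """Return (unit_uuid, unit_version) pairs linking to one exact asset version."""
    rows = conn.execute(
        "SELECT unit_uuid, unit_version FROM link "
        "WHERE asset_uuid = ? AND asset_version = ? "
        "ORDER BY unit_uuid, unit_version",
        (asset_uuid, asset_version),
    )
    return rows.fetchall()


# "What are all the places this exact version of this image is used?"
usage = units_using(conn, "image-x", 2)
```

Because the version numbers are ordinary columns, the question becomes a single indexed query rather than a job for an external search index.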
File Proposal
- All content is stored in ContentBundles, each of which is like a small directory of files.
- The OLX for a Unit would go into an XML file in a ContentBundle.
- All Bundle content is stored in an S3-like object store.
- Metadata about what content constitutes a particular version is in the object store, not the database.
- Assets used by the Unit would go into the same ContentBundle.
- Advantages
- Units are more self-contained.
- Easier to adapt for use cases outside of Open edX, since ContentBundles don't assume an OLX/Assets divide.
- Easier to associate bundles of related Assets, like a Video's various encodings, subtitles, thumbnails, etc.
- Cheaper storage.
Differences: Granularity and Versioning
Original Proposal and Database Proposal
- Files/Assets are tracked individually.
- Units are tracked individually.
Original Proposal
- In addition to per-Unit and per-File tracking, ContentSets (a group of Links) are also versioned.
File Proposal
- ContentBundles are versioned as a whole; the individual assets inside them are not versioned separately.
- Depending on the intended usage, a ContentBundle could be a single video, a Unit, or an entire Sequence.
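One way to picture whole-bundle versioning is as a per-version manifest written to the object store. The sketch below is only an illustration under stated assumptions: a plain dict stands in for the S3-like store, and the key layout, the content-addressed blob keys, and the function names are all hypothetical rather than taken from the File proposal.

```python
import hashlib
import json

# Stand-in for an S3-like object store: key -> bytes.
object_store = {}


def put_blob(data: bytes) -> str:
    """Store file contents under a content-derived key (an assumption)."""
    key = "blobs/" + hashlib.sha256(data).hexdigest()
    object_store[key] = data
    return key


def manifest_key(bundle_uuid: str, version: int) -> str:
    # Hypothetical key layout for per-version manifests.
    return f"bundles/{bundle_uuid}/versions/{version}.json"


def save_bundle_version(bundle_uuid: str, version: int, files: dict) -> None:
    """Write one manifest describing every file in this bundle version."""
    manifest = {path: put_blob(data) for path, data in files.items()}
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    object_store[manifest_key(bundle_uuid, version)] = payload


def load_manifest(bundle_uuid: str, version: int) -> dict:
    return json.loads(object_store[manifest_key(bundle_uuid, version)])


def load_bundle_version(bundle_uuid: str, version: int) -> dict:
    """Return path -> bytes for the whole bundle at the given version."""
    manifest = load_manifest(bundle_uuid, version)
    return {path: object_store[blob] for path, blob in manifest.items()}


# Version 2 changes only the image, yet the bundle as a whole gets a new version.
save_bundle_version("bundle-1", 1, {"unit.xml": b"<!-- unit OLX -->", "diagram.png": b"png-v1"})
save_bundle_version("bundle-1", 2, {"unit.xml": b"<!-- unit OLX -->", "diagram.png": b"png-v2"})
```

Here the version number belongs to the bundle, not to `diagram.png`: there is no way to ask for "version 2 of the image" independently of the bundle that contains it.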
Differences: Modeling Sequences and Courses
Original Proposal
- ContentSets are collections of Links that point to Units, Files, or other ContentSets.
- Statically defined Sequences and Courses are composed using ContentSets.
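The nesting of ContentSets can be sketched as a small recursive structure. This is a simplified illustration, not the Original proposal's data model: the class names mirror the concepts above, but the fields and the `flatten_units` helper are hypothetical, and Link versions are omitted to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    uuid: str


@dataclass
class File:
    uuid: str


@dataclass
class ContentSet:
    uuid: str
    # Link targets may be Units, Files, or other ContentSets.
    links: list = field(default_factory=list)


def flatten_units(content_set, _seen=None):
    """Return Unit UUIDs reachable from content_set, in link order.

    Each ContentSet is expanded at most once, which also guards against cycles.
    """
    seen = set() if _seen is None else _seen
    if content_set.uuid in seen:
        return []
    seen.add(content_set.uuid)
    units = []
    for target in content_set.links:
        if isinstance(target, Unit):
            units.append(target.uuid)
        elif isinstance(target, ContentSet):
            units.extend(flatten_units(target, seen))
        # Files are linkable but are not part of the Unit ordering.
    return units


# A course composed of two sequences, each itself a ContentSet.
seq1 = ContentSet("seq-1", [Unit("u1"), File("f1"), Unit("u2")])
seq2 = ContentSet("seq-2", [Unit("u3")])
course = ContentSet("course", [seq1, seq2])
unit_order = flatten_units(course)
```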
Database Proposal
- Sequences are out of scope – Blockstore's job is to provide fast access to the Units for a separate Compositor service.
File Proposal
- A statically defined Sequence is modeled as a single ContentBundle, and versioned as a whole.
- A Course would be a ContentBundle with a root OLX file defining the chapters and a set of Links to Sequences.
Links
- Links are versioned in all proposals.
- Conceptually like symlinks.
Differences: Scope of Usage
Original Proposal
- Links are used to tie together Units and Files.
- ContentSets tie together Units with each other, as well as with Files and other ContentSets.
- Units, Files, and ContentSets are all considered "Linkables", and share a common interface that includes version history, tags, and draft status.
- Links are stored in the database.
Database Proposal
- Links are used to tie together Units and Files only.
- Links are stored in the database.
File Proposal
- Links are used far less often, because Units and Sequences typically contain their own assets within the same ContentBundle.
- A shallow, versionless representation of Links exists in the database for notification purposes, but full Link information is stored in the object store.
- This is for scaling and performance reasons when dealing with large numbers of links and extended dependencies.
- This makes it much harder to find out which things are using a specific Version of a given piece of content unless we index separately with something like Elasticsearch.
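The trade-off in the File proposal's shallow Link representation can be shown with a minimal sketch. It assumes the database keeps only "bundle X depends on bundle Y" with no version on either side; the `dependents` index and both helper functions are hypothetical names used for illustration.

```python
from collections import defaultdict

# Hypothetical shallow link index: target bundle UUID -> source bundle UUIDs.
# No versions are recorded on either side.
dependents = defaultdict(set)


def record_link(source_bundle: str, target_bundle: str) -> None:
    """Remember that source_bundle links to target_bundle (any version)."""
    dependents[target_bundle].add(source_bundle)


def bundles_to_notify(target_bundle: str) -> list:
    """Bundles to notify when target_bundle publishes a new version."""
    return sorted(dependents.get(target_bundle, ()))


record_link("sequence-1", "video-9")
record_link("course-3", "video-9")
notify = bundles_to_notify("video-9")
```

This is enough to tell `sequence-1` and `course-3` that `video-9` has changed, but the index cannot say which version of `video-9` either of them currently uses. Answering that would mean reading the full Link data from the object store or consulting a separate index.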
Differences: Garbage Collection
Original and Database Proposals
- Use Links in the database to garbage collect content that is outdated and is no longer being referenced.
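Link-based garbage collection can be sketched as a mark-and-sweep pass over versioned content. This is a minimal illustration, not the proposals' algorithm: the function name and data shapes are hypothetical, and the notion of "live roots" (for example, currently published versions) is an assumption about how outdated content would be identified.

```python
def unreferenced_versions(all_versions, links, live_roots):
    """Return versions not reachable from live_roots by following links.

    all_versions: set of (uuid, version) tuples
    links: dict mapping a (uuid, version) to the (uuid, version) items it links to
    live_roots: versions that must be kept (an assumption, e.g. published ones)
    """
    reachable = set()
    stack = list(live_roots)
    while stack:  # mark phase
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(links.get(node, []))
    return all_versions - reachable  # sweep phase: everything left unmarked


all_versions = {("unit-a", 1), ("unit-a", 2), ("img", 1), ("img", 2)}
links = {
    ("unit-a", 1): [("img", 1)],
    ("unit-a", 2): [("img", 2)],
}
garbage = unreferenced_versions(all_versions, links, [("unit-a", 2)])
```

In this example the old Unit version and the image version only it referenced are both collectable, while everything reachable from the live Unit version is kept.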
File Proposal
- Don't garbage collect.
- Versioned OLX content is relatively small compared to the size of other assets stored in the object store.
- It's not clear how we'd know what was being used in a multi-site distributed sharing arrangement.
Search & Tagging
None of the proposals addresses search and tagging in any detail, but all of them assume that there will be an external system (either a plugin or a separate service) that uses Elasticsearch as a backend.