A straightforward, unoptimized, implementation of the Course Blocks and Navigation API requires a post-order depth-first traversal of the entire course hierarchy. It instantiates and binds user-context to each traversed xBlock and computes aggregate information along the way. This computation is very costly given our flexible xBlock and Modulestore infrastructures.
Here, we propose a design for an underlying subsystem that precomputes, collects information for, and caches (Collect Phase) the expensive course traversal, while supporting "fast" access to course blocks (Transform Phase) by means of an extensible framework of Block Tree Transformers.
Note: Although we use the term "Tree" throughout this document, the system does support DAGs for now.
Block Tree Transformers
We use the transformers design pattern to apply a change (i.e., a transformation) to a course's block tree. The transformation can be anything affecting the block tree, including updates to block fields and to the block structure (parents/children graph relationships).
For performance reasons, we split the work of a transformer into multiple phases: Collect and Transform (see below).
A pipeline (ordered list) of transformers are pre-configured in the edx-platform, so they can be called during the collect phase.
During the transform phase, a feature-specific list of transforms is applied.
To optimize further, it may be possible to cache the results after each transformation. But that is not considered at this point.
Collect Phase of a Course (instantiating xBlocks via Modulestore)
The "collect" phase is intended for caching all required data from the Modulestore through the instantiated xBlocks. Since this is a performance-hefty operation, we separate it into its own phase. Also, this approach can potentially be the bridge between a published course and an administered course if/when these (CMS and LMS) subsystems are separated.
The Collect phase may be triggered:
cold-cache access: when a Block tree's transform is requested and the course's collected cache is empty
on publish: when a Course is published
manually: through a management command
combination: by some combination of the above
Exactly which of the options above will be determined and tweaked after gathering additional performance data.
The following are options for storing the data that is accumulated during the collect phase:
persisted django cache
multi-tiered cache using some combination of the above
For now, we are using the django cache interface as the underlying cache storage for the collected data. Eventually, if performance metrics dictate, we may look at other options.
In the Collect Phase of a Course:
The entire course structure is traversed and data structures for caching the course's block tree are populated by instantiating each xBlock in the course's hierarchy through the modulestore.
All registered (sorted) Transformers are instantiated and their Collect method is called so they can add their own Transformer-required data to the cache.
Course Block Structures. A dictionary of course block structures, identified by the CourseKey, for traversing the course hierarchy. Note: We may also cache parent pointers, if the space-time tradeoff is acceptable.
Block Data. A dictionary of block data, identified by the block's UsageKey, for data (collected and updated) for each block. Note: Each transformer-specific data is kept in its own data storage (dict) keyed by a value derived from the transformer's class name. For transformers that need to collect field(s) declared on an xBlock, they can declare those in their class declaration so the framework makes them available in a shared pool denoted as _system_data below.
Transform Phase of a Block Tree (feature-specific application of transformers)
Trigger. The "transform" phase is triggered by a request from an edX service that wants to access a course's block tree.
The requesting service can specify the ordered list of transformers that should be applied to the course's block tree.
The requested transformers must be a subset of the registered transformers that were used during the collect phase. Note: This restriction may be relaxed for those transformers that only implement the "transform" method and don't require a "collect" phase.
The transformers operate on mutable block tree data structures that are extracted from the cache (created in the "collect" phase).
The requesting service may apply additional transforms, if desired, or forward the block tree to other services, which may apply their own.
def required_fields(self, ...) # returns a list of xBlock "system" Fields that are required by this transformer in the transform method
def collect(self, ...) # given a course's entire xBlock tree, collects and returns any transformation-specific data
def transform(self, ...) # given a partial block tree and previously collected data, updates the block tree according to this transformer's specification
Partial Block Tree Transformations
A transformer needs to support transformations of partial block trees during the transform phase to support features that make direct access to a block (tree). In order to support this, transformers that require full tree traversals should do so during the collect phase. Any inherited values impacting a block should be percolated down through the tree and stored as aggregate values on each descendant. For example, when the Visibility transformer visits a block in the collect phase, it stores the intersection of the visibility values of the block and all of its parents. As another example, the StartDate transformer stores the minimum of all the start dates from the block's ancestors.
Tree Traversal Tools
A common set of configurable tree traversal tools, such as a topological sort generator, are provided as tools for each transformer to use. For now, the implementation assumes a DAG block structure. If/when ever we remove support for DAGs, we can optimize this infrastructure.
If possible, each transformer should be implemented without dependence on other transformers. This allows the system and/or the requesting service to configure the transformer pipeline for optimal performance without worrying about interdependence and coupled interactions.
The framework honors a system-configured, ordered list of transformers as the designated pipeline during the collect phase.
The pipeline of transformers used during the transform phase is provided by the requesting service. This should be a subset of the transformers called during the collect phase, unless the transformer doesn't implement a collect method.
When a new transformer that requires collection is introduced into the system, all collected caches would need to be invalidated. In the future, we can possibly do better by keeping all the caches, but incrementally updating them with collected data from new transformers.