Course Block Transformers

A straightforward, unoptimized, implementation of the Course Blocks and Navigation API requires a post-order depth-first traversal of the entire course hierarchy.  It instantiates and binds user-context to each traversed xBlock and computes aggregate information along the way.  This computation is very costly given our flexible xBlock and Modulestore infrastructures. 

Here, we propose a design for an underlying subsystem that precomputes, collects information for, and caches (Collect Phase) the expensive course traversal, while supporting "fast" access to course blocks (Transform Phase) by means of an extensible framework of Block Tree Transformers.

Note: Although we use the term "Tree" throughout this document, the system does support DAGs for now.

Block Tree Transformers

We use the transformers design pattern to apply a change (i.e., a transformation) to a course's block tree.  The transformation can be anything affecting the block tree, including updates to block fields and to the block structure (parents/children graph relationships). 

  • For performance reasons, we split the work of a transformer into multiple phases: Collect and Transform (see below).
  • A pipeline (ordered list) of transformers are pre-configured in the edx-platform, so they can be called during the collect phase.
  • During the transform phase, a feature-specific list of transforms is applied.
  • To optimize further, it may be possible to cache the results after each transformation.  But that is not considered at this point.

Collect Phase of a Course (instantiating xBlocks via Modulestore)

The "collect" phase is intended for caching all required data from the Modulestore through the instantiated xBlocks.  Since this is a performance-hefty operation, we separate it into its own phase.  Also, this approach can potentially be the bridge between a published course and an administered course if/when these (CMS and LMS) subsystems are separated.

Trigger

The Collect phase may be triggered:

  1. cold-cache access: when a Block tree's transform is requested and the course's collected cache is empty
  2. on publish: when a Course is published
  3. manually: through a management command
  4. combination: by some combination of the above

Exactly which of the options above will be determined and tweaked after gathering additional performance data.

Storage

The following are options for storing the data that is accumulated during the collect phase:

  • memcache
  • persisted django cache
  • elastic search
  • multi-tiered cache using some combination of the above

For now, we are using the django cache interface as the underlying cache storage for the collected data.  Eventually, if performance metrics dictate, we may look at other options.

Collected Data

In the Collect Phase of a Course:

  • The entire course structure is traversed and data structures for caching the course's block tree are populated by instantiating each xBlock in the course's hierarchy through the modulestore.
  • All registered (sorted) Transformers are instantiated and their Collect method is called so they can add their own Transformer-required data to the cache.

Data Structures

  • Course Block Structures. A dictionary of course block structures, identified by the CourseKey, for traversing the course hierarchy.  Note: We may also cache parent pointers, if the space-time tradeoff is acceptable.

    course-v1:org+course+run: {
    	root: usage_key
        blocks: {
    		usage_key_1: {children: []}
    		usage_key_2: {children: []}
    	}
    }
  • Block Data. A dictionary of block data, identified by the block's UsageKey, for data (collected and updated) for each block.
    Note: Each transformer-specific data is kept in its own data storage (dict) keyed by a value derived from the transformer's class name.  For transformers that need to collect field(s) declared on an xBlock, they can declare those in their class declaration so the framework makes them available in a shared pool denoted as _system_data below.

    usage_key_1: {
    	_system_data: {}
    	transformer_a_data: {}
    	transformer_b_data: {}
    }

Transform Phase of a Block Tree (feature-specific application of transformers)

  • Trigger. The "transform" phase is triggered by a request from an edX service that wants to access a course's block tree.
  • Transformers.
    • The requesting service can specify the ordered list of transformers that should be applied to the course's block tree.
    • The requested transformers must be a subset of the registered transformers that were used during the collect phase. 
      Note: This restriction may be relaxed for those transformers that only implement the "transform" method and don't require a "collect" phase.
  • Transitive.
    • The transformers operate on mutable block tree data structures that are extracted from the cache (created in the "collect" phase).
    • The requesting service may apply additional transforms, if desired, or forward the block tree to other services, which may apply their own.

Transformer Implementation

class BlockTreeTransformer(object):
	@property
	def required_fields(self, ...)  # returns a list of xBlock "system" Fields that are required by this transformer in the transform method
	def collect(self, ...)  # given a course's entire xBlock tree, collects and returns any transformation-specific data 
	def transform(self, ...)  # given a partial block tree and previously collected data, updates the block tree according to this transformer's specification

Partial Block Tree Transformations

A transformer needs to support transformations of partial block trees during the transform phase to support features that make direct access to a block (tree).  In order to support this, transformers that require full tree traversals should do so during the collect phase.  Any inherited values impacting a block should be percolated down through the tree and stored as aggregate values on each descendant.  For example, when the Visibility transformer visits a block in the collect phase, it stores the intersection of the visibility values of the block and all of its parents.  As another example, the StartDate transformer stores the minimum of all the start dates from the block's ancestors.

Tree Traversal Tools

A common set of configurable tree traversal tools, such as a topological sort generator, are provided as tools for each transformer to use.  For now, the implementation assumes a DAG block structure.  If/when ever we remove support for DAGs, we can  optimize this infrastructure.

Independence

If possible, each transformer should be implemented without dependence on other transformers.  This allows the system and/or the requesting service to configure the transformer pipeline for optimal performance without worrying about interdependence and coupled interactions.

Pipeline

  • The framework honors a system-configured, ordered list of transformers as the designated pipeline during the collect phase.
  • The pipeline of transformers used during the transform phase is provided by the requesting service.  This should be a subset of the transformers called during the collect phase, unless the transformer doesn't implement a collect method.
  • When a new transformer that requires collection is introduced into the system, all collected caches would need to be invalidated.  In the future, we can possibly do better by keeping all the caches, but incrementally updating them with collected data from new transformers.

List of Transformers

See list of transformers.