...

We use the transformers design pattern to apply a change (i.e., a transformation) to a course's block tree.  The transformation can be anything affecting the block tree, including updates to block fields and to the block structure (parents/children graph relationships). 

  • For performance reasons, we split the work of a transformer into multiple phases: Collect and Transform (see below; a minimal transformer sketch follows this list).

  • A pipeline (an ordered list) of transformers is pre-configured in edx-platform, so the transformers can be called during the collect phase.

  • During the transform phase, a feature-specific list of transforms is applied.

  • To optimize further, it may be possible to cache the results after each transformation, but that is not being considered at this point.
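
A minimal sketch of the pattern, assuming illustrative names (BlockStructureTransformer, get_xblock, set_block_data, get_block_data, remove_block_if) rather than the exact edx-platform API:

Code Block
class BlockStructureTransformer(object):
    """Illustrative two-phase transformer interface (names are assumptions)."""

    @classmethod
    def collect(cls, block_structure):
        # Course-wide phase: store whatever data the transform phase will
        # need while the full course tree (and its XBlocks) is available.
        pass

    def transform(self, usage_info, block_structure):
        # Per-request phase: mutate the collected block structure, e.g. by
        # removing blocks or overriding field values.
        pass


class HiddenContentTransformer(BlockStructureTransformer):
    """Hypothetical transformer: hide staff-only blocks from learners."""

    @classmethod
    def collect(cls, block_structure):
        for block_key in block_structure:
            block = block_structure.get_xblock(block_key)
            block_structure.set_block_data(
                block_key, cls, 'visible_to_staff_only',
                getattr(block, 'visible_to_staff_only', False),
            )

    def transform(self, usage_info, block_structure):
        if usage_info.has_staff_access:
            return
        block_structure.remove_block_if(
            lambda block_key: block_structure.get_block_data(
                block_key, HiddenContentTransformer, 'visible_to_staff_only', False,
            )
        )

The split lets the relatively expensive collect results be cached once per course, while transform runs on every request.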

Collect Phase of a Course (instantiating xBlocks via Modulestore)

...

The Collect phase may be triggered:

  1. cold-cache access: when a Block tree's transform is requested and the course's collected cache is empty

  2. on publish: when a Course is published

  3. manually: through a management command

  4. combination: by some combination of the above

Exactly which of the options above will be used will be determined and tweaked after gathering additional performance data.
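
For illustration, the on-publish and management-command triggers could be wired up roughly as follows, assuming the modulestore's course_published signal and a hypothetical update_collected helper standing in for the actual collect entry point:

Code Block
from django.core.management.base import BaseCommand
from django.dispatch import receiver

from xmodule.modulestore.django import SignalHandler


def update_collected(course_key):
    """Hypothetical entry point that (re)runs the collect phase for one course."""
    raise NotImplementedError


# Option 2 (on publish): refresh the collected cache whenever a course is published.
@receiver(SignalHandler.course_published)
def _update_collected_on_publish(sender, course_key, **kwargs):
    update_collected(course_key)


# Option 3 (manual): a management command that refreshes one course on demand.
class Command(BaseCommand):
    help = "Re-run the collect phase for the given course."

    def add_arguments(self, parser):
        parser.add_argument('course_key')

    def handle(self, *args, **options):
        update_collected(options['course_key'])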

...

The following are options for storing the data that is accumulated during the collect phase:

  • memcache

  • persisted Django cache

  • Elasticsearch

  • multi-tiered cache using some combination of the above

For now, we are using the Django cache interface as the underlying cache storage for the collected data.  Eventually, if performance metrics dictate, we may look at other options.
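
A rough sketch of what that looks like through the Django cache interface; the cache alias, key scheme, and version argument are assumptions for illustration:

Code Block
from django.core.cache import caches

# Any configured backend (memcached, a persisted cache, etc.) can sit behind
# this interface; 'block_structure' is an assumed cache alias in settings.
cache = caches['block_structure']


def set_collected(course_key, collected_data, version=1):
    # The collected data must be serializable by the cache backend (e.g. picklable).
    cache.set('collected.{}.v{}'.format(course_key, version), collected_data)


def get_collected(course_key, version=1):
    # Returns None on a cold cache, in which case the collect phase is triggered.
    return cache.get('collected.{}.v{}'.format(course_key, version))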

...

In the Collect Phase of a Course:

  • The entire course structure is traversed and data structures for caching the course's block tree are populated by instantiating each xBlock in the course's hierarchy through the modulestore.

  • All registered transformers are instantiated in their configured order, and each one's collect method is called so it can add its own transformer-required data to the cache (a sketch follows this list).
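
Roughly, and using illustrative (not actual) modulestore calls and data structures, the collect phase for one course might look like this:

Code Block
def collect_course(course_key, store, registered_transformers):
    """Illustrative collect phase: build and populate the cached tree for one course."""
    structure = {'root': None, 'blocks': {}}

    # Traverse the full course hierarchy, instantiating each XBlock via the
    # modulestore and recording its children.
    course = store.get_course(course_key)
    structure['root'] = course.location
    stack = [course]
    while stack:
        block = stack.pop()
        children = list(getattr(block, 'children', []))
        structure['blocks'][block.location] = {'children': children}
        stack.extend(store.get_item(child_key) for child_key in children)

    # Call collect on each registered transformer, in pipeline order, so it can
    # add its own data to the cached structure.
    for transformer in registered_transformers:
        transformer.collect(structure)

    return structure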

Data Structures

  • Course Block Structures. A dictionary of course block structures, identified by the CourseKey, for traversing the course hierarchy.  Note: We may also cache parent pointers, if the space-time tradeoff is acceptable.

    Code Block
    course-v1:org+course+run: {
        root: usage_key
        blocks: {
            usage_key_1: {children: []}
            usage_key_2: {children: []}
        }
    }

  • Block Data. A dictionary, keyed by each block's UsageKey, holding the data collected (and later updated) for that block.
    Note: Each transformer's data is kept in its own data storage (dict) keyed by a value derived from the transformer's class name.  Transformers that need to collect field(s) declared on an xBlock can declare those fields in their class declaration, so that the framework makes them available in a shared pool, denoted as _system_data below.  (A usage sketch follows the code block below.)

    Code Block
    usage_key_1: {
        _system_data: {}
        transformer_a_data: {}
        transformer_b_data: {}
    }
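
For illustration, with plain dicts shaped like the structures above, a transformer's per-block data could be written and read like this (the helper names are made up for this sketch):

Code Block
def set_transformer_data(block_data, usage_key, transformer_key, field, value):
    # Each transformer writes into its own namespaced dict for the block.
    block_data.setdefault(usage_key, {}).setdefault(transformer_key, {})[field] = value


def get_transformer_data(block_data, usage_key, transformer_key, field, default=None):
    return block_data.get(usage_key, {}).get(transformer_key, {}).get(field, default)


block_data = {}

# Transformer A records a per-block flag during its collect phase...
set_transformer_data(block_data, 'usage_key_1', 'transformer_a_data', 'graded', True)

# ...and reads it back during transform, without touching other transformers' data.
assert get_transformer_data(block_data, 'usage_key_1', 'transformer_a_data', 'graded') is True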

Transform Phase of a Block Tree (feature-specific application of transformers)

  • Trigger. The "transform" phase is triggered by a request from an edX service that wants to access a course's block tree.

  • Transformers.

    • The requesting service can specify the ordered list of transformers that should be applied to the course's block tree.

    • The requested transformers must be a subset of the registered transformers that were used during the collect phase. 
      Note: This restriction may be relaxed for those transformers that only implement the "transform" method and don't require a "collect" phase.

  • Transitive.

    • The transformers operate on mutable block tree data structures that are extracted from the cache (created in the "collect" phase).

    • The requesting service may apply additional transforms, if desired, or forward the block tree to other services, which may apply their own (see the sketch after this list).
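
From the requesting service's point of view, the transform phase might be driven along these lines; get_transformed_blocks and the transformer classes named here are placeholders, not the actual service API:

Code Block
def get_transformed_blocks(usage_info, course_key, transformers, collected_cache):
    """Apply the requested (ordered) transformers to a mutable copy of the cached tree."""
    # Extract a mutable copy of the collected structure, so transforms applied
    # for one request never leak into another.
    block_structure = collected_cache.get_mutable_copy(course_key)

    for transformer in transformers:
        transformer.transform(usage_info, block_structure)
    return block_structure


# e.g. the courseware service requests only the transforms it cares about; a
# downstream service receiving the result may apply further transforms of its own.
# blocks = get_transformed_blocks(
#     usage_info, course_key,
#     transformers=[HiddenContentTransformer(), StartDateTransformer()],
#     collected_cache=collected_cache,
# )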

Transformer Implementation

...

If possible, each transformer should be implemented without dependence on other transformers.  This allows the system and/or the requesting service to configure the transformer pipeline for optimal performance without worrying about interdependence and coupled interactions.

Pipeline

  • The framework honors a system-configured, ordered list of transformers as the designated pipeline during the collect phase (a registration sketch follows this list).

  • The pipeline of transformers used during the transform phase is provided by the requesting service.  It should be a subset of the transformers called during the collect phase, except for any transformer that doesn't implement a collect method.

  • When a new transformer that requires collection is introduced into the system, all collected caches would need to be invalidated.  In the future, we can possibly do better by keeping all the caches, but incrementally updating them with collected data from new transformers.
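
One plausible way to register the system-configured pipeline is through setuptools entry points, with the framework loading them in order; the entry point group name below is an assumption for illustration:

Code Block
from pkg_resources import iter_entry_points

# Assumed entry point group under which packages register their transformer
# classes (e.g. in each providing package's setup.py).
TRANSFORMER_ENTRY_POINT = 'openedx.block_structure_transformer'


def registered_transformers():
    """Yield all registered transformer classes, in registration order."""
    for entry_point in iter_entry_points(TRANSFORMER_ENTRY_POINT):
        yield entry_point.load()

Versioning the collected cache keys (as in the Django cache sketch earlier) is one straightforward way to invalidate all collected caches when a transformer that requires collection is added.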

List of Transformers

See list of transformers.