Content Libraries Architecture

Feature Overview

A content library is a collection of course content (XBlocks) that can be used in one or more courses.

For more information, see the overview of content libraries in the Open edX documentation.

The original description/goals set out before any development began are at https://openedx.atlassian.net/wiki/display/SOL/Content+Libraries and the original implementation epic was SOL-119.

Architecture

Content libraries are not implemented in any one part of the code, but there are some key pieces that together comprise the feature:

  1. Split Modulestore, which was modified to support content libraries. Instead of just storing "course" structures, Split Modulestore now has the concept of a "course-like" structure, which is either a course or a library.
    • Courses and libraries are implemented similarly, each with a directed acyclic graph (DAG) of XBlocks and a history of all changes made. The block graphs for both courses and libraries are stored in the "structures" MongoDB collection.
    • Split's "active_versions" MongoDB collection stores a list of all course-like objects (courses and libraries). Each one has an ("org", "course", "run") triplet, which is the unique ID of that course-like object. In that triplet, "org" is the organization that published the course-like object, and "course" stores its name/ID value, which for a library is the library name; e.g. if the library's ID is "library-v1:UniversityX+LIB100", then "org" is "UniversityX" and "course" is "LIB100". Since libraries do not have the concept of "runs", the "run" value of a library is always set to "library". The other difference between courses and libraries in this collection is in the "versions" object: for courses it contains "draft-branch" and/or "published-branch" entries (which point to the current version of the course DAG in the "structures" collection), whereas for libraries it contains only a "library" entry (which points to the current version of the library DAG in the same "structures" collection).

      Screenshot of an "active_versions" collection, showing MongoDB documents for both a library and a course:
       
    • Any XBlock's Scope.content field values will be stored in split's "definitions" collection. A single definition document may be shared among any uses of that same XBlock in libraries and courses.
    • Documents in the split modulestore are considered immutable; any changes to content in a course or library result in a new version of the definition and/or structure object being created, and the active_versions record is then updated to point to the new versions. (This is just a note - this aspect of the modulestore was not changed when content libraries were added.)
       
  2. The Library Root XBlock, a simple structural XBlock analogous to the "course" XBlock. This XBlock is the root of each content library. It is found in the platform code at common/lib/xmodule/xmodule/library_root_xblock.py. This block is just a container and does not have any user-visible functionality.
     
  3. "LibraryLocator" and "LibraryUsageLocator" opaque keys, which uniquely identify libraries and the XBlocks contained within a content library - https://github.com/edx/opaque-keys/pull/46
     
  4. Studio support for editing content libraries - https://github.com/edx/edx-platform/pull/6046
     
  5. Studio Import/Export code allows exporting and importing content libraries as XML - https://github.com/edx/edx-platform/pull/6846
     
  6. The Randomized Content Block, an XModule that is used to show content from a library to students. In the code, the block is called the Library Content Module because the original intention was for it to support two "modes": randomized content and manually selected content. Since the latter mode was never implemented, the module's display name was changed to "Randomized Content Block."
     
  7. LibraryToolsService, an XBlock runtime service that provides some functionality that the Randomized Content Block (LibraryContentModule) needs in order to function, such as:
    • list_available_libraries(): used in the Randomized Content Block settings UI to allow the user to select which content library to draw content from.
    • get_library_version(): given a library ID, returns the current version number of that library.  Used to determine when new/updated content has been added to a library.
    • update_children(): described below in "How Components Are Copied Into the Course"

  8. ContentLibraryTransformer, a block transformer which allows Library Content modules to work with the Course Blocks API.
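
The relationship between "active_versions" and "structures" described in item 1 above can be illustrated with a small in-memory model. This is only a sketch: the dictionaries stand in for the MongoDB collections, the function names (new_structure, create_course_like, update_blocks) are hypothetical, and the document fields are simplified from the description above rather than copied from the real schema.

```python
import uuid

# Toy stand-ins for the "structures" and "active_versions" collections.
structures = {}        # version id -> structure document (never modified once stored)
active_versions = {}   # (org, course, run) -> index document


def new_structure(blocks, previous=None):
    """Store a new structure document and return its version id."""
    version = uuid.uuid4().hex
    structures[version] = {"blocks": dict(blocks), "previous_version": previous}
    return version


def create_course_like(org, course, run, branch, blocks):
    """Register a course or library and point `branch` at its first structure."""
    version = new_structure(blocks)
    active_versions[(org, course, run)] = {
        "org": org, "course": course, "run": run,
        "versions": {branch: version},
    }
    return version


def update_blocks(key, branch, changes):
    """Copy-on-write: write a new structure, then repoint the index entry."""
    index = active_versions[key]
    old_version = index["versions"][branch]
    blocks = dict(structures[old_version]["blocks"])
    blocks.update(changes)
    new_version = new_structure(blocks, previous=old_version)
    index["versions"][branch] = new_version
    return new_version


# A library always uses "library" as both its run and its only branch entry.
lib_key = ("UniversityX", "LIB100", "library")
v1 = create_course_like(*lib_key, branch="library", blocks={"problem1": {"weight": 1}})
v2 = update_blocks(lib_key, "library", {"problem1": {"weight": 2}})
```

After the update, the old structure is still retrievable under v1, which is what gives courses and libraries their change history.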

How Components Are Copied Into the Course

When users want to insert content from a library into a course, they first need to add a Randomized Content Block to the course, then edit that block's settings to select a library to source content from. Authors can also specify settings such as how many components from the library to show to each learner, and what types of components to select from the library (e.g. select only multiple choice problems).

Once the author has selected a library and other settings, the Randomized Content Block uses the LibraryToolsService's update_children() method, which will copy every matching XBlock from the library and add it as a child of the Randomized Content Block. This means that the course will contain a complete copy of each XBlock sourced from the library (although any Scope.content values are stored in documents in split's "definitions" collection, and those documents do not get copied - they are shared by the course and the library). The actual copying of the block data is done by the split modulestore's copy_from_template() method.
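
As a rough illustration of the copy step, the sketch below filters library blocks by type and produces child records whose settings are copied but whose definition reference is shared. It is not the code of update_children() or copy_from_template(); the function name copy_matching_children, the record layout, and the "def-..." identifiers are all made up for this example.

```python
# Hypothetical, simplified library blocks; "definition" is a reference to a
# document in the "definitions" collection, not the content itself.
library_blocks = {
    "p1": {"block_type": "problem", "definition": "def-aaa", "fields": {"display_name": "Q1"}},
    "h1": {"block_type": "html", "definition": "def-bbb", "fields": {"display_name": "Intro"}},
    "p2": {"block_type": "problem", "definition": "def-ccc", "fields": {"display_name": "Q2"}},
}


def copy_matching_children(blocks, allowed_types=None):
    """Return one child record per library block that passes the type filter."""
    children = []
    for source_id, block in blocks.items():
        if allowed_types and block["block_type"] not in allowed_types:
            continue
        children.append({
            "source_id": source_id,
            "block_type": block["block_type"],
            "definition": block["definition"],  # shared reference, not a copy
            "fields": dict(block["fields"]),    # settings are copied into the course
        })
    return children


problem_children = copy_matching_children(library_blocks, allowed_types={"problem"})
```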

When blocks are copied from the library to the course structure, each block is assigned a new block ID. As explained in the code, this is necessary because one library block could be copied as a child of multiple Randomized Content Blocks within the same course (and each usage needs a unique ID). The new block ID is generated from three pieces of information through a one-way transform: (1) the Library ID (LibraryLocator), (2) the ID of the block in the library (part of the LibraryUsageLocator), and (3) the ID of the LibraryContentModule where the problem is going to be used. This process ensures that the same block from a library can be used in multiple places in a course (if desired), and that the new block ID won't change if the library block is updated and then re-copied into the course.
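
A minimal sketch of such a one-way transform is shown below. It assumes a SHA-1 hash over the three inputs joined with colons, truncated to 20 hex characters; the exact string format and truncation used by the modulestore may differ, and derive_block_id is a hypothetical name.

```python
import hashlib


def derive_block_id(library_id, library_block_id, dest_parent_id):
    """Derive a stable block ID from the library, the source block, and the destination parent."""
    unique_data = f"{library_id}:{library_block_id}:{dest_parent_id}"
    return hashlib.sha1(unique_data.encode("utf-8")).hexdigest()[:20]


# The same library block used under two different Randomized Content Blocks.
id_in_rcb1 = derive_block_id("library-v1:UniversityX+LIB100", "problem1", "rcb1")
id_in_rcb2 = derive_block_id("library-v1:UniversityX+LIB100", "problem1", "rcb2")
# Re-copying after a library update uses the same inputs, so the ID is unchanged.
id_after_recopy = derive_block_id("library-v1:UniversityX+LIB100", "problem1", "rcb1")
```

Because the library version is not one of the inputs, updating the library does not change the derived ID, while a different parent block always yields a different ID.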

The split modulestore provides a method called get_block_original_usage() which can be used to get the original library block ID (LibraryUsageLocator) given any course-usage-specific block ID. Internally, this original ID value is stored in the course's "structures" collection in the "original_usage" property of the "edit_info" object for that block. This could be used, for example, to aggregate all learners' scores for a particular library problem across all courses that it is used in.
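
The lookup and the aggregation use case can be sketched as follows. The structure documents are heavily simplified, the course IDs, block IDs and scores are invented sample data, and get_original_usage() here is a toy stand-in for the idea behind get_block_original_usage(), not its real signature.

```python
from collections import defaultdict

# Illustrative library block key; treat the exact string as an assumption.
LIB_PROBLEM = "lib-block-v1:UniversityX+LIB100+type@problem+block@problem1"

# Simplified structure documents for two courses that use the same library problem.
structures_by_course = {
    "course-A": {"blocks": {
        "0a1b2c": {"block_type": "problem", "edit_info": {"original_usage": LIB_PROBLEM}},
        "unit1": {"block_type": "vertical", "edit_info": {}},
    }},
    "course-B": {"blocks": {
        "9f8e7d": {"block_type": "problem", "edit_info": {"original_usage": LIB_PROBLEM}},
    }},
}


def get_original_usage(structure, block_id):
    """Return the library block a course block was copied from, or None."""
    return structure["blocks"][block_id].get("edit_info", {}).get("original_usage")


def scores_by_library_block(structures, scores):
    """Group (course_id, block_id, score) records by the original library block."""
    grouped = defaultdict(list)
    for course_id, block_id, score in scores:
        original = get_original_usage(structures[course_id], block_id)
        if original is not None:
            grouped[original].append(score)
    return dict(grouped)


sample_scores = [("course-A", "0a1b2c", 0.5), ("course-B", "9f8e7d", 1.0), ("course-A", "unit1", 0.0)]
aggregated = scores_by_library_block(structures_by_course, sample_scores)
```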

Screenshot of a course's structure document, showing the edit_info of an HTML block sourced from a library:

Split modulestore changes the "version" of a course or library whenever any change is made. When content is copied from the library, the Randomized Content Block stores the current version of the library in its source_library_version field. Whenever the Randomized Content Block is shown in Studio, it uses the LibraryToolsService's get_library_version() method to check whether the source library has changed (in which case the result of get_library_version() will not match self.source_library_version). If the library has changed, the author will be prompted to update the Randomized Content Block, which will overwrite each child of the Randomized Content Block with the latest version from the library.

Updating a Randomized Content Block will also delete any child components which were deleted in the new version of the library.
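
The version check and the effect of an update can be sketched like this. The block is modelled as a plain dictionary, children are keyed by their library block ID for simplicity (in the course they have derived IDs), and library_has_changed() and update_from_library() are hypothetical helper names.

```python
def library_has_changed(block, current_library_version):
    """True when the library has moved on since the children were last copied."""
    return block["source_library_version"] != current_library_version


def update_from_library(block, library_blocks, current_library_version):
    """Overwrite the children with the latest library content and record the version."""
    # Children that no longer exist in the library disappear because the whole
    # child set is rebuilt from the library's current blocks.
    block["children"] = {block_id: dict(data) for block_id, data in library_blocks.items()}
    block["source_library_version"] = current_library_version


rcb = {
    "source_library_version": "v1",
    "children": {"p1": {"title": "Old"}, "p2": {"title": "Deleted in v2"}},
}
latest_library = {"p1": {"title": "New"}, "p3": {"title": "Added in v2"}}

needs_update = library_has_changed(rcb, "v2")
update_from_library(rcb, latest_library, "v2")
```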

Settings Overrides

When a Randomized Content Block is present in a course, authors can use the "View" button to view the child components sourced from the library.

From the resulting view of the blocks that have been sourced from the library, authors can click "Edit" on any component to add course-specific overrides to the block. For example, authors can change the "weight" of a capa problem, and the changes will apply only to that course.

This feature is called "settings overrides" and it is meant to be used only for Scope.settings fields. Settings that have a course-specific override set will appear in blue; clicking the "reset" button to the right of these fields will remove the override and restore the field value from the library. Settings that are using the default value from the library version will appear faded out.

In the split modulestore's "structures" collection, Scope.settings field values for a block sourced from a library are stored in the "defaults" object and any Scope.settings override values specified by a course author are stored in the "fields" object.
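
One way to picture how the two objects combine is a lookup in which "fields" takes precedence over "defaults". The sketch below assumes that precedence; the helper names (effective_settings, is_overridden, reset_setting) and the sample values are hypothetical.

```python
# Simplified block entry: "defaults" come from the library, "fields" hold overrides.
block = {
    "defaults": {"weight": 1.0, "display_name": "Question 1"},
    "fields": {"weight": 2.5},
}


def effective_settings(blk):
    """Merge library defaults with course-specific overrides (overrides win)."""
    merged = dict(blk["defaults"])
    merged.update(blk["fields"])
    return merged


def is_overridden(blk, name):
    """True if the course author has set a course-specific value for `name`."""
    return name in blk["fields"]


def reset_setting(blk, name):
    """Remove the override so the library value applies again."""
    blk["fields"].pop(name, None)


before_reset = effective_settings(block)
weight_was_overridden = is_overridden(block, "weight")
reset_setting(block, "weight")
after_reset = effective_settings(block)
```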

Screenshot of the "structures" collection, showing the "defaults" values and "fields" values of an HTML component that is sourced from a library:

If the library is updated, then the Randomized Content Block in Studio will prompt the author to "Update now" with new content from the library (see "How Components Are Copied Into the Course"). Updating will preserve any Scope.settings overrides that exist in the course.

The UI currently allows authors to modify Scope.content field values of components sourced from a library as well. Such Scope.content field changes only affect the block as seen in that particular usage (that place in that course), and do not affect the original library component nor other courses/LibraryContentModules that use the same component. Additionally, changes to any Scope.content fields will be lost when the Randomized Content Block is updated (when it replaces its children with the latest versions sourced from the library). See "Future direction" below for how this could be improved.
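
The different fate of the two kinds of edits on update can be sketched as below. This is a deliberately simplified model: Scope.content values are kept inline under a "content" key rather than in a separate definition document, and update_child_from_library() is a hypothetical name.

```python
def update_child_from_library(child, library_block):
    """Rebuild a child from the library, keeping only Scope.settings overrides."""
    return {
        "defaults": dict(library_block["settings"]),  # refreshed from the library
        "fields": dict(child["fields"]),              # course-specific overrides survive
        "content": dict(library_block["content"]),    # local content edits are replaced
    }


library_block_v2 = {
    "settings": {"weight": 1.0, "display_name": "Question 1 (revised)"},
    "content": {"data": "<problem>v2</problem>"},
}
course_child = {
    "defaults": {"weight": 1.0, "display_name": "Question 1"},
    "fields": {"weight": 3.0},                                  # settings override
    "content": {"data": "<problem>edited in course</problem>"},  # local content edit
}

updated_child = update_child_from_library(course_child, library_block_v2)
```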

How Components are Randomized

When a learner views a Randomized Content Block in a course, the LMS calls the Randomized Content Block's get_child_descriptors() method, which is responsible for determining which subset of components to show to that particular learner. Recall that all the possible blocks from the library have been copied into the course and exist as children of the Randomized Content Block; this means that get_child_descriptors() is responsible for "filtering" the children, so that only N children will be shown to the learner, where N is usually 1 but can be customized by the course author. The IDs of the blocks that were randomly selected for each learner are saved into the Randomized Content Block's "selected" field. For details on how this selection is made and what happens if the library is updated, or the Randomized Content Block settings are changed in a way that affects the selection, refer to the source code of make_selection(), which is well-commented.
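
The general shape of such a selection step might look like the sketch below. It is not the code of make_selection(): the name select_children, the order of the steps, and the handling of edge cases are assumptions made for illustration.

```python
import random


def select_children(selected, children, max_count, rng=random):
    """Return the updated set of child IDs selected for one learner."""
    valid = set(children)
    # Drop previously selected blocks that no longer exist (e.g. deleted from the library).
    result = set(selected) & valid
    # If the author lowered the count, randomly drop the surplus.
    while len(result) > max_count:
        result.remove(rng.choice(sorted(result)))
    # Fill any remaining slots with random blocks not already selected.
    pool = sorted(valid - result)
    needed = min(max_count - len(result), len(pool))
    result |= set(rng.sample(pool, needed))
    return result


rng = random.Random(0)  # seeded only to make the example repeatable
first = select_children(set(), ["a", "b", "c"], 1, rng)
# A later view keeps the learner's existing selection.
second = select_children(first, ["a", "b", "c"], 1, rng)
# If the selected block disappears, a replacement is drawn from what remains.
remaining = [child for child in ["a", "b", "c"] if child not in first]
third = select_children(first, remaining, 1, rng)
```

The returned set corresponds to what would be saved in the "selected" field; an existing selection is kept whenever it is still valid, so learners see the same components on repeat visits.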

Tracking Log Events

Tracking log events are emitted whenever a particular student is randomly assigned content from a content library, as well as any time that selection had to be changed (e.g. if a block was deleted). These events are documented at http://edx.readthedocs.io/projects/devdata/en/latest/internal_data_formats/tracking_logs.html#library-interaction-events

Future direction and technical debt

  1. When the Course Blocks API is used (e.g. for mobile, the Progress page, and the new grading code) to retrieve the structure of a course that includes Randomized Content Blocks, if the randomized selection of a problem has not yet occurred (because the learner has not viewed that unit in the LMS), then a new selection is made for the current learner but is not saved, and will change every time. This is because the Course Blocks API does not allow XBlocks to save changes they've made to their fields, and the Randomized Content Block saves the selection into an XBlock field. (See this TODO in the code.)
  2. Libraries currently cannot store assets (e.g. images), which is a big limitation. (See also GridFS Replacement.)
  3. When editing a component that was sourced from a library, authors cannot tell which fields are Scope.content fields (changes will be lost when updating the parent Randomized Content Block with the latest version of the library) and which fields are Scope.settings fields (changes are considered course-specific overrides and are preserved when updating the Randomized Content Block). The Scope.content fields should be disabled and authors should be prevented from editing them within the course.
  4. We need a new XBlock like the randomized content module, but allowing manual selection of one or more components from the library, instead of random selection. This was part of the original plan but was cut from the MVP.
  5. We need a way to tag content in the library (e.g. align with a taxonomy) and then have the randomized content block only draw problems that match certain criteria (difficulty, topic, etc.). This can also be the basis of adaptive learning features.
  6. Currently, content libraries support a very limited subset of XBlocks. More types of XBlocks should be tested and enabled for use in content libraries.
  7. The Library Content XModule (Randomized Content Block) should be converted to an XBlock. This is currently not possible because it depends on the following methods which are part of the XModule API but not the XBlock API:
    • get_child_descriptors()
    • has_dynamic_children()
    • get_content_titles()
  8. Libraries do not currently have a draft/published workflow, though the basic support for that exists in the split modulestore, analogous to courses.
  9. Libraries can support nested structures and can hold chapters, sections, units, etc. However, the Studio UI does not provide a way to do this. It could be interesting to explore use cases where authors have access to a library of course units or chapters and can build new courses by combining existing units or entire chapters that are sourced from a library.
  10. Search and filtering of content library content should be enabled in the Studio UI and the Randomized Content Block UI.
  11. If a course contains two Randomized Content Blocks that each select one problem from the same library, there is a chance the student will have to do the same problem twice. It would be nice to prevent such duplicate random selections, but doing so is likely not worth the trouble.
  12. Brian Wilson suggested: Eventually, "research and course design teams may wish to be able to have access to scores, aggregated on a single [library component] across courses."
    • This may be easier to implement once the Robust Grades work is completed.
    • The tracking log events emitted by any blocks that were sourced from a content library already emit a "context" section that includes the original_usage_key and original_usage_version fields, which are required to identify library components across courses (see documentation).
  13. Authors should be able to move/copy a component from a course page into a content library.
  14. Support for external content libraries could allow a central repository of content, used by multiple Open edX instances.
  15. There seems to be a bug in XModuleMixin's location.setter: it sets def_id and usage_id to the same UsageKey value, but def_id should not be a UsageKey and should not be the same as usage_id in general. The code should use .runtime.id_reader to get the definition key. It's unclear why this bug isn't causing more problems, or if fixing it will cause any issues - this needs investigation.
  16. TNL-5947