Content Libraries v1 Behavior / Storage Implementation
Context: This was written as a response to https://openedx.atlassian.net/browse/TNL-7454 , in which v1-library-sourced block settings were accidentally lost due to a change in their import handling.
The implementation of the original version of content libraries makes the most sense if you frame the development effort as, “What is the fastest way we can develop this, given the code that has already been written for courses?” After all, courses already stored, imported, exported, and rendered XBlocks. Content libraries tried to add just enough on top of those systems to get the intended effect. This means that, under the covers, content libraries look very much like really small courses.
Library and Course Storage Format
Content Libraries and Courses both store all settings-scoped XBlock fields in structure documents in MongoDB, with each document holding all settings-scoped fields for the entire library or course run. These are stored in the modulestore.structures collection. Structure documents are immutable: each one represents a different version. Courses have two branches of structure documents: one for drafts and one for the published versions (used by Studio and LMS, respectively). Libraries only have one branch, called “library”. Old historical structure documents that are no longer used by Studio or the LMS get periodically removed to save storage space.
In addition to storing the value of all settings-scoped fields, the structure documents also store the IDs for definition documents for the content of each individual problem. The XBlock fields in question are:
ProblemBlock-specific ones: https://github.com/edx/edx-platform/blob/master/common/lib/xmodule/xmodule/capa_base.py#L128-L269
Ones mixed in for all block types to inherit (start dates, due dates, etc.): https://github.com/edx/edx-platform/blob/master/common/lib/xmodule/xmodule/modulestore/inheritance.py#L34-L233
Whenever a field is defined with scope=Scope.settings, it ends up in the giant structure document. When a field is defined with scope=Scope.content (e.g. problem text, list of inputs, correct answers, all of which are stored under the data field in ProblemBlock), it gets stored on a per-block/module basis in definition documents (the modulestore.definitions collection in MongoDB). The original split between content and settings fields was intended to facilitate exactly this kind of re-use, where a piece of content is stored separately from the settings associated with its use in a particular course. Unfortunately, it’s not this clean in practice, because some fields that were added later and probably should have been content-scoped ended up being settings-scoped instead (e.g. markdown, source_code, use_latex_compiler).
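To make the split concrete, here is a minimal Python sketch of how field values might be partitioned by scope. It is a toy model and not edx-platform code: the Scope enum only stands in for the XBlock one, FIELD_SCOPES and split_by_scope are hypothetical names made up for this illustration, and the data value is a placeholder.

```python
from enum import Enum


class Scope(Enum):
    """Stand-in for the two XBlock field scopes discussed above."""
    settings = "settings"
    content = "content"


# Hypothetical scope declarations for a few ProblemBlock fields.
FIELD_SCOPES = {
    "display_name": Scope.settings,
    "weight": Scope.settings,
    "markdown": Scope.settings,  # settings-scoped, though arguably content
    "data": Scope.content,
}


def split_by_scope(field_values):
    """Split field values into (structure_fields, definition_fields)."""
    structure_fields = {}
    definition_fields = {}
    for name, value in field_values.items():
        if FIELD_SCOPES[name] is Scope.content:
            # Content-scoped: would live in a per-block definition document.
            definition_fields[name] = value
        else:
            # Settings-scoped: would live in the big structure document.
            structure_fields[name] = value
    return structure_fields, definition_fields


structure_fields, definition_fields = split_by_scope({
    "display_name": "Library Title for Problem",
    "weight": 10,
    "markdown": "( ) an incorrect answer\n(x) the correct answer\n",
    "data": "<problem>placeholder</problem>",  # placeholder, not real problem XML
})
```

Note that markdown lands on the structure side here, which is exactly the awkward case described above.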
What happens when you add a LibraryContentBlock to a Course?
Say we have a Library with exactly one problem in it. When we add a LibraryContentBlock pointing to that library into a Course for the first time, it creates a new structure document in the draft branch of that Course. This new structure document has two new entries in its list of blocks: one for the library_content module itself, and one for the problem (ProblemBlock) from the Library.
The library_content module in the Course’s structure document looks like this:
{
    "block_type" : "library_content",
    "edit_info" : {
        "original_usage_version" : null,
        "previous_version" : ObjectId("5f651b216cb7f7a0c2122b4f"),
        "update_version" : ObjectId("5f651b2e6cb7f7a0c2122b50"),
        "edited_by" : 2,
        "source_version" : ObjectId("5f651b216cb7f7a0c2122b4f"),
        "original_usage" : null,
        "edited_on" : ISODate("2020-09-18T20:40:14.230Z")
    },
    "definition" : ObjectId("5f651b106cb7f7a0c2122b4d"),
    "asides" : {
    },
    "fields" : {
        "source_library_id" : "library-v1:DaveX+2020-09-18",
        "children" : [
            [
                "problem",
                "e9e719868536e9418816"
            ]
        ],
        "source_library_version" : "5f651a466cb7f7a0c2122b34"
    },
    "block_id" : "82dea980b8c443abbd99f7b588f769c5",
    "defaults" : {
    }
}
The block_id maps to the last part of the UsageKey for the LibraryContentBlock. So this entry would have a UsageKey of block-v1:DaveX+LibraryTesting+2020-09-18+type@library_content+block@82dea980b8c443abbd99f7b588f769c5 (deriving the first part from the course key).
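As a rough illustration of that mapping, the sketch below assembles the usage key string from its parts using plain string handling. It is not the opaque-keys API: block_usage_key is a hypothetical helper, and the course key string course-v1:DaveX+LibraryTesting+2020-09-18 is my assumption, inferred from the usage key above.

```python
def block_usage_key(course_key: str, block_type: str, block_id: str) -> str:
    """Build a block-v1 usage key string from a course-v1 course key string."""
    prefix = "course-v1:"
    if not course_key.startswith(prefix):
        raise ValueError(f"expected a course-v1 key, got {course_key!r}")
    # The org+course+run part is shared between the course key and the usage key.
    org_course_run = course_key[len(prefix):]
    return f"block-v1:{org_course_run}+type@{block_type}+block@{block_id}"


usage_key = block_usage_key(
    "course-v1:DaveX+LibraryTesting+2020-09-18",  # assumed course key
    "library_content",                            # block_type from the entry
    "82dea980b8c443abbd99f7b588f769c5",           # block_id from the entry
)
```

The only parts that come from the structure document entry itself are block_type and block_id; everything else is carried by the course key.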
The definition doesn’t have anything actually interesting in it, since LibraryContentBlock stores all its fields in the settings scope (again: only content-scoped fields end up in the definition documents).
The interesting bits here are the fields, which include the settings-scoped fields source_library_id and source_library_version. These are also attributes we see in the exported OLX. The source_library_version is the ObjectId (MongoDB identifier) for the structure document that represents the exact version of the Library that this LibraryContentBlock is referencing.
The children field has a list of all the problems that we’re using from the library. This maps to the OLX we see in the Course’s export:
<library_content source_library_id="library-v1:DaveX+2020-09-18" source_library_version="5f651a466cb7f7a0c2122b34">
    <problem url_name="e9e719868536e9418816"/>
</library_content>
The LibraryContentBlock is using the same sort of container-like mechanisms that Units (VerticalBlock) use to render themselves. When the LibraryContentBlock is asked to render its student_view to display to students in the LMS, it’s going to check its settings and the user’s state and decide to render some number of its child problem blocks based on that.
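A very rough sketch of that decision might look like the following. Everything here is an assumption made for illustration: select_children is a hypothetical helper, max_count stands for a "how many to show" setting, previously_selected stands for whatever the user's state already records, and the policy of keeping earlier picks stable is my guess rather than something confirmed above.

```python
import random


def select_children(children, max_count, previously_selected, rng):
    """Pick which child block IDs to render for one user.

    Assumes max_count is a non-negative integer.
    """
    # Keep earlier picks that still exist, so the user sees a stable set.
    selected = [c for c in previously_selected if c in children][:max_count]
    # Fill any remaining slots at random from the children not yet chosen.
    remaining = [c for c in children if c not in selected]
    rng.shuffle(remaining)
    selected.extend(remaining[: max_count - len(selected)])
    return selected


rng = random.Random(0)  # seeded only to keep the example repeatable
children = ["problem_a", "problem_b", "problem_c"]  # hypothetical block IDs
first_visit = select_children(children, 2, [], rng)
second_visit = select_children(children, 2, first_visit, rng)
```

In this sketch the second call returns the same two children as the first, because the stored selection already fills both slots.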
The XBlock runtime doesn’t know how to arrange it so that a block in one structure/course has child elements from other courses. Doing so would be both a performance headache, because of how large structure documents are, and a security headache, because so many of our permissions are course/library based. So the problem block ID that this LibraryContentBlock is referencing is not that of the problem as it is stored in the source Library (where it has a block_id of 93120d90875545ff87da76fc2484f209). The LibraryContentBlock is making a child reference to the Course’s copy of the Library’s problem.
The problem data in the Library’s structure document looks like this:
{
    "block_type" : "problem",
    "edit_info" : {
        "original_usage_version" : null,
        "previous_version" : ObjectId("5f6516636cb7f7a0c2122b30"),
        "update_version" : ObjectId("5f6516926cb7f7a0c2122b31"),
        "edited_by" : 2,
        "source_version" : null,
        "original_usage" : null,
        "edited_on" : ISODate("2020-09-18T20:20:34.123Z")
    },
    "definition" : ObjectId("5f6516636cb7f7a0c2122b2f"),
    "asides" : {
    },
    "fields" : {
        "attempts_before_showanswer_button" : 2,
        "weight" : 10,
        "showanswer" : "always",
        "display_name" : "Library Title for Problem",
        "markdown" : "You can use this template as a guide to the simple editor markdown and OLX markup to use for multiple choice problems. Edit this component to replace this template with your own assessment.\n\n>>Add the question text, or prompt, here. This text is required.||You can add an optional tip or note related to the prompt like this. <<\n\n( ) an incorrect answer\n(x) the correct answer\n( ) an incorrect answer\n",
        "max_attempts" : 5,
        "rerandomize" : "always"
    },
    "block_id" : "93120d90875545ff87da76fc2484f209",
    "defaults" : {
    }
}
The same problem data in the Course’s structure document looks like this:
I won’t go into all the details, but some things worth noting:
Shared Definitions
The text of the problem and the input types, response types, etc. are not in either of these structure documents. They’re stored in the definition document, and the structure block entries for this problem in both the Library and the Course point to the same definition: ObjectId("5f6516636cb7f7a0c2122b2f").
Block ID (and UsageKey) Generation
The two block_id entries for this Problem are:
e9e719868536e9418816 (Course)
93120d90875545ff87da76fc2484f209 (Library)
If they’re both machine generated, why is the Library’s Problem block_id longer? It looks like the Course version of the Problem gets its block_id set by hashing the Library’s source block_id and the Course’s destination LibraryContentBlock.
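The sketch below shows one way such a derived ID could be computed. A 20-character ID is consistent with a truncated hex digest, while the Library’s 32-character ID is the length of a UUID in hex form. The hash function (SHA-1), the input string format, and the helper name derived_block_id are all assumptions on my part, so this is not expected to reproduce the exact Course ID shown above.

```python
import hashlib


def derived_block_id(source_block_id: str, dest_parent_block_id: str,
                     length: int = 20) -> str:
    """Derive a stable block ID for the copy of a library block in a course."""
    # Assumed input format: source ID and destination parent ID joined by ":".
    unique_data = f"{source_block_id}:{dest_parent_block_id}"
    digest = hashlib.sha1(unique_data.encode("utf-8")).hexdigest()
    # Truncate the 40-character hex digest to the desired length.
    return digest[:length]


course_copy_id = derived_block_id(
    "93120d90875545ff87da76fc2484f209",  # Library's problem block_id
    "82dea980b8c443abbd99f7b588f769c5",  # Course's LibraryContentBlock block_id
)
```

The useful property of deriving the ID this way, rather than generating a random one, is that copying the same Library problem into the same LibraryContentBlock always yields the same Course block_id, while copying it into a different LibraryContentBlock yields a different one.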
Fields vs. Overrides
The Library stores the settings-scoped fields like display_name and weight in its fields. The Course’s problem has nothing currently listed in its fields, aside from the empty children (ProblemBlock can never have children; it’s just that all XBlocks store that field, even if it’s always empty). Instead, the Course’s version of the problem copies all those Library-set fields into the Course’s problem’s defaults dictionary. The fields in the Course version of the problem are reserved for overrides.
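Here is a small sketch of how a lookup over that layout could behave. The course_problem dict is an illustrative stand-in built from the description above, not an actual structure document entry, and effective_value and revert_to_library are hypothetical helpers. Real structure documents are immutable, so an actual edit would produce a new version; the sketch mutates a dict in place only for brevity.

```python
# Illustrative stand-in for the Course's copy of the Library problem.
course_problem = {
    "fields": {"children": []},  # reserved for Course-level overrides
    "defaults": {                # values copied from the Library
        "display_name": "Library Title for Problem",
        "weight": 10,
        "max_attempts": 5,
    },
}


def effective_value(block, name, code_default=None):
    """Resolve a setting: Course override, then Library default, then code."""
    if name in block["fields"]:
        return block["fields"][name]
    if name in block["defaults"]:
        return block["defaults"][name]
    return code_default


def revert_to_library(block, name):
    """Drop a Course override so the Library-set default applies again."""
    block["fields"].pop(name, None)


library_weight = effective_value(course_problem, "weight")
course_problem["fields"]["weight"] = 5            # Course overrides the weight
overridden_weight = effective_value(course_problem, "weight")
revert_to_library(course_problem, "weight")
reverted_weight = effective_value(course_problem, "weight")
```

Because the Library-set values and the Course overrides live in separate dictionaries, reverting is just a matter of removing the override.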
Overall, this is good news, because that means that this settings data exists in a way that’s associated with the course. As I mentioned before, this makes things much more predictable from a security and performance standpoint. Making this copy also means that the Course is better insulated from changes made to the Library, like it being deleted or having older versions pruned to save space.
Having a clear separation of things that were specified in the Library vs. overrides set by the Course also makes things much cleaner from a tracking/reasoning point of view in the data. So that’s great on that point as well.
There are a couple of important caveats though:
Missing Markdown
It can’t copy over the markdown for a problem because, while that field is declared in the settings scope, it should be in the content scope. In particular, editing it will try to update the value of the problem’s XML, which is stored in the content scope. So you can edit it, but you can’t meaningfully override the setting without also editing the content, and remember, the Library and Course are still pointing to the same definition document. Making it so that the definitions are separable is a lot more work, so the hack was to always strip the value for markdown when copying into a course.
Missing in OLX
The default values in the Course’s version of the problem do not export in OLX. Only the overridden field values get exported. This is part of the reason why the import process was modified to refresh those values from the library. If you exported from one course and imported into another, there would be no other way to get at the Library-set defaults, because the OLX wouldn’t carry over that data. Unfortunately, this was kind of a giant hammer that always grabs the latest version from the library for that metadata. On the bright side, I believe that the giant hammer-that-ignores-versions means that if you copy the library to the new instance first, things should “work”.
Could we just always export the defaults as fields, to carry that data across different instances of Open edX? Maybe. I’ll write more about that in the “where do we go from here” part of this multi-post. But the complicating issues are that:
There is no native distinction in OLX between defaults and overrides.
There are import/export roundtrip scenarios that might lead to surprising issues/ambiguities (e.g. we may lose the ability to revert to library settings, which is currently a feature of LibraryContentBlock).
There’s a weird interplay between defaults and inherited values. XBlocks have a notion of default values vs. what’s set on the XBlock explicitly, but the default values are almost always either inherited or coming from whatever the field default is declared to be in code. We special case InheritingFieldData to make the default come straight from the defaults part of the course’s structure document entry for any block contained inside a LibraryContentBlock, instead of by the usual mechanisms. This works, but it violates a layer of abstraction, as the storage mechanism is now using special handling for one specific XBlock type.
To illustrate the issue more concretely: We currently don’t export default values for XBlocks because the user never specified their values. If you don’t set an attribute in your OLX, it shouldn’t appear there when you export. If you only set something on a sequential, it shouldn’t be echoed down into every module underneath it on export. But because this defaulting mechanism was overloaded to pull in values from the Content Library’s version of the problem (by copying it to the defaults for the Course’s copy of the problem), we can’t just start exporting defaults without exporting a whole bunch of garbage that isn’t there today. So we’d have to special case it again on the export as well.
These things could, and maybe should, be done. But anything that changes import/export serialization is not simple, and we have to be especially careful about compatibility. Also, the goal here is not to completely fix the current implementation of Content Libraries. The two goals are to fix it for the two recent bugs that have come up, and to make design notes so that we get this less wrong in the next version of the feature that is under development.