The "contentstore" is where all course assets are stored. It is essentially a code wrapper around a GridFS (MongoDB) backend, and it stores binary files such as PDFs, WAVs, JPGs, and other formats. The contentstore code is mainly here:

The contentstore has some known technical problems, some of which are explained in this presentation:

An effort to move all course assets out of GridFS and into external storage began in 2014 but was abandoned. The docs from that effort: GridFS Replacement

Since that effort, the performance team has implemented an optimization for non-locked course assets. See Dave Ormsbee or Toby Lawrence for further details.

Ways to Query the Content Store

  1. using PyMongo:
    • A PyMongo script can be run from any machine that can connect to the prod mongo replica.
  2. using mongo itself:
    • A JS script must be run from a machine that can connect via the "mongo" command prompt.
    • "mongo < ./myscript.js"
  3. using the read-replica mongo shell
    1. ssh to the machine
    2. navigate to /edx/bin and run the appropriate script to enter a mongo shell for the database.
      1. e.g. ./
    3. run your query
    4. NOTE: The prod-edx/prod-edge read-replica secondary databases are independent of the primary databases, so you can execute intensive queries via these shells. However, the stage and loadtest read-replicas are in the same cluster as the primary database, so intensive queries may affect the stage or loadtest environments. 
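
As an illustration of option 2, here is a minimal sketch of what a `myscript.js` could look like. It assumes the assets live in the GridFS `fs.files` collection with a compound `_id` that has a `course` field (as in the queries further down this page), and that the connection arguments passed to `mongo` select the right database. The helper name `courseFilter` and the course name "DemoX" are only examples.

```javascript
// myscript.js: a sketch meant to be run as `mongo <connection args> < ./myscript.js`.
// Assumption: assets are stored in "fs.files" and their _id has a "course" field.

// Build a filter that matches every asset belonging to one course.
function courseFilter(courseName) {
    return {"_id.course": courseName};
}

// `db` only exists inside the mongo shell, so the query is skipped elsewhere.
if (typeof db !== "undefined") {
    // "DemoX" is only an example course name.
    var n = db.fs.files.find(courseFilter("DemoX")).count();
    print("assets in DemoX: " + n);
}
```

Keeping the filter in a small function makes it easy to reuse the same script for several courses before running anything heavy against a shared cluster.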

Counting Assets in Each Course


Code Block
/* The original "group" */
db.fs.files.group( {
    key: {"_id.course": 1, "": 1, "": 1},
    reduce: function(cur, result) { result.count += 1 },
    initial: {count: 0}
} )

/* ..and with category included. */
db.fs.files.group( {
    key: {"_id.course": 1, "": 1, "": 1, "_id.category": 1},
    reduce: function(cur, result) { result.count += 1 },
    initial: {count: 0}
} )

/* The same count as a map/reduce. */
var mapFunction = function() {
    /* Keep everything before the last "+" of a string _id. */
    var slicer = function(x) { return x.slice(0, x.lastIndexOf("+")) };
    var split_id = null;
    if (typeof this._id === "string")
        split_id = slicer(this._id);
    var key = [ this._id.course,,, this._id.category, split_id ];
    emit( key, 1 );
};
var reduceFunction = function(key, values) {
    return Array.sum(values);
};
db.fs.files.mapReduce(
    mapFunction,
    reduceFunction,
    { out: {inline: 1} }
)

/* For debugging the mapper... */
var emit = function(key, value) {
    print("key: " + key + " value: " + tojson(value));
};

/* To find all assets in courses other than the three specified. */
db.fs.files.find({"_id.course": {$nin : ["DemoX", "import_test", "LargeCourse101"]}})
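
To see what a count-per-key map/reduce produces without touching a database, the idea can be tried on a few hand-made documents. The sketch below is a local stand-in that runs under Node.js, not MongoDB's own implementation. The mapper is simplified to key on `_id.course` and `_id.category` only, and `runMapReduce`, `sampleDocs`, and the category values are made up for illustration.

```javascript
// Local stand-in for a map/reduce run, for small in-memory samples only.
var emit = null; // assigned by runMapReduce before the mapper is called

// Simplified mapper (assumption): one count per (course, category) pair.
var mapFunction = function() {
    emit([this._id.course, this._id.category], 1);
};

// Sum the emitted counts for one key.
var reduceFunction = function(key, values) {
    return values.reduce(function(a, b) { return a + b; }, 0);
};

function runMapReduce(docs, mapFn, reduceFn) {
    var buckets = {};
    // Collect emitted values per key; keys are compared by their JSON form.
    emit = function(key, value) {
        var k = JSON.stringify(key);
        if (!buckets[k]) buckets[k] = {key: key, values: []};
        buckets[k].values.push(value);
    };
    // The mapper reads the current document through `this`, as in MongoDB.
    docs.forEach(function(doc) { mapFn.call(doc); });
    // Unlike MongoDB, this harness also reduces keys that have a single value;
    // for a plain sum the result is the same.
    return Object.keys(buckets).map(function(k) {
        var b = buckets[k];
        return {_id: b.key, value: reduceFn(b.key, b.values)};
    });
}

// Made-up sample documents shaped like the _id used in the queries above.
var sampleDocs = [
    {_id: {course: "DemoX", category: "asset"}},
    {_id: {course: "DemoX", category: "asset"}},
    {_id: {course: "DemoX", category: "thumbnail"}},
    {_id: {course: "import_test", category: "asset"}}
];

var results = runMapReduce(sampleDocs, mapFunction, reduceFunction);
results.forEach(function(r) { console.log(JSON.stringify(r)); });
```

Each result has the same `_id` / `value` shape that an inline map/reduce returns, which makes it a cheap way to check a mapper's key layout before running it on a read replica.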

Finding Distinct Values for Fields and Frequency of the Values

Code Block
db.fs.files.aggregate([
   {$group : { _id : '$<field_name>', count : {$sum : 1}}}
])
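
A possible way to reuse this `$group` stage for different fields is to build the pipeline from the field name. The sketch below assumes the assets are in `fs.files`; the helper name `frequencyPipeline` is hypothetical, and the added `$sort` stage (most frequent values first) is a convenience that is not part of the original snippet.

```javascript
// Build an aggregation pipeline that counts how often each value of a field occurs.
function frequencyPipeline(fieldName) {
    return [
        // One output document per distinct value, with its frequency.
        {$group: {_id: "$" + fieldName, count: {$sum: 1}}},
        // Most frequent values first (an addition for readability).
        {$sort: {count: -1}}
    ];
}

// `db` only exists inside the mongo shell, so the query is skipped elsewhere.
if (typeof db !== "undefined") {
    // "_id.category" is used as an example field from the queries on this page.
    db.fs.files.aggregate(frequencyPipeline("_id.category")).forEach(printjson);
}
```

Since an aggregation scans the whole collection, it is best run from one of the independent read-replica shells described above.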