...

Since that effort, the performance team has implemented an optimization for non-locked course assets. See Dave Ormsbee or Toby Lawrence for further details.

Ways to Query the Content Store

  1. Using PyMongo:
    • Run a PyMongo script from any machine that can connect to the prod mongo replica.
  2. Using mongo itself:
    • Run a JS script from a machine that can connect via the "mongo" command prompt.
    • "mongo < ./myscript.js"
  3. Using the read-replica mongo shell:
    1. ssh to the tools-gp.edx.org machine.
    2. Navigate to /edx/bin and run the appropriate script to enter a mongo shell for the database.
      1. e.g. ./prod-edx-edxapp-mongo.sh
    3. Run your query.
    4. NOTE: The prod-edx/prod-edge read-replica secondary databases are independent of the primary databases, so you can execute intensive queries via these shells. However, the stage and loadtest read-replicas are in the same cluster as the primary database, so intensive queries may affect the stage or loadtest environments.
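
As a rough illustration of option 2, a script piped into the shell could look like the sketch below. The file name myscript.js comes from the command above; the helper name countFiles is hypothetical, and the sketch assumes the shell is already connected to the content store database so that the shell's global "db" points at it.

```javascript
// myscript.js: a minimal sketch of a script for "mongo < ./myscript.js".
// countFiles is a hypothetical helper, not part of the mongo shell.
var countFiles = function (database) {
    // fs.files is the collection queried throughout the rest of this page.
    return database.fs.files.count();
};

// The mongo shell defines a global `db`. The guard lets the file load
// without error in environments where `db` does not exist.
if (typeof db !== "undefined") {
    print("fs.files documents: " + countFiles(db));
}
```

Taking the database handle as a parameter keeps the helper easy to exercise against a stub before pointing it at a real replica.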

Counting Assets in Each Course

...

Code Block (js)
/* The original "group" */
db.fs.files.group( {
    key: {"_id.course": 1, "_id.org": 1, "_id.run": 1},
    reduce: function(cur, result) { result.count += 1 },
    initial: {count: 0}
} )

/* ..and with category included. */
db.fs.files.group( {
    key: {"_id.course": 1, "_id.org": 1, "_id.run": 1, "_id.category": 1},
    reduce: function(cur, result) { result.count += 1 },
    initial: {count: 0}
} )

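/* The same count via mapReduce, covering both document-style and string-style _id values. */
var mapFunction = function() {
    /* For a string _id, keep everything before the last "+". */
    var slicer = function(x) { return x.slice(0, x.lastIndexOf("+")); };
    var split_id = null;
    if (typeof this._id === "string")
        split_id = slicer(this._id);
    /* For a string _id the four sub-fields are undefined; for a document _id, split_id stays null. */
    var key = [ this._id.course, this._id.org, this._id.run, this._id.category, split_id ];
    emit( key, 1 );
};
/* Each emitted value is 1, so the sum is the number of files per key. */
var reduceFunction = function(key, values) {
    return Array.sum(values);
};
db.fs.files.mapReduce(
    mapFunction,
    reduceFunction,
    {
        out: {inline: 1}
    }
)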

/* For debugging the mapper... */
var emit = function(key, value) {
    print("emit");
    print("key: " + key + " value: " + tojson(value));
}

/* To find all files whose course is not one of the three specified. */
db.fs.files.find({"_id.course": {$nin : ["DemoX", "import_test", "LargeCourse101"]}})
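
The mapper can also be exercised without a database. The sketch below is plain JavaScript (for Node.js rather than the mongo shell): it replaces emit with a collecting stub, applies the mapper to a few sample documents, and sums the emitted values per key the way the reducer would. The sample documents and their _id shapes are hypothetical, chosen only to cover the document-style and string-style cases the mapper distinguishes.

```javascript
// Stand-in for the shell's emit(): collect pairs instead of sending them to MongoDB.
var emitted = [];
var emit = function (key, value) {
    emitted.push({ key: key, value: value });
};

// Same mapper as in the code block above, repeated so this sketch is self-contained.
var mapFunction = function () {
    var slicer = function (x) { return x.slice(0, x.lastIndexOf("+")); };
    var split_id = null;
    if (typeof this._id === "string")
        split_id = slicer(this._id);
    var key = [ this._id.course, this._id.org, this._id.run, this._id.category, split_id ];
    emit(key, 1);
};

// Hypothetical sample documents (assumed shapes, not real course data).
var sampleDocs = [
    { _id: { org: "edX", course: "DemoX", run: "2014", category: "asset", name: "a.png" } },
    { _id: { org: "edX", course: "DemoX", run: "2014", category: "asset", name: "b.png" } },
    { _id: "asset-v1:edX+DemoX+2014+type@asset+block@c.png" }
];

// MongoDB calls the mapper with `this` bound to each document; apply() does the same here.
sampleDocs.forEach(function (doc) { mapFunction.apply(doc); });

// Group the emitted values by key and sum them, as the reducer would.
var counts = {};
emitted.forEach(function (pair) {
    var k = JSON.stringify(pair.key);
    counts[k] = (counts[k] || 0) + pair.value;
});
console.log(counts);
```

Serializing the key with JSON.stringify is only a convenience for grouping in plain JavaScript; undefined entries in the key array are written as null.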


Finding Distinct Values for Fields and Frequency of the Values

Code Block (js)
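/* Replace <field_name> with the field to inspect. */
/* In shells before MongoDB 2.6, aggregate() returns a document whose "result" field holds the output. */
/* From 2.6 on it returns a cursor, so drop the trailing .result. */
db.fs.files.aggregate(
   {$group : { _id : '$<field_name>', count : {$sum : 1}}}
).result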
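
To make the output shape concrete, the sketch below reproduces the same grouping in plain JavaScript (for Node.js, not the mongo shell). The helper name countByField and the sample documents are hypothetical; "contentType" is used only as an example field name.

```javascript
// Hypothetical helper mirroring {$group: {_id: '$<field_name>', count: {$sum: 1}}}
// for scalar field values.
function countByField(docs, fieldName) {
    var counts = new Map();
    docs.forEach(function (doc) {
        // A missing field is grouped under null, as $group does.
        var hasField = Object.prototype.hasOwnProperty.call(doc, fieldName);
        var value = hasField ? doc[fieldName] : null;
        counts.set(value, (counts.get(value) || 0) + 1);
    });
    // Emit one {_id, count} entry per distinct value, like the pipeline output.
    var result = [];
    counts.forEach(function (count, value) {
        result.push({ _id: value, count: count });
    });
    return result;
}

// Hypothetical sample documents.
var sampleFiles = [
    { contentType: "image/png" },
    { contentType: "image/png" },
    { contentType: "application/pdf" },
    {}
];
var frequencies = countByField(sampleFiles, "contentType");
console.log(frequencies);
```

The Map compares keys by identity, so this sketch only matches the pipeline for scalar values such as strings and numbers.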