...
Since that effort, the performance team has implemented an optimization for non-locked course assets. See Dave Ormsbee or Toby Lawrence (Deactivated) for further details.
Ways to Query the Content Store
- using PyMongo:
  - A PyMongo script can be run from any machine that can connect to the prod mongo replica.
- using mongo itself:
  - A JS script must be run from a machine that can connect via the "mongo" command prompt.
  - "mongo < ./myscript.js"
- using the read-replica mongo shell:
  - ssh to the tools-gp.edx.org machine
  - navigate to /edx/bin and run the appropriate script to enter a mongo shell for the database, e.g. ./prod-edx-edxapp-mongo.sh
  - run your query
  - NOTE: The prod-edx/prod-edge read-replica secondary databases are independent of the primary databases, so you can execute intensive queries via these shells. However, the stage and loadtest read-replicas are in the same cluster as the primary database, so intensive queries may affect the stage or loadtest environments.
Counting Assets in Each Course
...
```js
/* The original "group": count asset files per course.
   Note: db.collection.group() was removed in MongoDB 4.2. */
db.fs.files.group({
    key: {"_id.course": 1, "_id.org": 1, "_id.run": 1},
    reduce: function(cur, result) { result.count += 1; },
    initial: {count: 0}
});

/* ...and with category included. */
db.fs.files.group({
    key: {"_id.course": 1, "_id.org": 1, "_id.run": 1, "_id.category": 1},
    reduce: function(cur, result) { result.count += 1; },
    initial: {count: 0}
});

/* The same count via mapReduce, which also handles string-style _ids. */
var mapFunction = function() {
    // For a string _id, keep everything before the last "+".
    var slicer = function(x) { return x.slice(0, x.lastIndexOf("+")); };
    var split_id = null;
    if (typeof this._id === "string") {
        split_id = slicer(this._id);
    }
    // For string _ids the four subfields are undefined;
    // for document-style _ids split_id stays null.
    var key = [ this._id.course, this._id.org, this._id.run, this._id.category, split_id ];
    emit(key, 1);
};

var reduceFunction = function(key, values) {
    return Array.sum(values);
};

db.fs.files.mapReduce(
    mapFunction,
    reduceFunction,
    {
        out: {inline: 1}
    }
);

/* For debugging the mapper: define emit() in the shell so it prints
   each key/value pair, then apply the mapper to a single document. */
var emit = function(key, value) {
    print("emit");
    print("key: " + key + " value: " + tojson(value));
};
mapFunction.apply(db.fs.files.findOne());

/* To find all asset files whose course is not one of the three specified. */
db.fs.files.find({"_id.course": {$nin: ["DemoX", "import_test", "LargeCourse101"]}});
```
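The group() calls above are not available on newer MongoDB servers, so the same per-course count can be sketched with the aggregation framework. This is a minimal sketch, not the team's actual query: the helper name `assetCountPipeline` is hypothetical, and it assumes document-style `_id` values with `org`, `course`, `run`, and `category` subfields, plus a shell (2.6 or later) where `aggregate()` returns a cursor. Files with string-style `_id` values would likely all land in a single bucket, so the mapReduce version remains the one that separates them.

```javascript
// Hypothetical helper: builds a pipeline that counts fs.files documents
// per course, optionally split by asset category.
function assetCountPipeline(includeCategory) {
    var groupKey = {
        org: "$_id.org",
        course: "$_id.course",
        run: "$_id.run"
    };
    if (includeCategory) {
        groupKey.category = "$_id.category";
    }
    return [
        { $group: { _id: groupKey, count: { $sum: 1 } } },
        { $sort: { count: -1 } }  // courses with the most assets first
    ];
}

// Only runs inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.fs.files.aggregate(assetCountPipeline(false)).forEach(printjson);
}
```

Passing `true` adds `_id.category` to the grouping key, mirroring the second group() call.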
Finding Distinct Values for Fields and Frequency of the Values
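```js
/* Replace <field_name> with the field to inspect.
   The trailing .result applies to shells older than 2.6, where aggregate()
   returns a document; newer shells return a cursor, so drop it there. */
db.fs.files.aggregate([
    {$group: {_id: "$<field_name>", count: {$sum: 1}}}
]).result
```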
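To avoid retyping the pipeline for each field, it can be wrapped in a small helper. The sketch below is illustrative only: `fieldFrequencyPipeline` is a hypothetical name, the added `$sort` stage is a convenience not present in the original query, and the shell call assumes a version (2.6 or later) where `aggregate()` returns a cursor.

```javascript
// Hypothetical helper: builds a pipeline that lists each distinct value
// of `fieldName` together with how often it occurs, most frequent first.
function fieldFrequencyPipeline(fieldName) {
    return [
        { $group: { _id: "$" + fieldName, count: { $sum: 1 } } },
        { $sort: { count: -1 } }
    ];
}

// Only runs inside a mongo shell, where `db` is defined.
// "_id.category" is used here because it appears in the queries above.
if (typeof db !== "undefined") {
    db.fs.files.aggregate(fieldFrequencyPipeline("_id.category")).forEach(printjson);
}
```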