...

Alternative System | Interface | Storage | Use Cases
Course Overviews

CourseOverview Django Model

Use the class methods:

  • get_from_id
  • get_from_ids_if_exists

MySQL

Originally created as a read-through cache for commonly used metadata, the CourseOverview model is now the canonical source for that information from the point of view of the LMS. Much of this data is updated by a Celery task that queries the Modulestore after every publish. Some data is synced over from the Course Discovery/Catalog API on a nightly basis.

These fields are mostly related to displaying information about the course, its schedule, certificates, and marketing data. Access to this data for a single course or small set of courses is very cheap relative to the other options, so it should be the first thing you check.

If you're creating a new table that needs a foreign key to a course_id, this is the table to connect to.

Do not do full table scans of Course Overviews to look for or collect certain values. This table has an entry for every course ever published, including CCX courses, and can be in the 10K row range.
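
A minimal sketch of the keyed-lookup pattern described above. It assumes that get_from_ids_if_exists accepts an iterable of course keys and returns a dict keyed by course key that omits missing courses; check this against the edx-platform version you are running. The model class is passed in as a parameter so the helper can be tried without a running LMS. FakeCourseOverview and display_names_for are hypothetical names used only for illustration.

```python
def display_names_for(course_overview_cls, course_keys):
    """Return {course_key: display_name} for a small, known set of courses.

    `course_overview_cls` is expected to be the CourseOverview model, or a
    stand-in exposing the same class method. Assumption: the method returns
    a dict mapping course keys to overview objects and omits missing courses.
    """
    overviews = course_overview_cls.get_from_ids_if_exists(course_keys)
    return {key: overview.display_name for key, overview in overviews.items()}


class FakeCourseOverview(object):
    """Hypothetical in-memory stand-in, used only to demonstrate the helper."""

    _rows = {"course-v1:edX+DemoX+Demo_Course": "Demonstration Course"}

    def __init__(self, display_name):
        self.display_name = display_name

    @classmethod
    def get_from_ids_if_exists(cls, course_ids):
        # Keyed lookup only: never iterate over every row in the table.
        return {
            course_id: cls(cls._rows[course_id])
            for course_id in course_ids
            if course_id in cls._rows
        }


names = display_names_for(
    FakeCourseOverview,
    ["course-v1:edX+DemoX+Demo_Course", "course-v1:edX+Missing+None"],
)
```

The point of the design is that the caller always starts from a known set of course keys, so the query stays cheap regardless of how many rows the table holds.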

Course Blocks API

REST API

lms.djangoapps.course_api.blocks.api

S3

Originally intended for the mobile use case, this is the API you want to query for content fields within one course. It applies user access controls, so the results reflect the cohort and release dates that apply to that user. Some example use cases:

  1. I want the "display_name" field for all content of types "video" and "problem".
  2. I want to find all sequences in the course.
  3. I want to display links to every gradable problem in the course.
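
As a rough illustration of the first use case, the sketch below builds a REST query and then picks display names out of a response. The URL path, the query parameter names, and the {"root", "blocks"} response shape are assumptions based on how the Course Blocks REST API is commonly documented, so verify them against your deployment. The host lms.example.com and the sample payload are made up.

```python
from urllib.parse import urlencode


def build_blocks_url(lms_root, course_id, username, block_types):
    """Build a Course Blocks REST URL (path and parameters are assumptions)."""
    query = urlencode({
        "course_id": course_id,
        "username": username,
        "depth": "all",
        "requested_fields": "display_name",
        "block_types_filter": ",".join(block_types),
    })
    return "{}/api/courses/v1/blocks/?{}".format(lms_root.rstrip("/"), query)


def display_names_by_type(response, block_types):
    """Collect (type, display_name) pairs from a {"root", "blocks"} payload."""
    wanted = set(block_types)
    return sorted(
        (block["type"], block.get("display_name", ""))
        for block in response["blocks"].values()
        if block["type"] in wanted
    )


# Hand-written sample payload, not real API output.
sample_response = {
    "root": "block-a",
    "blocks": {
        "block-a": {"id": "block-a", "type": "sequential", "display_name": "Week 1"},
        "block-b": {"id": "block-b", "type": "video", "display_name": "Welcome"},
        "block-c": {"id": "block-c", "type": "problem", "display_name": "Quiz 1"},
    },
}

url = build_blocks_url(
    "https://lms.example.com/",
    "course-v1:edX+DemoX+Demo_Course",
    "some_user",
    ["video", "problem"],
)
names = display_names_by_type(sample_response, ["video", "problem"])
```

Filtering again on the client side is redundant when the server honors the block type filter, but it keeps the helper safe to use on unfiltered responses.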

If you have an XBlock or XModule, it's also possible to get custom pre-computed data into the student_view_data attribute, so long as that data can be computed at publish time (i.e. it can't have student state in it).

The Course Outline view in the LMS uses the Python interface of this API.

Course Block Transformers

BlockStructureTransformer (see VisibilityTransformer for a short example)

S3

This is the lower level infrastructure that the Course Blocks API is built on. Use this when you want to collect and manipulate authored content data across an entire course. The idea is that you create a Transformer class that is invoked for two phases: an asynchronous "collect" phase triggered during course publish, and a synchronous "transform" phase. You do expensive data access and calculations during "collect", and then do fast, per-user manipulations of the course DAG and fields during "transform".
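
The two-phase idea can be illustrated with a toy, self-contained sketch. It does not use the real BlockStructureTransformer base class or its method signatures. ToyBlockStructure and ToyStaffOnlyTransformer are hypothetical stand-ins, and the course is simplified to a tree rather than a full DAG.

```python
class ToyBlockStructure(object):
    """Hypothetical stand-in for a course's block structure (tree-shaped)."""

    def __init__(self, children, fields):
        self.children = children   # {block_id: [child_id, ...]}
        self.fields = fields       # {block_id: {field_name: value}}
        self.collected = {}        # data cached by the collect phase


class ToyStaffOnlyTransformer(object):
    """Hides blocks marked staff-only, together with their descendants."""

    @classmethod
    def collect(cls, structure, root):
        # Expensive work, done once per publish: push the staff-only flag
        # down from each ancestor so "transform" needs no tree walks.
        def walk(block_id, inherited):
            own = structure.fields[block_id].get("staff_only", False)
            hidden = inherited or own
            structure.collected[block_id] = hidden
            for child in structure.children.get(block_id, []):
                walk(child, hidden)

        walk(root, False)

    def transform(self, user_is_staff, structure):
        # Cheap, per-user work: filter using only the pre-computed flags.
        if user_is_staff:
            return set(structure.collected)
        return {b for b, hidden in structure.collected.items() if not hidden}


structure = ToyBlockStructure(
    children={
        "course": ["chapter1", "chapter2"],
        "chapter1": ["problem1"],
        "chapter2": ["video1"],
    },
    fields={
        "course": {},
        "chapter1": {"staff_only": True},
        "problem1": {},
        "chapter2": {},
        "video1": {},
    },
)
ToyStaffOnlyTransformer.collect(structure, "course")
learner_view = ToyStaffOnlyTransformer().transform(False, structure)
staff_view = ToyStaffOnlyTransformer().transform(True, structure)
```

Here the inheritance walk happens once in "collect", and each per-user "transform" call is a simple filter over cached flags.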

See this page documenting current and future Transformers.

A long term goal is for XBlock/XModule field data for the LMS to be backed by this system, so that all inheritance computations can be done during the collect phase, and it would be possible to load the data for a given problem or video without having to load the entire set of ancestor nodes.

Course Catalog API (a.k.a. Course Discovery)

https://prod-edx-discovery.edx.org

MySQL, Elasticsearch

This is the authoritative place for data relating to finding and enrolling in a course. It's what's queried when you go to the marketing site and search for a course or look at its "Course About" page details: who is teaching it, what the schedule is, what language it is given in, etc. Because it's backed by Elasticsearch, it is much more efficient and flexible when searching across the system for courses with particular attributes.

Making synchronous calls to this service can be expensive, so if the data exists in Course Overviews and you can live with the delay, it's better to use that.

CourseGraph

https://coursegraph.edx.org/browser/

(Neo4j DB, not part of edx-platform, requires VPN access)

Neo4j

This is used by support/sustaining teams and occasionally services staff to answer questions about course team authored content across edx.org. Think of this as the Modulestore content field data shoved into a Neo4j database by a periodic batch process. There are many example queries available, including:

  1. How many courses are still using a particular XModule? Can we deprecate it?
  2. Are there any proctored exams coming up soon?
  3. What courses have a particular setting enabled?

CourseGraph only sees what Modulestore sees, and some of that data is not canonical. The following are data for which the canonical answer is in the Course Discovery/Catalog API:

  • Course start/end/enrollment dates.
  • Course language.

Local App Models

Django ORM, django-storages

MySQL, S3

A common pattern is to listen for SignalHandler.course_published, and then to access Modulestore data in an asynchronous process. The Blocks API and Transformers are extremely useful for grabbing a big chunk of the Course and manipulating it, but they still incur relatively high query overhead (hundreds of milliseconds). If you just want to derive one or two small bits of field data that are read all the time, you can copy those into your own model where the access time will be ~1ms.

Another scenario where you might use the "listen, extract, and store locally" pattern is search indexing. We used to have this with course content using edx-search, though I'm not sure if it's actually working these days.
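
The "listen, extract, and store locally" pattern can be sketched without Django or Celery as follows. ToySignal, the in-memory queue, and local_table are hypothetical stand-ins for SignalHandler.course_published, an asynchronous task queue, and an app-owned model respectively. A real implementation would connect a receiver to the actual signal and queue a background task instead.

```python
class ToySignal(object):
    """Hypothetical stand-in for a Django signal such as course_published."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver

    def send(self, **kwargs):
        for receiver in self._receivers:
            receiver(**kwargs)


course_published = ToySignal()
pending_course_keys = []   # stands in for a task queue
local_table = {}           # stands in for a small app-owned table


@course_published.connect
def on_course_published(course_key, **kwargs):
    # Keep the handler cheap: only enqueue, and do the slow work elsewhere.
    pending_course_keys.append(course_key)


def run_pending(load_course_fields):
    """Worker step: extract the few fields we need and store them locally."""
    while pending_course_keys:
        course_key = pending_course_keys.pop(0)
        fields = load_course_fields(course_key)  # slow, Modulestore-style read
        local_table[course_key] = {"display_name": fields["display_name"]}


def fake_modulestore_read(course_key):
    # Made-up data standing in for an expensive content query.
    return {"display_name": "Demonstration Course", "other": "ignored"}


course_published.send(course_key="course-v1:edX+DemoX+Demo_Course")
run_pending(fake_modulestore_read)
```

After the worker runs, reads go straight to local_table and never touch the slow content store.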

...

Code block (Python):
from xmodule.modulestore.django import modulestore
from opaque_keys.edx.keys import CourseKey, UsageKey
from opaque_keys import InvalidKeyError

# Old style IDs -- note the lack of Course Run info in usage_id
# course_id = "edX/DemoX.1/2014"
# usage_id = "i4x://edX/DemoX.1/problem/466f474fa4d045a8b7bde1b911e095ca"

# New style IDs -- usage_id has full Course Run info
course_id = "course-v1:edX+DemoX+Demo_Course"
usage_id = "block-v1:edX+DemoX+Demo_Course+type@problem+block@d2e35c1d294b4ba0b3b1048615605d2a"

# Parse the Course ID.
try:
    # This will return a SlashSeparatedCourseKey (old) or CourseLocator (new)
    # Always use CourseKey.from_string() when parsing Course IDs.
    course_key = CourseKey.from_string(course_id)
except InvalidKeyError:
    # Do some error handling here -- this is just completely made up
    raise ValueError("Could not parse course_id {}".format(course_id))

# Parse the Usage ID
try:
    # This will return a Location (old) or BlockUsageLocator (new)
    # Always use UsageKey.from_string() when parsing Usage IDs.
    unmapped_usage_key = UsageKey.from_string(usage_id)

    # The map_into_course() call is not necessary for BlockUsageLocators, but
    # we do it to maintain compatibility with old style usage keys.
    usage_key = unmapped_usage_key.map_into_course(course_key)
except InvalidKeyError:
    # Do some error handling here -- this is just completely made up
    raise ValueError("Could not parse usage_id {}".format(usage_id))

# This initializes a process global -- future calls to modulestore() will
# just return references to the same global.
ms = modulestore()

# Get a single CapaDescriptor (a Capa problem, like multiple choice).
# This object has all its XModule content fields, but not the user ones.
problem = ms.get_item(usage_key)

# Query the Modulestore for all sequentials in the Course.
sequences = ms.get_items(course_key, qualifiers={'category': 'sequential'})

# Get the root CourseDescriptor
course = ms.get_course(course_key)

# List of child usage keys (Locations/BlockUsageLocators) for the chapters.
course.children

# Iterate through the descriptors for those children instead.
# print() with a single formatted string behaves the same on Python 2 and 3.
for chapter in course.get_children():
    print("{} {}".format(chapter.location, chapter.display_name))

...