Course Discovery RDB Model

Deadline for comments: March 18th, 2015
Status of the document: Proposal
Collaborators: ClintonB (Deactivated) pfoggR (Deactivated)

Overview

  • This is data model for the Course Discovery IDA outlined here: Course Discovery and Inter-IDA Messaging
  • The major top-level objects of this system are Course, CourseRun, People, and Organizations (in blue).
  • There are associated "Override" tables which are used to handle multi-language support and potentially other content override use cases, described in more detail below.
  • Most, but not all, relationships are documented in the diagram.  The relationship of any table to its Override tables, and certain enumerable tables (like locale and site) have been omitted for visibility purposes.
  • At present this IDA will source data from LMS and Drupal to populate this DB for edx.org, but eventually we would like to flip it and have all IDAs write directly to this service making it the "source of truth" for course metadata (AKA non-course-content and non-ecommerce data).

Data Model

Common Questions

How does multi-language support work?

All tables which store user-facing text have a related table with the suffix "Override" e.g. Course and CourseOverride.  Each Override table consists of the following fields:

  • a primary key id
  • a foreign key to the main table (e.g. course_id)
  • a foreign key to the locale of the override content
  • a column for each language specific attribute of the main table (e.g. title, description, etc)

Note: We are using locale because it is more fine grained than simply using language but still gives us the ability to query and display data at the language level. Since so much of our content is text based, this level of granularity is important especially for logographic languages like Chinese or Japanese.

What about versioning and point-in-time queries?

We are investigating a handful of versioning approaches which allow us to handle many features like point-in-time queries and moderation or approval workflows.  That system is still in the design phase and is mostly orthogonal to decisions made about this data model, thus we will be presenting the versioning data model separately.

How does multi-tenancy work?

This system supports a form of multi-tenancy (though by design we have opted to not support cross-tenancy).  Three top level objects, Course, Organization, and Program, have foreign keys (partner_id) which map to the Partner owner of that top level object.  The fourth top-level object, CourseRuns, uses the parent course's partner_id to infer ownership.

Does this support courses run by multiple organizations or parent orgs with many sub-organizations?

There is a M:M relationship between courses and organizations allowing for multiple organizations to be associated with a course.  This current model does not incorporate a parent-child relationship between organizations, but would be trivial to add now or in the future if/when there are sufficient use cases.

How do "high school/AP" courses fit into this model?

High school/AP would be a level_type in this model.

What are the major differences between this and the current models?

Separation of Course and CourseRun concepts

Courses and CourseRuns are unique concepts in this model.  Attributes of a course fall into two categories:

  • Attributes that we do not expect to change often between course runs, but which a course run can override if required
    • Title, descriptions, primary marketing video and images.
  • Attributes which if they must be changed to reflect the reality a particular course run indicate that that course run is such a departure from the rest that the run should be considered a run of a NEW course.
    • Level, subject, prerequisites and expected learnings.
  • Effort is unique in that is it something we expect to be consistent across runs so it would normally live on the course.  However, we must consider the CCX use case, which may need to alter the effort for a given CCX run.  In order to accommodate this case effort is a foreign key on the course_run object.  Since it is a foreign key we could have the option to keep all non-ccx runs pointed at the same effort object so updating that object would update effort for all courses while CCX runs would point at their own effort objects and be unaffected.

Other attributes that are expected to change on a run by run basis live on the CourseRun object.  Dates, pacing, languages, and course syllabus.

CourseRuns now have an announcement date for when the course run is announced to the public.

Effort, Subjects, Prerequisites, Expected Learnings, and Syllabi

Currently "effort" is an open text field where anything can be written.  Recommendation is to standardize around a range of estimated hours/week of work.

In addition to effort, items like subject, prerequisites, expected learnings, and syllabi are also being explicitly pulled out into their own tables to provide more structure to that data.

Subject and Prerequisites are an unordered list of those items, while expected learnings is an ordered list of freeform text fields.  

The syllabus table is a nested outline structure:

  • Week 1
    • Overview
    • Meet the professors
    • First assignment

Person/Organization

Courses and CourseRuns have many people and orgs through join tables which also specify their relation to the course/run.  For example all courses will be related to the org which is running the course, but some courses also have a sponsor who is not directly running the course, but providing assistance in some way, like Databricks sponsoring this Berkley course on spark https://www.edx.org/course/introduction-spark-uc-berkeleyx-cs105x.  

 

Authorization

The authorization progression would be as follows:

V1 - API is read-only.  Anyone with a valid access token can read.
V2 - API has basic CRUD operations.  Only super users can access non-read endpoints. Note: this would necessarily coincide with a switch making this the system of record for course metadata.
V3 - Super users can grant standard users create and update permissions at an organization level (TBD on delete).  E.G. Harvard users may be given access to edit/update any course, course run, or person that exists under the Harvard organization.