Deadline for comments: March 18th, 2015
Status of the document: Proposal
Collaborators: ClintonB (Deactivated) pfoggR (Deactivated)

Overview

Data Model

Common Questions

How does multi-language support work?

All tables which store user-facing text have a related table with the suffix "Override" e.g. Course and CourseOverride.  Each Override table consists of the following fields:

Note: We are using locale because it is more fine grained than simply using language but still gives us the ability to query and display data at the language level. Since so much of our content is text based, this level of granularity is important especially for logographic languages like Chinese or Japanese.

What about versioning and point-in-time queries?

We are investigating a handful of versioning approaches which allow us to handle many features like point-in-time queries and moderation or approval workflows.  That system is still in the design phase and is mostly orthogonal to decisions made about this data model, thus we will be presenting the versioning data model separately.

How does multi-tenancy work?

This system supports a form of multi-tenancy (though by design we have opted to not support cross-tenancy).  Three top level objects, Course, Organization, and Program, have foreign keys (partner_id) which map to the Partner owner of that top level object.  The fourth top-level object, CourseRuns, uses the parent course's partner_id to infer ownership.

Does this support courses run by multiple organizations or parent orgs with many sub-organizations?

There is a M:M relationship between courses and organizations allowing for multiple organizations to be associated with a course.  This current model does not incorporate a parent-child relationship between organizations, but would be trivial to add now or in the future if/when there are sufficient use cases.

How do "high school/AP" courses fit into this model?

High school/AP would be a level_type in this model.

What are the major differences between this and the current models?

Separation of Course and CourseRun concepts

Courses and CourseRuns are unique concepts in this model.  Attributes of a course fall into two categories:

Other attributes that are expected to change on a run by run basis live on the CourseRun object.  Dates, pacing, languages, and course syllabus.

CourseRuns now have an announcement date for when the course run is announced to the public.

Effort, Subjects, Prerequisites, Expected Learnings, and Syllabi

Currently "effort" is an open text field where anything can be written.  Recommendation is to standardize around a range of estimated hours/week of work.

In addition to effort, items like subject, prerequisites, expected learnings, and syllabi are also being explicitly pulled out into their own tables to provide more structure to that data.

Subject and Prerequisites are an unordered list of those items, while expected learnings is an ordered list of freeform text fields.  

The syllabus table is a nested outline structure:

Person/Organization

Courses and CourseRuns have many people and orgs through join tables which also specify their relation to the course/run.  For example all courses will be related to the org which is running the course, but some courses also have a sponsor who is not directly running the course, but providing assistance in some way, like Databricks sponsoring this Berkley course on spark https://www.edx.org/course/introduction-spark-uc-berkeleyx-cs105x.  

 

Authorization

The authorization progression would be as follows:

V1 - API is read-only.  Anyone with a valid access token can read.
V2 - API has basic CRUD operations.  Only super users can access non-read endpoints. Note: this would necessarily coincide with a switch making this the system of record for course metadata.
V3 - Super users can grant standard users create and update permissions at an organization level (TBD on delete).  E.G. Harvard users may be given access to edit/update any course, course run, or person that exists under the Harvard organization.