Course Discovery and Inter-IDA Messaging

Course Discovery Service

Background

The end-goal of the Course Discovery project is to provide the following:

  • An internal API for reading Course marketing/discovery materials
    • This will be used to present the Course Discovery experience on edx.org, in the mobile app, and, ideally, in the default OpenEdX installation.
  • A public-facing API (external to an OpenEdX installation) for reading Course marketing/discovery materials
    • This API matches the internal API, but adds filtering, permissions, referral tracking, and rate limiting. It is also heavily cached so that traffic to the external API cannot affect the performance of the internal API.
  • An internal API for grouping courses based on stored search parameters (Dynamic Catalogs)
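
A Dynamic Catalog can be thought of as a named, stored set of search parameters whose membership is computed when the catalog is queried. The sketch below is illustrative only. It assumes a simple hypothetical schema (a mapping from course field to the set of acceptable values) evaluated in memory, whereas the real service would presumably translate the stored parameters into a search query against its index. `DynamicCatalog`, `matches`, and the course fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DynamicCatalog:
    """A named, stored set of search parameters (hypothetical schema)."""
    name: str
    # Each key is a course field; each value is the set of acceptable values.
    filters: Dict[str, Set[str]] = field(default_factory=dict)

    def matches(self, course: dict) -> bool:
        # A course belongs to the catalog only if every stored filter matches.
        return all(course.get(key) in allowed for key, allowed in self.filters.items())

    def courses(self, all_courses: List[dict]) -> List[dict]:
        # Membership is computed at query time, so newly published courses
        # that satisfy the filters appear without editing the catalog.
        return [c for c in all_courses if self.matches(c)]


courses = [
    {"key": "edX+DemoX", "org": "edX", "level": "intro"},
    {"key": "MITx+6.00x", "org": "MITx", "level": "intro"},
    {"key": "edX+Adv101", "org": "edX", "level": "advanced"},
]
intro_edx = DynamicCatalog("intro-edx", {"org": {"edX"}, "level": {"intro"}})
print([c["key"] for c in intro_edx.courses(courses)])  # ['edX+DemoX']
```

Storing the query rather than a fixed list of course keys is what makes the catalog "dynamic". A coupon tied to the catalog automatically covers any later course that matches.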

The data available from the Course Discovery API will include text (course name and description), media (intro video and instructor portraits), and pricing. Authoritative ownership of this data is currently split across several services, including (for edx.org) LMS, the Drupal marketing site, Programs, and Otto (the ecommerce service). This split ownership is sensible. However, to provide a performant Course Discovery API and to isolate the internal and external APIs from each other, the Course Discovery service will denormalize and cache the data. To keep the cached data fresh with low latency, the source services will push change notifications to the Course Discovery service via a new Inter-IDA Messaging system. The Course Discovery service will then call back to those services to fetch the full data.
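
The notify-then-fetch flow described above can be sketched as a cache keyed by course run and source, which is refreshed whenever a notification arrives. This is a minimal sketch under stated assumptions. The fetch function is injected so the example runs without a network; in a real worker it could be an HTTP GET such as `requests.get(url).json()`. `CourseDiscoveryCache`, `on_notification`, and the example URL are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: given a callback URL, return that source's current data.
Fetcher = Callable[[str], dict]


class CourseDiscoveryCache:
    """Denormalized copy of course data, one entry per (course run, source)."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._store: Dict[Tuple[str, str], dict] = {}

    def on_notification(self, course_run_key: str, source: str, url: str) -> None:
        # The notification only says *that* something changed; the full
        # content is pulled back from the owning service via its URL.
        self._store[(course_run_key, source)] = self._fetch(url)

    def get(self, course_run_key: str, source: str) -> dict:
        return self._store[(course_run_key, source)]


# Stand-in for Otto's REST API (hypothetical URL and payload).
otto = {"https://otto.example.com/api/courses/run1/": {"price": 49}}
cache = CourseDiscoveryCache(fetch=lambda url: dict(otto[url]))
cache.on_notification("run1", "otto", "https://otto.example.com/api/courses/run1/")
print(cache.get("run1", "otto"))  # {'price': 49}
```

Reads are served entirely from the local cache, so API latency does not depend on the availability of LMS, Drupal, Programs, or Otto at request time.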

API

Course Discovery API

End-State Architecture

Key Interactions

  • Studio, Otto, Programs, and Drupal push Course Updated messages into the exchange when a Course Run is modified
  • Internal Course Discovery Worker subscribes to Course Updated messages. On receipt, it reads the course content from the source system (over a REST API), and writes that content to Internal Course Discovery.
  • Internal Course Discovery stores the course content in ElasticSearch, unmodified.
  • edX Web and edX Mobile request (via a REST API) course discovery materials from Internal Course Discovery. On read, the stored data is migrated to the latest version, and then the individual course representations stored from Otto, Studio, and Drupal are overlaid to produce a single merged view.
  • At a regularly scheduled interval, External Course Discovery refreshes its data by reading from Internal Course Discovery.
  • Partners read from External Course Discovery (via a REST API), loading the contents of specific pre-filtered courses (defined by Dynamic Catalogs). They are rate limited, and authenticate using Asymmetric Signed JWTs that were provisioned using a management interface.
  • Otto uses Internal Course Discovery to provision and query Dynamic Catalogs of courses. These catalogs will be defined with a search language, and will be used to support Coupon Codes in Otto.
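
The read path (store unmodified, migrate on read, then overlay) can be sketched as follows. The schema versions, the example migration (renaming `name` to `title`), and the overlay precedence (later sources overwrite earlier ones field by field) are all hypothetical assumptions. The design does not specify them.

```python
from typing import Callable, Dict, List

LATEST_VERSION = 2


def _v1_to_v2(doc: dict) -> dict:
    # Hypothetical migration: v1 used "name"; v2 renames it to "title".
    doc = dict(doc)  # copy, so the stored document stays unmodified
    if "name" in doc:
        doc["title"] = doc.pop("name")
    doc["version"] = 2
    return doc


MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}


def migrate(doc: dict) -> dict:
    # Apply migrations one version at a time until the doc is current.
    while doc.get("version", 1) < LATEST_VERSION:
        doc = MIGRATIONS[doc.get("version", 1)](doc)
    return doc


# Hypothetical precedence: later sources overwrite earlier ones per field.
OVERLAY_ORDER: List[str] = ["studio", "drupal", "otto"]


def read_course(stored: Dict[str, dict]) -> dict:
    """Merge the per-source documents for one course run into a single view."""
    merged: dict = {}
    for source in OVERLAY_ORDER:
        if source in stored:
            merged.update(migrate(stored[source]))
    return merged


stored = {
    "studio": {"version": 1, "name": "Demo Course", "start": "2016-01-01"},
    "drupal": {"version": 2, "title": "Demo Course: Marketing Title", "video": "intro.mp4"},
    "otto": {"version": 2, "price": 49},
}
view = read_course(stored)
print(view["title"], view["price"])  # Demo Course: Marketing Title 49
```

Migrating at read time means a schema change never requires rewriting the index. Old documents are upgraded whenever they are served.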

Deployment

The Course Discovery IDA (and surrounding infrastructure) will be delivered in stages.

Phase 1 (Dynamic Catalogs)

In Phase 1, Otto will push data to Internal Course Discovery when courses are updated, and will query Internal Course Discovery to read/write coupon configuration.

Phase 2 (Public API)

In Phase 2, support for inter-IDA updates will be added, and Internal Course Discovery will start to record data from Studio and Drupal. External Course Discovery will make data from Internal Course Discovery available to partners.

Phase 3 (Internal APIs)

In Phase 3, edX Web and edX Mobile will read from Internal Course Discovery to power their course discovery UX.

Phase 4/Final (Programs Integration)

In Phase 4, Programs will publish updates into the message exchange, and Internal Course Discovery and External Course Discovery will integrate program membership into course data.

Inter-IDA Messaging

Background

We would like to minimize the following sources of pain in Course Discovery data aggregation:

  • latency between updates in the source system and Course Discovery
  • cost to add a new producer of course data

To minimize both, Course Discovery will use a push-based model of data updates. One downside of a push-based model is that adding a new consumer of the same updates (for example, a notification system or a post-processing service) would require changing every data producer to also send to that new consumer.

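To avoid that cost, we will add a message exchange that supports a publish/subscribe (pub/sub) pattern between IDAs. Producers publish each update once, and new consumers subscribe without any change to the producers.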
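
The decoupling that a pub/sub exchange provides can be shown with an in-process stand-in. This sketch is illustrative only. It does not model a real broker's durability or delivery semantics, and `MessageExchange` and the `course.updated` topic name are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageExchange:
    """In-process stand-in for a pub/sub message exchange (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The producer only names the topic; it does not know who is listening.
        for handler in list(self._subscribers[topic]):
            handler(message)


exchange = MessageExchange()
discovery_inbox: List[dict] = []
notifier_inbox: List[dict] = []
exchange.subscribe("course.updated", discovery_inbox.append)
# Adding a second consumer requires no change to any producer.
exchange.subscribe("course.updated", notifier_inbox.append)
exchange.publish("course.updated", {"course_run_key": "run1", "source": "studio"})
print(len(discovery_inbox), len(notifier_inbox))  # 1 1
```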

Message Structure

While the exact message format has not yet been defined, the messages should have several characteristics that support the update-notification use case. In particular, they should be small and immutable. Rather than carrying the mutable course content itself, a message should include only the identifiers and URLs needed to retrieve the current content. Because consumers always fetch the latest content, we do not have to guarantee message ordering. A missed message is also less of a concern, since any later notification for the same course retrieves the current state.
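
A minimal sketch of these properties, assuming a hypothetical `CourseUpdated` message shape (since the real format is undefined). A frozen dataclass makes the message immutable, and because the handler always refetches, delivering messages out of order still leaves the consumer with the current content. `SOURCE_STATE` stands in for the source system's API.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CourseUpdated:
    """Hypothetical message shape: identifiers and a URL, no course content."""
    course_run_key: str
    source: str
    url: str


# Stand-in for the source system's current (mutable) state.
URL = "https://studio.example.com/api/courses/run1/"
SOURCE_STATE: Dict[str, dict] = {URL: {"title": "v1"}}


def handle(msg: CourseUpdated, store: Dict[Tuple[str, str], dict]) -> None:
    # Always fetch the *current* content. The message carries no payload,
    # so a late or reordered message cannot write stale data.
    store[(msg.course_run_key, msg.source)] = dict(SOURCE_STATE[msg.url])


first = CourseUpdated("run1", "studio", URL)
SOURCE_STATE[URL] = {"title": "v2"}  # a second edit happens in Studio
second = CourseUpdated("run1", "studio", URL)

store: Dict[Tuple[str, str], dict] = {}
for msg in (second, first):  # delivered out of order
    handle(msg, store)
print(store[("run1", "studio")])  # {'title': 'v2'}

try:
    first.url = "https://elsewhere.example.com/"
except FrozenInstanceError:
    print("messages are immutable")
```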

Other Types of Messaging

Notably, this messaging service is not intended (at least not yet) to cover all possible types of inter-IDA messages. In particular, several other types of eventing have different properties:

  • Learning Analytics: The analytics systems want to be able to replay events. This is not a requirement of the current system, and would introduce additional requirements on the messaging infrastructure.
  • Financial Events: These events need delivery guarantees or transactional semantics to ensure no money is left unaccounted for. As such, they should likely be stored in an RDBMS and then imported via an ETL process for reporting (and any cross-service requests should be made synchronously).
  • User Notifications: These events are the closest to the proposed system, but differ in scale, as the number of messages sent would be proportional to the number of students rather than the number of authors.