Master's Enrollment Integration Architectural Review

Context

To support Master's-level learning on edX, we need the following capabilities:

  1. Explicit enrollment of learners in Master's degree programs.
  2. Preemptive enrollment of students in Master's degree programs and course runs before they have created their edX accounts, using their student ID or email address as an identity reference.
  3. A consistent and robust API for universities, facilitating:
    1. Listing of programs and their constituent courses
    2. Listing of program and course enrollments
    3. Creation/modification of program and course enrollments
    4. Viewing of final grades for students in a program
    5. Freezing of final grades for students in a program
    6. Viewing of course attendance for students in a program
  4. Single sign-on (SSO) for Master's students using their university credentials
  5. Eventually: a way for learners to view their Master's program/course enrollments through an institution-brandable GUI
  6. Eventually: a way for program administrators to perform the actions in #3 through a user-friendly GUI

Design A: Authoritative Registrar


The Neem Team's original approach to these issues was to create a Registrar service. This service would simultaneously fulfill two primary purposes:

  • Serve as a source of truth (i.e. authoritative data store) for enrollments in Master's degree programs. Currently, there is no explicit notion of "program enrollment" in the edX ecosystem. Because program enrollment is an important concept in the context of Master's degrees, it seemed sensible to create and store this association within the Registrar service.
  • Serve as an integration point between SISs (Student Information Systems) and the edX ecosystem by exposing an API that supports the transactions described in capability #3. Furthermore, this API could back micro-frontends fulfilling capabilities #5 and #6.

This design, while technically viable, raised architectural concerns. Specifically, Nimisha Asthagiri and Dave Ormsbee pointed out that Registrar would simultaneously be (1) authoritatively storing program enrollment data, which is likely to become relevant to future edX services and frontends, and (2) aggregating and translating APIs for consumption by external systems. They believed that (1) should be performed by a core edX service such as the LMS or a hypothetical "Enrollments Service", whereas (2) should be performed by a periphery service that no other edX service depends on. Furthermore, they believed that if Registrar is a periphery integration service, the Master's Student Dashboard should pull its data from core edX services instead.

Design B: Non-Authoritative Registrar

One immediate resolution to the architectural concerns in Design A would be to keep Registrar as an integration point but move the source of truth for program enrollments to the LMS, with the Master's Student Dashboard using the LMS and the Discovery Service as its backends. However, this adds functionality and data to the LMS, which is already considered a bloated service.

Design C: Enrollments Service

Nimisha and Dave said they would be open to a separate "Enrollments Service" that authoritatively stores program enrollment data (and, in the future, course enrollment data as well), on the condition that the service is not also designed to be an integration point for general "registrar" operations. That is, SISs would need to query other edX services for non-enrollment data; specifically, they would need to fetch program structure from Course Discovery and grades/attendance from the LMS.

Besides making the URL scheme exposed to SISs more complex (SISs would talk to three services instead of one), this design complicates authorization. In the previous two designs, Registrar could enforce read/write permissions on program metadata, enrollments, grades, etc. based on the API user and the organization they are associated with. In this design, however, API authorization would have to be handled across three separate edX services, which would require either (1) devising a scheme to synchronize user permissions across services or (2) manually keeping each API user's permissions up to date on all three services.

Design D: Enrollments Service + Integration Point

A third alternative involves retaining the Enrollments Service, while additionally having a service or gateway between SISs and the edX services that host the relevant data. There are two variations of this design:

  1. Build Registrar exclusively as an integration point, synthesizing data that is authoritatively stored in LMS/Discovery/Enrollments. This would solve the URL complexity issue from Design C. Additionally, API user permissions could potentially be managed centrally in Registrar, which would address the authorization issue from Design C.
  2. Use the AWS API Gateway to expose the necessary LMS/Discovery/Enrollment APIs under a consistent domain (such as api.edx.org). This solves the URL complexity issue, but does not address the authorization issue.
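To illustrate how variation 1 could address the authorization issue, here is a minimal sketch of Registrar acting purely as an aggregator with a single permission check. Everything in it is hypothetical: the `Registrar` class, the `org_permissions` mapping from API user to organization keys, and the three callables standing in for Discovery, Enrollments, and LMS clients. It is not an existing Registrar API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


class PermissionDenied(Exception):
    """Raised when an API user may not access a program's organization."""


@dataclass
class Registrar:
    # Hypothetical: API username -> organization keys that user may access.
    org_permissions: Dict[str, Set[str]]
    # Hypothetical stand-ins for backend clients; each takes a program key.
    fetch_program_from_discovery: Callable[[str], dict]
    fetch_enrollments_from_service: Callable[[str], List[dict]]
    fetch_grades_from_lms: Callable[[str], Dict[str, str]]

    def _check(self, api_user: str, org_key: str) -> None:
        if org_key not in self.org_permissions.get(api_user, set()):
            raise PermissionDenied(f"{api_user} cannot access {org_key}")

    def program_report(self, api_user: str, program_key: str) -> dict:
        program = self.fetch_program_from_discovery(program_key)
        # The one place authorization happens, keyed on the owning organization;
        # enrollment and grade data are fetched only after it passes.
        self._check(api_user, program["org_key"])
        enrollments = self.fetch_enrollments_from_service(program_key)
        grades = self.fetch_grades_from_lms(program_key)
        # Merge data from the three authoritative services into one response.
        return {
            "program": program["title"],
            "learners": [
                {
                    "student_id": e["student_id"],
                    "status": e["status"],
                    "grade": grades.get(e["student_id"]),
                }
                for e in enrollments
            ],
        }
```

Because the backend services are only ever called through Registrar, permissions live in one place. With the API Gateway variation, requests pass straight to each service, so no equivalent single choke point exists.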

Preemptive Enrollments

One detail omitted from the four designs above for the sake of simplicity is the concept of preemptive enrollments (sometimes referred to in discussions as pending enrollments, waiting enrollments, enrollment requests, future enrollments, etc.). Schools will need to be able to send edX a list of learners (identified by either .edu email address or student ID) to be enrolled in programs and/or courses. If some of these learners do not already have edX accounts, we must store their preemptive enrollments. When a new edX account is created, any preemptive enrollments corresponding to it should be turned into true enrollments. Each design would implement preemptive enrollments differently:

  • Design A: Preemptive enrollments, both for courses and programs, would be stored within Registrar. When an LMS user is created, preemptive program enrollments would become program enrollments within Registrar, and preemptive course enrollments would be created as course enrollments in LMS using the existing Enrollment API.
  • Design B: Preemptive course/program enrollments could be stored in either LMS or Registrar. In the former case, true course/program enrollments would be created from preemptive enrollments via the User model's post_save signal. In the latter case, Registrar would create true course/program enrollments in the LMS using the Enrollment API upon creation of the LMS user.
  • Design C: Same as Design A, except the Enrollments Service takes the place of Registrar.
  • Design D: Same as Design C.

For all designs in which a non-LMS service (specifically, Registrar or Enrollments) owns the preemptive enrollments, an issue arises: how does that service know when an LMS user has been created? It needs to know this in order to turn the user's preemptive enrollments into true enrollments. Three options present themselves:

  1. Celery-based communication between LMS and Registrar/Enrollments. In terms of the long-term edX architectural strategy, this seems to be the desired approach, but it is currently unclear what implementing such a system would entail and whether it would be feasible within our time constraints.
  2. LMS making push API calls to Registrar/Enrollments upon the creation of a user. It would be undesirable for LMS to have to make API calls against a "periphery" service like Registrar, but it may be acceptable for the LMS to have to make API calls against a "core" service like Enrollments.
  3. Registrar/Enrollments regularly polling the LMS to check whether accounts have been made for learners with preemptive enrollments.
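Option 3 can be sketched as a single polling pass that a scheduler (for example, a periodic task) would run repeatedly. This is a minimal sketch under stated assumptions: the `pending` map of learner email to pending program/course keys, the batched `lookup_users_by_email` query, and the `enroll` callable are all hypothetical stand-ins, not existing LMS or Registrar APIs.

```python
from typing import Callable, Dict, Iterable, List


def poll_once(
    pending: Dict[str, List[str]],
    lookup_users_by_email: Callable[[Iterable[str]], Dict[str, int]],
    enroll: Callable[[int, str], None],
) -> int:
    """Run one polling pass and return the number of true enrollments created.

    pending: hypothetical map of learner email -> pending program/course keys.
    lookup_users_by_email: hypothetical batched LMS query returning
        email -> user id for the emails that now have accounts.
    enroll: hypothetical call that creates a true enrollment.
    """
    found = lookup_users_by_email(list(pending))
    created = 0
    for email, user_id in found.items():
        # Remove the learner from the pending set once their enrollments are realized.
        for key in pending.pop(email, []):
            enroll(user_id, key)
            created += 1
    return created
```

The trade-off with polling is between latency (a learner is not enrolled until the next pass) and load on the LMS (each pass queries for every outstanding learner).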

Finally, in all possible designs, there is the issue of correctly linking preemptive enrollments to newly created accounts. That is, preemptive enrollments must be tagged with some piece of identifying information (say, a student email or student ID) so that when an LMS account is created, we can determine which preemptive enrollments, if any, belong to it. Currently, the LMS Enterprise application supports preemptive enrollments for enterprise users, referring to them as "pending enrollments". It links newly created LMS users to their pending enrollments using the email address associated with the account (see https://github.com/edx/edx-enterprise/blob/master/enterprise/signals.py#L19). However, it is not clear whether this is a viable option for preemptive enrollments for Master's programs/courses.
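Email-based linking reduces to a lookup like the one below. This is a hypothetical sketch, not edx-enterprise's actual code; the `PendingEnrollment` type and the case-insensitive, whitespace-trimmed comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingEnrollment:
    email: str        # identifier supplied by the school (hypothetical field)
    program_key: str


def match_by_email(account_email: str, pending: List[PendingEnrollment]) -> List[PendingEnrollment]:
    # Assumption: emails are compared case-insensitively, ignoring surrounding whitespace.
    normalized = account_email.strip().lower()
    return [p for p in pending if p.email.strip().lower() == normalized]
```

The sketch also shows why email may not be viable for Master's learners: a learner who registers with a personal address instead of the .edu address the school submitted simply matches nothing.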

Decision

After consulting with members of the Architecture and Master's-Neem teams, we chose Design B. The driving factors in the decision were:

  • Storing program enrollments authoritatively outside of the LMS would require additional work from the team.
  • If Designs C or D were taken, it is not clear whether any team would ever find time to undertake the difficult task of moving course enrollments out of the LMS and into the Enrollments Service. It could result in enrollments being stored in two different services for the foreseeable future, which is undesirable.

Furthermore, we decided to also store preemptive course and program enrollments in the LMS. This removes the need to communicate LMS user creation to other services. When an LMS user is created, a receiver elsewhere in the LMS will handle the User model's post_save signal and create any enrollments corresponding to the user's preemptive enrollments. We will create two new models, PreemptiveProgramEnrollment and PreemptiveCourseEnrollment.
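The flow can be sketched without Django as follows. In the LMS, the handler would presumably be registered with Django's `post_save` signal (e.g. `@receiver(post_save, sender=User)`), which calls receivers as `(sender, instance, created, **kwargs)`; here an in-memory `EnrollmentStore` replaces the ORM, and `resolve_student_id` is a hypothetical hook for looking up the learner's student ID. Field names on the two models are assumptions, not the final schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class User:
    id: int
    username: str


@dataclass
class PreemptiveProgramEnrollment:
    student_id: str
    program_key: str
    realized: bool = False  # hypothetical flag: set once a true enrollment exists


@dataclass
class PreemptiveCourseEnrollment:
    student_id: str
    course_key: str
    realized: bool = False


@dataclass
class EnrollmentStore:
    # In-memory stand-in for the LMS database tables.
    preemptive_programs: List[PreemptiveProgramEnrollment] = field(default_factory=list)
    preemptive_courses: List[PreemptiveCourseEnrollment] = field(default_factory=list)
    program_enrollments: List[Tuple[int, str]] = field(default_factory=list)
    course_enrollments: List[Tuple[int, str]] = field(default_factory=list)


def make_post_save_handler(
    store: EnrollmentStore,
    resolve_student_id: Callable[[User], Optional[str]],
):
    """Build a receiver with the same call shape as a Django post_save receiver."""

    def handler(sender, instance: User, created: bool, **kwargs) -> None:
        if not created:  # ordinary saves of existing users are ignored
            return
        student_id = resolve_student_id(instance)
        if student_id is None:
            return
        for pe in store.preemptive_programs:
            if not pe.realized and pe.student_id == student_id:
                store.program_enrollments.append((instance.id, pe.program_key))
                pe.realized = True
        for ce in store.preemptive_courses:
            if not ce.realized and ce.student_id == student_id:
                store.course_enrollments.append((instance.id, ce.course_key))
                ce.realized = True

    return handler
```

Marking preemptive rows as realized rather than deleting them keeps an audit trail of what the school requested; deleting them would be an equally reasonable choice.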

Finally, we will likely require that the student identifier provided by the school when creating preemptive enrollments be the same as the remote ID provided by the school's SAML IdP. More specifically: in LMS, the uid field of a UserSocialAuth record for a SAML sign-in takes the form "<school_slug>:<remote_id>". The remote_id should equal the student_id used to create preemptive enrollments. That way, we will be able to link preemptive enrollments to LMS users.
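Extracting the remote ID from a SAML uid and matching it against preemptive enrollments could look like the sketch below. The helper names are hypothetical, and the sketch assumes the school slug never contains a colon, so splitting on the first colon is safe (a remote ID that itself contains colons is kept intact). The slug check matters because student IDs are presumably only unique within one school.

```python
from typing import List, Optional, Tuple


def remote_id_from_uid(uid: str, school_slug: str) -> Optional[str]:
    """Return remote_id from a uid of the form '<school_slug>:<remote_id>'.

    Returns None if the uid is malformed or belongs to another school's IdP.
    """
    slug, sep, remote_id = uid.partition(":")  # split on the first colon only
    if not sep or slug != school_slug or not remote_id:
        return None
    return remote_id


def enrollments_for_uid(
    uid: str, school_slug: str, preemptive: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Filter hypothetical (student_id, program_or_course_key) pairs for one SAML uid."""
    remote_id = remote_id_from_uid(uid, school_slug)
    if remote_id is None:
        return []
    return [p for p in preemptive if p[0] == remote_id]
```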