Analytics Data API - Paginated Course Summaries


Abstract

We want to move filtering, sorting, pagination, and aggregation of course summaries from client-side Insights to within the Analytics Data API.

Background

There exists an (undocumented) "Course Summaries" endpoint in the Analytics Data API at:

    GET /api/v0/course_summaries/?course_ids=<course_id_0>...<course_id_n>

and:

    POST /api/v0/course_summaries/
    {
        'course_ids': [ ... ],
        ...,
    }

Both methods allow the client to get metadata about the enrollment count, enrollment delta, and start/end dates of the courses whose IDs are passed in. The POST method is identical to the GET method other than that its arguments are passed in the request body, allowing the number of course IDs to not be limited by URL length restrictions.

This endpoint is called by Insights on the server-side, and its data is sent in a template to the client. Because the API has no notion of user course access, Insights must ensure that only the course summaries to which the user has access are sent to the client. It does so in one of two ways:

  • If the user has access to fewer than 500 courses, course_ids is set to the IDs of those courses. So, the returned course summaries are only the ones that the user has access to.
  • Otherwise, course_ids is not passed in, resulting in all course summaries being returned. Insights must then filter out the course summaries that the user doesn't have access to.

Note: The reason for the two separate methods of filtering is that after a certain number of course IDs, it actually becomes slower to process all the IDs and load the correct course summaries than it does to simply load every course summary.
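A minimal sketch of the two access-filtering paths described above, to make the branching explicit. The function and constant names are hypothetical and not taken from the Insights code base; only the threshold of 500 and the `course_id` field come from this document.

```python
from typing import Any, Dict, List, Optional

# Threshold from the description above; the constant name is hypothetical.
MAX_COURSE_IDS_IN_REQUEST = 500


def course_ids_param(accessible_ids: List[str]) -> Optional[List[str]]:
    """Return the course_ids to send, or None to request every summary."""
    if len(accessible_ids) < MAX_COURSE_IDS_IN_REQUEST:
        # Path 1: the API does the filtering for us.
        return accessible_ids
    # Path 2: ask for everything and filter afterwards in Insights.
    return None


def filter_to_accessible(
    summaries: List[Dict[str, Any]], accessible_ids: List[str]
) -> List[Dict[str, Any]]:
    """Drop summaries for courses the user cannot access (needed on path 2)."""
    allowed = set(accessible_ids)
    return [s for s in summaries if s["course_id"] in allowed]
```

The second function is only needed when `course_ids_param` returns `None`, which is exactly the duplication the next section lists as a problem.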

Once the data reaches the client, it is filtered, sorted, and paginated using Backgrid. Although the initial page load is slow, the loaded table is very responsive and snappy. Additionally, the data is aggregated to show some overall statistics at the top of the page.

Finally, there is a "Download CSV" link that writes all the course summary data (unfiltered, sorted by course display name) into a CSV format for the user to download. It does this all client-side.

Problems with the current design

  • Allowing/requiring the API to return EVERY course summary is slow, taxing on the API server, and not scalable
    • Causes >5s load time on the Insights course listing page (the page users see right after logging in)
    • Dave, paraphrased: "Sure, it works now... but what if we get 10,000 courses? Or 50,000? Allowing an endpoint to return that much data is not good"
  • Having two separate methods of course-access filtering is confusing and requires code duplicated between Insights and the API

Solution

Paginate the API response. This necessarily means filtering, sorting, and aggregation will also be done in the API. We will also need a new Insights API endpoint that mirrors the Analytics Data API course summaries endpoint and uses it as a backend, in order to make the data available to the client-side Insights course listing page.
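The reason pagination forces filtering and sorting into the API is ordering: a page is only meaningful if it is cut from the full, already filtered and sorted result set. The sketch below illustrates this on an in-memory list. It is not the Data API implementation; `list_page` and its parameters are hypothetical, matching is assumed to be case-insensitive, and the sort field is assumed to be present and comparable on every summary.

```python
from typing import Any, Dict, List, Optional, Tuple


def list_page(
    summaries: List[Dict[str, Any]],
    text_search: Optional[str] = None,
    sort_key: str = "catalog_course_title",
    descending: bool = False,
    page: int = 1,
    page_size: int = 100,
) -> Tuple[List[Dict[str, Any]], int]:
    """Return (one page of summaries, total number of matching summaries)."""
    # 1. Filter first, so the total count reflects the filtered set.
    if text_search:
        needle = text_search.lower()
        summaries = [
            s for s in summaries
            if needle in s["catalog_course_title"].lower()
            or needle in s["course_id"].lower()
        ]
    # 2. Sort the whole filtered set, not just one page of it.
    ordered = sorted(summaries, key=lambda s: s[sort_key], reverse=descending)
    # 3. Only now slice out the requested page (pages are 1-based).
    start = (page - 1) * page_size
    return ordered[start:start + page_size], len(ordered)
```

If the client sorted or filtered a single page instead, it would only ever reorder the 100 rows it already had.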

TODO: Add in `fields` and `exclude` parameters to course summaries endpoints

Insights: Course Index View

http(s)://<insights_host>/courses/#?<query_string>
Method: GET

Description:

  • Get course listing page with aggregate data
    • Aggregate data is NOT affected by filters/pagination
  • Course summary data is NOT loaded yet
  • However, query parameters are accepted and used for the subsequent course summary AJAX call
  • Changing the query string will not reload the page, but will trigger an AJAX call that updates the table
  • Sorting/filtering/paging the table will trigger an AJAX call and update the query string

Query Parameters:

  • sortKey (optional): One of the following. Default: catalog_course_title
    • catalog_course_title
    • start_date
    • end_date
    • cumulative_count
    • count
    • count_change_7_days
    • verified_enrollment
    • passing_users
  • order (optional): One of the following. Default: asc
    • asc
    • desc
  • availability (optional): Comma-separated list of one or more of the following. Default: all availabilities
    • Archived
    • Current
    • Upcoming
    • Unknown
  • program_ids (optional): Comma-separated list of program IDs to filter by. Default: all programs
  • text_search (optional): (Sub)string to filter by for course titles and IDs. Default: do not filter by search string
  • page (optional): Page number. Default: 1

For defaults, put enumeration-style key-value pairs in the URL

Statuses:

  • 401 if not authenticated
  • 200 otherwise

TODO: Look into what happens for bad query params. Stick with current functionality. Want to keep same URL scheme and page behavior to avoid breaking URLs
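To illustrate how the Course Index View parameters and their defaults combine, here is a small sketch. Because the query string sits in the URL fragment (after `#`), it is never sent to the server, so in Insights this parsing would happen in client-side code; Python is used here purely for illustration. `parse_course_index_query` is a hypothetical name, and the sketch does not validate values, since the handling of bad query parameters is still an open TODO.

```python
from typing import Any, Dict
from urllib.parse import parse_qs

# Defaults as listed under Query Parameters for the Course Index View.
DEFAULTS = {"sortKey": "catalog_course_title", "order": "asc", "page": 1}
# Parameters whose value is a comma-separated list.
LIST_PARAMS = ("availability", "program_ids")


def parse_course_index_query(query_string: str) -> Dict[str, Any]:
    """Turn 'sortKey=count&order=desc' into a dict with defaults applied."""
    raw = parse_qs(query_string)
    params: Dict[str, Any] = dict(DEFAULTS)
    for name in ("sortKey", "order", "text_search"):
        if name in raw:
            params[name] = raw[name][0]
    if "page" in raw:
        # int() raises ValueError on a non-numeric page; no recovery is attempted here.
        params["page"] = int(raw["page"][0])
    for name in LIST_PARAMS:
        if name in raw:
            params[name] = [v for v in raw[name][0].split(",") if v]
    # A missing availability/program_ids/text_search key means "do not filter".
    return params
```

For example, `parse_course_index_query("order=desc&availability=Current,Upcoming")` keeps the default `sortKey` and `page` and returns `availability` as a two-element list.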

Insights: Course Summaries API

http(s)://<insights_host>/api/course_summaries/v1/course_summaries/?<query_string>
Method: GET

Description: Get paginated list of course summaries

Query Parameters:

  • order_by (optional): See description for Course Index View 'sortKey'
  • sort_order (optional): See description for Course Index View 'order'
  • ... all other params from Course Index View ...
  • fields (optional): Fields to include in response. A comma-separated list of one or more of the fields listed under 'results' in 'Return Values'. Mutually exclusive with 'exclude'. Default: All fields included
  • exclude (optional): Fields NOT to include in response. A comma-separated list of one or more of the fields listed under 'results' in 'Return Values'. Mutually exclusive with 'fields'. Default: No fields excluded
  • page_size (optional): Page size. Max: 100, Default: 100

Return Values:

  • results: array containing one page of result dicts with fields:
    • count
    • end_date
    • created
    • cumulative_count
    • programs
    • enrollment_modes
    • availability
    • verified_enrollment
    • pacing_type
    • passing_users
    • count_change_7_days
    • course_id
    • catalog_course_title
    • catalog_course: course ID without run
    • start_date
  • count: total number of results (all pages)
  • next: link to next page
  • previous: link to previous page
  • last_updated: String containing the date and time the result summaries were updated

Statuses:

  • 401 if not authenticated
  • 404 if no results
    • TODO: what does learner analytics API do?
  • 400 if bad parameter value
  • 200 otherwise
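The `fields` and `exclude` parameters can be read as a projection applied to each result dict. The sketch below is one possible reading, not the endpoint's actual code: `select_fields` is a hypothetical helper, and raising `ValueError` stands in for whatever the view would do to produce the 400 status when both parameters are given.

```python
from typing import Any, Dict, Optional


def select_fields(
    result: Dict[str, Any],
    fields: Optional[str] = None,
    exclude: Optional[str] = None,
) -> Dict[str, Any]:
    """Apply the comma-separated 'fields' or 'exclude' parameter to one result."""
    if fields and exclude:
        # The two parameters are mutually exclusive (a 400 in the API).
        raise ValueError("'fields' and 'exclude' are mutually exclusive")
    if fields:
        wanted = set(fields.split(","))
        return {k: v for k, v in result.items() if k in wanted}
    if exclude:
        unwanted = set(exclude.split(","))
        return {k: v for k, v in result.items() if k not in unwanted}
    # Default: all fields included.
    return dict(result)
```

A caller would map this over `results` before building the response, for example `[select_fields(r, fields="course_id,count") for r in results]`.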

Analytics Data API: Course Aggregate Data

http(s)://<data_api_host>/api/v1/course_aggregate_data/?<query_string>
Method: GET

Description: Get aggregate data about a set of courses

Query Parameters:

  • course_ids (optional): Comma-separated list of course IDs to filter by. Default: All courses

Return Value:

  • count
  • cumulative_count
  • count_change_7_days
  • verified_enrollment

Access: Same as Insights Course Summaries API

Method: POST

Description: Same as GET, but number of course IDs is not restricted by URL length

Query Parameters: Same as above, but comma-separated lists are JSON arrays of strings

Return Value: Same as above

Access: Same as above
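As a rough illustration of what this endpoint could compute, the sketch below totals the four returned fields over a set of course summaries. It assumes that "aggregate" means a plain sum across courses, which this document does not state explicitly; `aggregate_course_data` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Optional

# The four values listed under Return Value.
AGGREGATE_FIELDS = (
    "count",
    "cumulative_count",
    "count_change_7_days",
    "verified_enrollment",
)


def aggregate_course_data(
    summaries: Iterable[Dict[str, Any]],
    course_ids: Optional[List[str]] = None,
) -> Dict[str, int]:
    """Sum the aggregate fields over the selected courses (all if None)."""
    wanted = None if course_ids is None else set(course_ids)
    totals = {field: 0 for field in AGGREGATE_FIELDS}
    for summary in summaries:
        if wanted is not None and summary["course_id"] not in wanted:
            continue
        for field in AGGREGATE_FIELDS:
            # Treat a missing or null value as zero (an assumption).
            totals[field] += summary.get(field) or 0
    return totals
```

Because the totals ignore table filters and paging, the course listing page can request them once with only the user's accessible `course_ids`.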

Analytics Data API: Course Summaries

http(s)://<data_api_host>/api/v1/course_summaries/?<query_string>
Method: GET

Description: Get a paginated list of course summaries, with optional filtering and sorting

Query Parameters:

  • course_ids (optional): Comma-separated list of course IDs to filter by. Default: All courses
  • ... all params from Insights Course Summaries API ...

Return Value: Same as Insights Course Summaries API, but

  • last_updated is not included
  • next/previous have API hostname in URL

TODO: calculate last_updated in the Data API and pass in from this endpoint

Access: Same as Insights Course Summaries API

Method: POST

Description: Same as GET, but number of course IDs is not restricted by URL length

Query Parameters: Same as above, but comma-separated lists are JSON arrays of strings

Return Value: Same as above, except next/previous URLs are not included

Access: Same as above
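Since the Data API response carries `next`/`previous` links with the API hostname and no `last_updated`, the Insights endpoint that mirrors it would presumably have to adapt the response before returning it. The sketch below shows one way that could look. `to_insights_response` and its arguments are hypothetical, and dropping `course_ids` from the links is an assumption based on that parameter existing only on the Data API side.

```python
from typing import Any, Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_insights_response(
    api_response: Dict[str, Any],
    insights_endpoint: str,
    last_updated: str,
) -> Dict[str, Any]:
    """Re-point paging links at Insights and add last_updated."""
    base = urlsplit(insights_endpoint)

    def rehost(link: Optional[str]) -> Optional[str]:
        if link is None:
            return None
        # Keep the paging/filter query, minus the Data-API-only course_ids.
        pairs = [
            (key, value)
            for key, value in parse_qsl(urlsplit(link).query)
            if key != "course_ids"
        ]
        return urlunsplit((base.scheme, base.netloc, base.path, urlencode(pairs), ""))

    response = dict(api_response)  # shallow copy; the input is left untouched
    response["next"] = rehost(api_response.get("next"))
    response["previous"] = rehost(api_response.get("previous"))
    response["last_updated"] = last_updated
    return response
```

Once the TODO about computing `last_updated` in the Data API is resolved, the third argument would simply be taken from the API response.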