Proposal: API collection format.

We need a better convention for returning API results from collection (ListView-style) endpoints.

Current behavior

Currently, when we return a collection of resources in an API, we use the default formatting provided by DRF, which returns a direct JSON list of results.  For example:

  [
    {"id": 1, "item": "First item", "created": "2015-11-19T09:28:44Z"},
    {"id": 2, "item": "Second item", "created": "2015-11-19T09:29:03Z"},
    {"id": 3, "item": "Third item", "created": "2015-11-19T09:31:30Z"},
    {"id": 4, "item": "Fourth item", "created": "2015-11-19T09:35:12Z"}
  ]

If we then paginate the results using DRF's default pagination, the above gets converted to a JSON object that contains "previous" and "next" elements, which contain links to the previous and next pages respectively; a "count" element that contains the total number of results on all pages in the query; and a "results" element that contains the current page's results.  If the above example were paginated with three elements per page, the response would look like:

  {
    "previous": null,
    "next": "/path/to/results?page=2",
    "count": 4,
    "results": [
      {"id": 1, "item": "First item", "created": "2015-11-19T09:28:44Z"},
      {"id": 2, "item": "Second item", "created": "2015-11-19T09:29:03Z"},
      {"id": 3, "item": "Third item", "created": "2015-11-19T09:31:30Z"}
    ]
  }

Issues with current behavior

This is unsatisfactory for a few reasons:

  1. The results are in a different location depending on whether the query is paginated or not, which makes it more difficult to create a client to consume our data.
  2. The top level object controls pagination, which foregrounds meta-information while backgrounding the actual results the user was looking for. 
  3. It also doesn't play nicely with other metadata we might want to add, as we would either have to create new subsections for other kinds of metadata, which are then at a different level than the pagination, or dump it all into the top-level, which creates a mess of un-namespaced metadata.
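
To illustrate the first issue, a client of the current format has to branch on the shape of the response before it can reach the items. This is a minimal sketch, assuming the payload has already been decoded from JSON; the function name extract_results is hypothetical.

```python
def extract_results(payload):
    """Return the list of items from a current-format collection response."""
    if isinstance(payload, list):
        # Unpaginated: the response is the bare list of items.
        return payload
    # Paginated: the items are nested under "results".
    return payload["results"]
```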

Proposed behavior

I propose that we update our API conventions for collection endpoints to:

  1. Always return a JSON object, with an element "results", that contains a list of result items, whether or not the result set is paginated.

  2. If a result set is paginated, encapsulate pagination information under a top-level "pagination" entry alongside results.

This puts the actual results in a consistent location within the response, and provides a convenient namespace for the pagination metadata.  Any given result can be tested for pagination very easily: If the response object contains a "pagination" element, the results are paginated.

Examples

Unpaginated:

  {
    "results": [
      {"id": 1, "item": "First item", "created": "2015-11-19T09:28:44Z"},
      {"id": 2, "item": "Second item", "created": "2015-11-19T09:29:03Z"},
      {"id": 3, "item": "Third item", "created": "2015-11-19T09:31:30Z"}
    ]
  }

Paginated:

  {
    "pagination": {
      "count": 4,
      "previous": null,
      "next": "/path/to/results?page=2"
    },
    "results": [
      {"id": 1, "item": "First item", "created": "2015-11-19T09:28:44Z"},
      {"id": 2, "item": "Second item", "created": "2015-11-19T09:29:03Z"},
      {"id": 3, "item": "Third item", "created": "2015-11-19T09:31:30Z"}
    ]
  }
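
With the proposed format, a client could read both kinds of response through one code path. The sketch below assumes the payload has already been decoded from JSON; read_collection is a hypothetical helper name, not part of any existing client.

```python
def read_collection(payload):
    """Split a proposed-format response into (items, pagination)."""
    items = payload["results"]              # always present in the proposed format
    pagination = payload.get("pagination")  # None when the result set is unpaginated
    return items, pagination


unpaginated = {"results": [{"id": 1}, {"id": 2}]}
paginated = {
    "pagination": {"count": 4, "previous": None, "next": "/path/to/results?page=2"},
    "results": [{"id": 1}, {"id": 2}, {"id": 3}],
}
```

The pagination test described above reduces to checking whether the second value returned is None.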

New Code

Non-paginated responses could be adapted to the new scheme in one of two ways:

  1. Create a new Serializer base class that reformats the .data attribute to have the structure defined above.
  2. Create a custom renderer to inspect the data for a flat list of results, and wrap it in an object with a "results" element. 
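
As a rough sketch of the second option, the wrapping step itself could be as small as the function below. It is written as plain Python so it can be read without DRF installed; the name wrap_results is hypothetical, and the assumption is that anything that is not a list (an error body, or data that is already wrapped) should pass through unchanged.

```python
def wrap_results(data):
    """Wrap a flat list of results in an object with a "results" element."""
    if isinstance(data, list):
        return {"results": data}
    # Not a flat list (for example an error body or an already-wrapped
    # object): leave it as it is.
    return data
```

A custom renderer could apply this to the data passed to its render(data, accepted_media_type, renderer_context) method before delegating to DRF's JSONRenderer.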

Paginated responses can be handled by writing a custom paginator that overrides the get_paginated_response() method to return the desired format.  It should handle serializers that return a flat list of results as well as objects with a "results" element.
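
The body such a paginator returns might be assembled along these lines. This is a sketch under assumptions rather than a finished implementation: build_paginated_body is a hypothetical helper, and the count and links are passed in as plain values so the logic can be shown without DRF.

```python
def build_paginated_body(data, count, previous_link, next_link):
    """Return the proposed paginated response body.

    `data` may be a flat list of items, or an object that already has a
    "results" element.
    """
    if isinstance(data, dict) and "results" in data:
        body = dict(data)  # shallow copy; keeps any other top-level keys
    else:
        body = {"results": list(data)}
    body["pagination"] = {
        "count": count,
        "previous": previous_link,
        "next": next_link,
    }
    return body
```

Inside an overridden get_paginated_response(self, data) on a PageNumberPagination subclass, the arguments would likely come from self.page.paginator.count, self.get_previous_link() and self.get_next_link().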

Migration steps

Soon:

  1. Initially, we update unpublished APIs (discussion API, course catalog API) to explicitly call the new paginator/renderer classes.
  2. Existing published APIs explicitly define their paginator/renderer as the built-in DRF classes they currently use.
  3. Defaults named in settings get changed to the new classes.
  4. Paginated APIs could be converted to an intermediate format that includes the new "pagination" section, but also keeps the "count", "previous" and "next" elements at the top level for backward compatibility.
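
The intermediate format in step 4 could be produced roughly as follows. This is a sketch only; build_intermediate_body is a hypothetical name, and the pagination values are passed in as plain arguments.

```python
def build_intermediate_body(results, count, previous_link, next_link):
    """Return a body with pagination data both namespaced and at the top level."""
    pagination = {"count": count, "previous": previous_link, "next": next_link}
    body = dict(pagination)          # legacy top-level fields for existing clients
    body["pagination"] = pagination  # new namespaced section
    body["results"] = results
    return body
```

Existing clients keep reading body["count"] and body["next"], while new clients can already rely on body["pagination"].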

Eventually:

  1. When unpaginated APIs get a version bump, we would move response content into a "results" section.
  2. At some point, we may decide to version-bump unpaginated APIs to use the new format anyway.
  3. If we decide to add pagination to an existing API, it would need a version bump anyway.
  4. When paginated APIs get a version bump due to other changes we would remove the top-level pagination elements in favor of the new "pagination" section.
  5. There would not be a need to force-version bump already-paginated APIs, as the intermediate format would already expose the new format.
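
During the migration, a client that wants to tolerate every stage could look for the pagination data in both places. This is a minimal sketch, assuming a decoded JSON payload; get_pagination is a hypothetical helper name.

```python
def get_pagination(payload):
    """Return pagination info from a new, intermediate, or legacy response."""
    if "pagination" in payload:
        # New or intermediate format: use the namespaced section.
        return payload["pagination"]
    if "count" in payload:
        # Legacy paginated format: the fields live at the top level.
        return {key: payload.get(key) for key in ("count", "previous", "next")}
    # Unpaginated response (legacy bare list or new "results"-only object).
    return None
```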

Thoughts

Non-intrusive work

All of this is handled by paginators and renderers, which are defined as class attributes on DRF generic views, and by superclasses of our serializers.  We would therefore very likely be able to support multiple versions of the API (where the format is the only change) without much duplicate code, by creating views that differ only by a superclass and/or an attribute or two.  Pseudocode example:

  class View(ListAPIView):
      pagination_class = PageNumberPagination
      # renderer_classes = [JSONRenderer]

      def list(self, request):
          return Serializer([1, 2, 3])

  class V2View(View):
      # Our new paginator and renderer
      pagination_class = NestedPageNumberPaginator
      renderer_classes = [JSONResultsRenderer]

  urlpatterns = [
      url(r'^api/sample_api/v1/objects/$', View.as_view()),
      url(r'^api/sample_api/v2/objects/$', V2View.as_view()),
  ]

We could remove duplication by creating a NewPaginationMixin for V2Views to inherit from:

  class View(ListAPIView):
      pagination_class = PageNumberPagination

      def list(self, request):
          return Serializer([1, 2, 3])

  class NewPaginationMixin(object):
      pagination_class = NestedPageNumberPaginator
      renderer_classes = [JSONResultsRenderer]

  class V2View(NewPaginationMixin, View):
      pass
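
The mixin has to be listed before the base view so that its attributes win in Python's method resolution order. The sketch below uses plain classes, with strings standing in for the paginator and renderer classes, and uses DRF's pagination_class and renderer_classes attribute names; WrongOrderView is a hypothetical counter-example.

```python
class View:
    pagination_class = "PageNumberPagination"
    renderer_classes = ["JSONRenderer"]


class NewPaginationMixin:
    pagination_class = "NestedPageNumberPaginator"
    renderer_classes = ["JSONResultsRenderer"]


class V2View(NewPaginationMixin, View):
    """Mixin first: its attributes override those of View."""


class WrongOrderView(View, NewPaginationMixin):
    """View first: the mixin's attributes are shadowed and have no effect."""
```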

Alternative implementations

JSON API

If we want to leverage existing standards, we could alternatively structure our responses to conform to something like JSON API (http://jsonapi.org/).  In this format, our response would look like:

  {
    "data": [
      {
        "type": "result",
        "id": "1",
        "attributes": {"item": "First item", "created": "2015-11-19T09:28:44Z"}
      },
      {
        "type": "result",
        "id": "2",
        "attributes": {"item": "Second item", "created": "2015-11-19T09:29:03Z"}
      },
      {
        "type": "result",
        "id": "3",
        "attributes": {"item": "Third item", "created": "2015-11-19T09:31:30Z"}
      }
    ],
    "links": {
      "prev": null,
      "next": "/path/to/results?page=2"
    },
    "meta": {
      "count": 4
    }
  }
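
To give a sense of the restructuring involved, the function below converts our flat result items into a JSON API style document. It is a sketch under assumptions: to_jsonapi is a hypothetical name, the resource type is supplied by the caller, and it follows the JSON API rules that "id" must be a string and that the previous-page link is named "prev".

```python
def to_jsonapi(items, resource_type, count, prev_link, next_link):
    """Convert flat result items into a JSON API style document."""
    data = []
    for item in items:
        # Everything except the id moves under "attributes".
        attributes = {key: value for key, value in item.items() if key != "id"}
        data.append({
            "type": resource_type,
            "id": str(item["id"]),  # JSON API requires ids to be strings
            "attributes": attributes,
        })
    return {
        "data": data,
        "links": {"prev": prev_link, "next": next_link},
        "meta": {"count": count},
    }
```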

Advantages of this format include:

  1. Links section is extensible.  Object names correspond to standard "rel" identifiers (or custom ones we create and define in our documentation).  We can include
  2. We build on an existing standard.  As an open source project, we like standards! It's just a vendor standard, but it is designed for public use and is properly registered (as application/vnd.api+json).  We could serve it as either application/json or application/vnd.api+json, depending on what the client wants (and what we want to support).  Using common formats helps facilitate common client code.
  3. The format can handle more complex structures.  Related data can be included in a standard way to reduce duplication within an API call, and to reduce the number of API calls for certain queries that need heterogeneous data.

Disadvantages of this format include:

  1. Pagination is not neatly namespaced.  It would get mixed in with other unrelated categories of links.
  2. The "count" information is separated from other pagination information, and is not a standard field (the meta section is specifically for non-standard data).
  3. The results/data section is formatted differently than what we currently do.  Migrating clients from our current code to this would require deeper changes than just content = content['results'].
  4. The format is more complex, and contains functionality we may not need or want.  Even if we do, taking full advantage of it may require more extensive modifications to the way we use DRF.
  5. The standard for JSON API defines not only a response format, but a publication workflow.  That opens up further questions of whether or not we want to use the standard's workflow, and if so how well that corresponds to what we do presently.
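
To make the third disadvantage concrete, a client that wants the flat items it receives today would need a conversion step like the one below, rather than a single dictionary lookup. This is a sketch only; flatten_jsonapi is a hypothetical name, and it assumes each resource carries its fields under "attributes".

```python
def flatten_jsonapi(payload):
    """Rebuild flat result items from a JSON API style document."""
    items = []
    for resource in payload["data"]:
        item = {"id": resource["id"]}
        # Merge the nested attributes back into the top level of the item.
        item.update(resource.get("attributes", {}))
        items.append(item)
    return items
```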