API Authorization Notes

Our current authorization mechanisms are somewhat ad hoc. We use OAuth2 scopes for Insights, but in a way that is probably too resource specific and verbose. Most courseware access questions go through edx-platform's access.py checks. Other apps generally identify the caller and make their own decisions about what they should and shouldn't be allowed to do.

Read/Watch these first:

http://nordicapis.com/api-security-oauth-openid-connect-depth/

OAuth 2.0

We've chosen to use OAuth 2.0 as the basis of our API authorization scheme, as it is both simpler and more widely adopted by the Python community than SAML-based solutions. However, OAuth 2.0 is interesting in just how much it leaves unspecified, particularly in the information passed between the Authorization Server and the Resource Server.

For those not familiar with OAuth 2.0 at all, the main actors are the Resource Owner (the user), Resource Server (the service holding the data you want), the Client (the thing that's making an API call on behalf of a RO/user), and the Authorization Server (the thing issuing the access token). The nice thing from the Client point of view is that once it has the access token, it just knows to send it in a header on all requests to the Resource Server.

To put in more concrete terms, let's say that we exposed the Analytics API publicly. Someone on a course team can now go to a page on our site where they can generate an access token. Later on, they're going to write a script that fetches data from our Analytics API using that token. The RO is the user, the RS is the Analytics Service, the Client is their script, and the AS is the part of edx-platform that issued them the token.
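As a rough sketch of what the course team member's script might look like, here is the Client side in Python. The URL and token value are made up for illustration, and the "Bearer" header scheme is an assumption about how we would accept tokens (it is the common OAuth 2.0 convention), not something we've decided.

```python
import urllib.request

# Hypothetical Analytics API endpoint; not a real URL.
ANALYTICS_URL = "https://analytics.example.com/api/v0/courses/"


def build_request(url: str, access_token: str) -> urllib.request.Request:
    """Attach the access token as a bearer credential.

    The Client never inspects the token; it only passes it along.
    """
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


request = build_request(ANALYTICS_URL, "token-issued-by-the-AS")
# urllib.request.urlopen(request) would actually send it; that call is
# left out here because the endpoint above does not exist.
```

The point is that the script needs no knowledge of scopes, users, or signatures. All of that is between the AS and the RS.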

If you want a good presentation on OAuth 2.0, I recommend the SpringDeveloper video "Securing RESTful Web Services with OAuth2". It's long and dry, but it goes into far more detail on the precise server interactions and deployment options than other videos I've seen. Many presentations go for a 20 minute overview of how great OAuth 2.0 is as a client, without touching on any of the out-of-spec topics you need to actually build a useful system when you're a provider.

Token Options

Access tokens are one of the first areas where OAuth 2.0 expects you to fill in the blanks. The token is opaque to the Client, but there are a couple of options for what the RS does when it sees one come across in a request header.

Call the Authorization Server on a Backchannel

The Analytics API could make a call to the Authorization Server with the token, asking for more information about it. The AS holds the mapping that says this token belongs to this user and is authorized for a certain set of scopes (more on that later). It can send back that information, as well as anything else that it feels would be useful (the username, groups, org, etc.). Armed with this information, the Analytics API can fulfill the request.
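A minimal sketch of the backchannel flow, with the AS replaced by an in-memory table so it runs on its own. Everything here is hypothetical: the token value, the field names, the analytics.read scope, and the function names. In a real deployment the lookup would be an HTTP call from the RS to the AS.

```python
from typing import Callable, Optional

# Stand-in for the mapping the AS holds: token -> user and scopes.
AS_TOKEN_TABLE = {
    "abc123": {"username": "carlos", "scopes": ["analytics.read"], "org": "edX"},
}


def as_lookup(token: str) -> Optional[dict]:
    """What the AS might return when asked about a token (None if unknown)."""
    return AS_TOKEN_TABLE.get(token)


def handle_request(
    token: str,
    required_scope: str,
    lookup: Callable[[str], Optional[dict]] = as_lookup,
) -> str:
    """RS side: ask the AS about the token, then decide."""
    info = lookup(token)  # in practice, a network call to the AS
    if info is None:
        return "401 unknown token"
    if required_scope not in info["scopes"]:
        return "403 missing scope"
    return f"200 data for {info['username']}"
```

Passing the lookup in as a parameter is just a convenience for testing; it also makes the extra network hop on every request easy to see.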

GitHub takes this approach, with the AS returning additional information like a user's avatar_url and gravatar_id.

Extract Information from the Token

The Client is required to treat the token as opaque, but the RS and AS can agree to whatever conventions they want. Another common approach is to make the access token a JSON Web Token (JWT). JWTs are simple, digitally signed, base64url-encoded JSON structures. The Analytics API would inspect the token, verify the signature, and then just trust whatever group/permission/scope/user info comes with the token. The plus side is that this removes the need for the RS to call the AS at all, meaning there is one less network hop and the various services are more resilient to transient failures in the AS. The disadvantage is that a signed token remains valid until it expires unless the RS also checks some revocation list, so token invalidation gets more complicated and services may see outdated scope information.

Cloud Foundry uses this approach with their UAA, and Salesforce and Oracle both use this form as well. Both Salesforce and Cloud Foundry started out using symmetric HMAC-SHA256 signing and later shifted to RSA-SHA256. With RSA-SHA256, the private signing key only has to live in the Authorization Server, and each Resource Server needs only the public key to verify tokens, so we'd almost certainly want to start with that approach unless we ran into prohibitive performance problems.
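To make the token-inspection path concrete, here is a sketch that signs and verifies a JWT using only the Python standard library. Note that it uses symmetric HMAC-SHA256 (HS256), not the RSA-SHA256 we would prefer, purely because the standard library has no RSA support; a real deployment would presumably use a maintained JWT library instead. The function names, claim names, and secret are made up, and expiry checks are left out.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    """base64url without padding, as JWTs use."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """AS side: produce header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(digest)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """RS side: check the signature, then trust the claims. No call to the AS."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

With HS256, every service that verifies tokens holds the same secret that can mint them, which is exactly the weakness the RSA approach avoids.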

Nimisha pointed out the video included at the top of this page, where the suggestion is to return an opaque token to the untrusted end user/client, but to use AS signed JWT tokens internally between services. This gives us a bit more depth in our security.

Scopes

Scopes are a simple list of case-sensitive, space delimited permissions in OAuth 2.0. The client can request a list of scopes when asking the AS to issue an access token, but the AS can override those. If the Client doesn't ask for scopes, the AS can give a reasonable default or fail the request. There are no restrictions beyond that, and different groups have adopted very different naming practices (many ignoring the "space delimited" rule). A Heroku engineer wrote a blog post with a good survey of these approaches.
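A small sketch of how the AS side of this might look. The parsing follows the space-delimited, case-sensitive rule; the granting policy (intersect the request with what the Client is allowed, fall back to a default when nothing is requested) is just one assumption among several reasonable ones. The xqueue.read and xqueue.admin scope names are hypothetical.

```python
def parse_scopes(scope_param: str) -> set:
    """Split a space-delimited scope string. Names are case-sensitive."""
    return set(scope_param.split())


def grant_scopes(requested: set, allowed_for_client: set, default: set) -> set:
    """AS policy sketch: the AS may override what the Client asked for."""
    if not requested:
        # The Client asked for nothing; hand back a default
        # (failing the request would be the other valid choice).
        return set(default)
    # Silently drop anything this Client is not allowed to have.
    return requested & allowed_for_client
```

For example, a Client asking for "xqueue.read xqueue.admin" when it is only allowed xqueue.read and xqueue.submit would end up with a token scoped to xqueue.read alone.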

I believe that using a period for namespacing makes a lot of sense for us, in the style of Microsoft or Facebook (e.g. xqueue.submit). Google uses fully qualified URLs as scope names, which seems like overkill. One interesting idea is the Facebook example of dynamic scopes. So I could say xqueue.read_course:course-v1:edX+100+2014A or xqueue.read_problem:peerassessment. While that might have possibilities, most companies restrict themselves to a very limited number of scopes. GitHub has around 20 of them. Google has many, many more, but each individual service usually only has 2-3 (e.g. "read", "write"). Which brings us to the next point...
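If we did go with period namespacing plus an optional dynamic part, parsing could be as simple as the sketch below. The service.action:target layout is my reading of the examples above, and the Scope type and parse_scope name are made up.

```python
from typing import NamedTuple, Optional


class Scope(NamedTuple):
    service: str           # e.g. "xqueue"
    action: str            # e.g. "read_course"
    target: Optional[str]  # dynamic part, if any


def parse_scope(scope: str) -> Scope:
    """Split 'service.action[:target]' into its pieces.

    Only the first ':' separates the target, because course keys
    such as 'course-v1:edX+100+2014A' contain colons themselves.
    """
    name, _, target = scope.partition(":")
    service, _, action = name.partition(".")
    return Scope(service, action, target or None)
```

Splitting on the first colon only matters here: a naive split(":") would break the course key in xqueue.read_course:course-v1:edX+100+2014A into pieces.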

OAuth2 Scopes are not User Permissions

Scopes define what the token-bearing Client can do on behalf of the user, not what the user can do. There is always more fine-grained permission handling in the background when it comes to the actual user, and scopes are not a substitute for resource-level ACLs. Right now we use OAuth2 scopes to convey course access lists in Insights, and this is already starting to bite us from a size and performance perspective.

To expand a bit more on this, say a course staff member carlos is interacting with the future LMS as an API gateway (we should split those two out, but that's another topic). He wants to see the submission history for a particular problem for user jarvis. That information is in a separate backend service that holds XBlock User State. So that looks like:

  1. LMS makes request to XBUS for the problem state history for user jarvis and sends XBUS a signed JWT access token that it got from the AS.
  2. XBUS inspects and verifies the token, and sees that the user is carlos and one of the scopes is xbus.read.
  3. At this point, if carlos was accessing his own state, XBUS could reply "sure, here you go". But carlos is asking for jarvis's data, so XBUS needs to know whether carlos is course staff, an edX admin, etc.
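The three steps above can be sketched as a single check inside XBUS. This assumes the token has already been verified and decoded into a dict; the claim names ("user", "scope") and the return values are hypothetical.

```python
def check_state_read(claims: dict, target_user: str) -> str:
    """Decide what XBUS can conclude from the token alone."""
    scopes = claims.get("scope", "").split()
    if "xbus.read" not in scopes:
        # The Client was never authorized to read state on the user's behalf.
        return "deny"
    if claims.get("user") == target_user:
        # Reading your own state: the scope is enough.
        return "allow"
    # Reading someone else's state: the scope says the Client may ask,
    # but not whether this user may see the answer.
    return "needs-role-check"
```

The "needs-role-check" outcome is the gap that scopes cannot fill, and it is where the strategies below come in.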

At this point, there are a couple of plausible strategies that come to mind:

  1. We have a separate service that XBUS can call that will return the various roles a user has in a given course (e.g. "student", "staff", "beta-tester"). Based on that information, XBUS makes its own determination as to whether carlos should be allowed to see jarvis's information.
  2. We store permissions in a more centralized manner. XBUS declares that it has some parameterized permission read_user_state_course (whether that's dynamic, or we use some naming convention), and it asks a central authorization service whether carlos has that permission for this course.
  3. We write the relevant group information to the JWT token. This might be things like "edx-staff" as well as all org-level groups and course-level groups. This may grow large over time.

Option #2 theoretically gives us more flexibility, but what little I found discussing this (an O'Reilly microservices book, a Quora post, a video presentation) describes having a centralized permissions system as extremely painful. Some quotes:

"This is a nightmare to maintain and gives very little scope for our services to have their own independent lifecycle, as suddenly a chunk of information about how a service behaves lives elsewhere, perhaps in a system managed by a different part of the organization."

"These decisions need to be local to the microservice in question."

"First, because of the complexity, services must manage their own access control. Don't try to build a central service that says "yes" or "no" to a ton of requests for different services. That's a dark, dark road to go down. Instead, ensure that a request to a service contains the requisite information required to determine if it should be answered. This leads to .. 

Second, try to have a uniform way to document and express the security requirements and responses for each service, ideally a uniform protocol so that you can debug the security interactions between two services that are not related. For example, always passing through the user ID and their group IDs in the request header. Whatever it takes.

Finally, keep separate services truly separate: don't have a central database that stores all of the permissions and is utilized by different services, or you'll be right back at my first point ... unless you're still at a point where a shared data store is scaling just fine for you, which brings me to ..."

I can't find the video right now, but there was also some discussion of how hard it is to test a system once you get beyond a small handful of roles, because most services assume a certain uniform level of access. For example, the Analytics API is going to assume that if you can access the student data for a course, you're also allowed to see the course content, and it won't be built to degrade gracefully.

The service code should still express things in terms of permissions internally of course, but the mapping of roles to permissions should happen at the service level.
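As a sketch of what "mapping at the service level" could mean for XBUS: the service receives role names (from a roles service or from the token) and translates them into its own permissions. The role names and read_user_state_course come from the discussion above; the particular mapping and the function names are invented for illustration.

```python
# Owned by XBUS itself, not by a central authorization service.
ROLE_PERMISSIONS = {
    "student": set(),
    "beta-tester": set(),
    "staff": {"read_user_state_course"},
}


def permissions_for(roles) -> set:
    """Union of the permissions granted by each role; unknown roles grant nothing."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def can_read_other_users_state(roles) -> bool:
    """Service code asks about a permission, never about a role directly."""
    return "read_user_state_course" in permissions_for(roles)
```

Because the table lives with the service, XBUS can change what "staff" is allowed to do without coordinating a change in some shared permissions store.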