Background

Type 1: Platform Secrets/Credentials

At edX, we store and manage our platform secrets (API keys, database credentials, server secret keys, etc.) by keeping them in separate confidential Git repositories, away from application code. We do not persist them in databases, logs, or other stores that reside in the application domain. We send them over-the-wire securely (via SSL, LastPass, etc.). We follow the principle of least privilege, limiting access to DevOps and other authorized parties.

Type 2: Application Domain edX Credentials

Shared secrets in the edX application domain, related to edX learner credentials (passwords, OAuth/JWT tokens, etc.) and edX partner credentials (LTI keys, OAuth/JWT tokens, etc.), are stored in their relevant application databases. Passwords are first transformed through a one-way password-based key derivation function before they are stored, and are never stored in the clear. Currently, edX employees with access to production database replicas have access to some or all of these secrets.
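As an illustration of the one-way transformation mentioned above, here is a minimal sketch using PBKDF2-HMAC-SHA256 from the Python standard library. The choice of PBKDF2, the iteration count, the salt size, and the function names are assumptions made for this example; this document does not specify which function or parameters edX uses.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed parameters, for illustration only.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key). Only these two values would be stored."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key from the candidate password and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Because the derivation is one-way, someone with read access to the stored salt and derived key cannot read the password back; they can only test guesses against it.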

Problem

Type 3: Application Domain 3rd Party Credentials

At times, an edX application may need to act as an intermediary service that makes requests to a 3rd party server on behalf of a user, requiring access to the user's 3rd party credentials. Whenever possible, the OAuth 2.0 protocol should be used in these cases as it can support proper authorization, key revocation, etc, when implemented thoroughly/correctly. However, if the 3rd party service doesn't support OAuth 2.0 or the edX application needs to operate in a background service on behalf of the user, then an alternative approach using shared secrets may be required.

In this case, we need to follow best practices for securing the user's 3rd party credentials while they are in transit (sent over-the-wire, via APIs, etc) and at rest (persisted in databases/logs/etc).

Security Requirements

In Transit
- Whenever credentials are in transit (sent over-the-wire, passed via APIs, etc.), they need to be encrypted and kept confidential from eavesdroppers. This can be done via point-to-point encryption (such as via SSL).
At Rest
- Minimize the number of places the credentials are stored. Ideally we wouldn't store them at all and only pass them through. However, when the application needs to do so, persist them in only one location, to minimize the points of vulnerability in our system.
- Encrypt the credentials when they are stored at rest. This provides an additional tier of protection in case our system is ever compromised. It also prevents exposure to internal users who have read access to our database content.
Access
- Enforce appropriate application-level access control so credentials are correctly associated with their owners and not accessible by other users in the system.
- If possible, prevent read access to these credentials once they are set, to avoid accidental leakage. To be clear, the application can still allow overriding credentials with new values, but does not need to reveal past values.
- Given the above, it may become necessary to support multiple values/versions of a user's credential, since most secure key-storage systems provide re-keying capabilities. Application engineers can keep this in mind and consider providing this capability from the outset.
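The access requirements above can be sketched as a small write-only store. Everything here is hypothetical: the class name CredentialStore, its methods, and the in-memory dictionary are stand-ins for illustration, and the stored values are assumed to be already encrypted by some other layer.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


class CredentialStore:
    """Hypothetical in-memory store; values are assumed to be encrypted already."""

    def __init__(self) -> None:
        # (owner_id, credential_name) -> encrypted versions, newest last
        self._versions: Dict[Tuple[str, str], List[bytes]] = defaultdict(list)

    def set(self, requester_id: str, owner_id: str, name: str, encrypted_value: bytes) -> int:
        """Append a new version and return its 1-based version number."""
        self._check_owner(requester_id, owner_id)
        versions = self._versions[(owner_id, name)]
        versions.append(encrypted_value)
        return len(versions)

    def use(self, requester_id: str, owner_id: str, name: str,
            action: Callable[[bytes], T]) -> T:
        """Pass the newest version to trusted application code.

        There is deliberately no plain getter on this class.
        """
        self._check_owner(requester_id, owner_id)
        versions = self._versions.get((owner_id, name))
        if not versions:
            raise KeyError(name)
        return action(versions[-1])

    @staticmethod
    def _check_owner(requester_id: str, owner_id: str) -> None:
        # Application-level access control: only the owner may set or use.
        if requester_id != owner_id:
            raise PermissionError("credential belongs to another user")
```

Overriding a credential appends a version rather than replacing the old one, which leaves room for re-keying later. The `action` callback is assumed to be application code that makes the 3rd party request and returns only its result, not the credential itself.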

Legal Requirements

From the legal perspective, it is preferable to use an external key storage service to store 3rd party credentials. This minimizes edX's legal liabilities in the event of a security compromise of our system.
If using an external service is not possible, follow the security requirements above.

Non-Requirements

Integrity-protecting the credentials is not a requirement, as we are less concerned about tamper detection. However, most encryption services/libraries provide integrity protection out-of-the-box, so the added protection costs little additional effort.
It is not necessary to keep the credentials encrypted while they are in memory, as the OS security boundaries provide sufficient compartmentalization. However, note the possible vulnerabilities in the event of a hardware crash, and the possible leakage of the keys through paging.

External Secret Storage Services

AWS KMS seems to be the only established externally hosted service for storing secrets. However, its current APIs do not provide a way for us to store arbitrary keys/credentials. Its CreateKey endpoint does have a way to insert raw key material, but only for creating Master Keys, which cannot later be exported. We can use the service to encrypt arbitrary data, but we would then still have to manage the storage ourselves, which doesn't satisfy the legal concerns.

	AWS KMS
Company	Amazon
Storage of user data	Stores only user-provided master keys
Storage/generation of keys	Generates data-encryption keys only
API	Limited API
Key Versioning	?
Encryption as a service
Auditing	Keeps track of key usages
On-disk encryption (on backend servers)
Revocation
Cost	?
Open Source

Custom Key Storage Solutions

Alternatively, we can implement our own solution by maintaining our own Master Key and using Python/Django libraries to encrypt database fields.

django-fernet-fields (recommended)
1. uses pyca/cryptography open source library
  1. 136 contributors, relatively active, maintained
  2. Fernet symmetric key encryption scheme
2. very easy-to-use extension on top of django fields
3. readable/understandable code
4. can provide a custom master key, rather than relying on Django's single SECRET_KEY
5. allows for key rotation of the master key by listing older keys for decryption
6. only 5 contributors; now just in maintenance mode (upgrading libraries)
Django Extensions' Encrypted Fields
1. uses Google's keyczar open source library, which has known security issues, although they may not apply to our usage of it
2. code is somewhat cryptic - mostly due to keyczar's interface
3. 327 contributors, active overall development
django-encrypted-fields
1. uses Google's keyczar open source library, which has known security issues, although they may not apply to our usage of it
2. a fork of the original, https://github.com/lanshark/django-encrypted-model-fields, uses Python's crypto library
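To make the master-key rotation noted for django-fernet-fields concrete, here is a minimal sketch that uses the underlying pyca/cryptography library directly. It is not the django-fernet-fields API; the variable names and the sample secret are made up for this example, and in practice the master keys would come from confidential configuration rather than being generated in code.

```python
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

# Hypothetical master keys, generated here only so the sketch runs.
old_master = Fernet(Fernet.generate_key())
new_master = Fernet(Fernet.generate_key())

# A credential encrypted before the rotation, under the old master key.
stored_token = old_master.encrypt(b"third-party-api-secret")

# The first key encrypts new data; every listed key is tried for decryption.
keyring = MultiFernet([new_master, old_master])
plaintext = keyring.decrypt(stored_token)

# Re-encrypt the stored value under the new master key.
rotated_token = keyring.rotate(stored_token)


def can_decrypt(key: Fernet, token: bytes) -> bool:
    """Return True if `key` alone can decrypt and authenticate `token`."""
    try:
        key.decrypt(token)
        return True
    except InvalidToken:
        return False
```

Once every stored value has been rotated, the old key can be dropped from the list. Fernet tokens are also authenticated, so a tampered token fails to decrypt, which gives the integrity protection described under Non-Requirements at no extra effort.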

Architecture and Engineering

Storing (3rd party) Secrets