Components:
- ElasticSearch (ES) for storing notes, the same instance that powers forums
- edx-notes, an additional Django app in the edx-platform LMS
  - implements new views: student notes and search results
  - is responsible for loading the annotator JS and CSS assets on pages that need them
  - uses a decorator to make components annotatable
- edx-notes-api, a standalone service
  - supports CRUD operations on notes, serving as an interface to ES
  - accepts authorized LMS users, passing the username to ES
  - is a Django app that reuses parts of annotator-store and authenticates users with OAuth2 (annotator-store authentication is similar to OAuth but is not OAuth, so we need to make it work with edX OAuth2)
  - in the future, having a standalone Notes API service will let students build their own mashups, pull archives, etc.
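The decorator mentioned for the edx-notes app could look roughly like the sketch below. It is only an illustration: the names `annotatable`, `HtmlComponent`, `usage_id`, and the wrapper CSS class are hypothetical and are not taken from edx-platform.

```python
import functools
import html


def annotatable(cls):
    """Hypothetical class decorator: wrap a component's rendered HTML in a
    container that the annotator JS could attach to."""
    original_render = cls.render

    @functools.wraps(original_render)
    def render(self, *args, **kwargs):
        inner = original_render(self, *args, **kwargs)
        # Escape the id because it ends up inside an HTML attribute.
        usage_id = html.escape(str(self.usage_id), quote=True)
        return '<div class="edx-notes-wrapper" data-usage-id="{}">{}</div>'.format(
            usage_id, inner
        )

    cls.render = render
    return cls


@annotatable
class HtmlComponent:
    """Stand-in for a courseware component; not a real edx-platform class."""

    def __init__(self, usage_id, body):
        self.usage_id = usage_id
        self.body = body

    def render(self):
        return "<p>{}</p>".format(self.body)
```

With this shape, a component opts in to annotation by adding one line above its class definition, and the asset-loading code only has to look for the wrapper element.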
Protocol details
- unit id is used as the URI
- Django username is used as the user identifier
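As a rough illustration of these two conventions, a note could be assembled as below. The field names `uri`, `user`, `text`, and `quote` follow the annotation format used by Annotator; that edx-notes-api stores exactly these fields is an assumption, and `build_note` and the example unit id are hypothetical.

```python
def build_note(unit_id, username, text, quote=""):
    """Assemble a note document (sketch; the field set is an assumption)."""
    return {
        "uri": unit_id,    # the unit id doubles as the annotation URI
        "user": username,  # the Django username identifies the owner
        "text": text,      # the student's note
        "quote": quote,    # the highlighted passage the note refers to
    }


# Hypothetical values, for illustration only.
note = build_note(
    unit_id="example-unit-id",
    username="alice",
    text="Revisit this definition before the exam.",
    quote="entropy",
)
```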
Interaction
- edx-notes-api accepts HTTP requests accompanied by a valid OAuth2 token
- the LMS backend gets the token from edx-oauth2-provider as a Trusted Client and passes it to the frontend
- the user's browser sends requests to edx-notes-api with the token in the headers
- edx-notes-api validates the token signature
Data Store
We plan to use ElasticSearch 1.4 as both a primary data store and for search capabilities. Motivating this decision:
- Interface Fit: CRUD, simple filtering (by owner), and fast text search are the three primary access cases for notes data, all of which ES handles well.
- Operational Fit: ElasticSearch is already deployed in production at edx.org and installed as a dependency of existing Open edX installations. It is highly scalable, yet easy to run at small or development scale. Its clustering model needs no dedicated, manually designated master: the master node is elected automatically.
- Simplicity: using ES as both document store and search index means a single external service interface/dependency for persistence, and does not require a replication process or admin tooling to maintain synchronization between multiple stores.
- Expediency: we are leveraging a third-party reference implementation for the notes backend, which is itself implemented against ElasticSearch as sole data store, thus very little customization is needed.
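To make the "Interface Fit" point concrete, the owner filter and text search can be combined in a single request body. The sketch below only builds that body and sends nothing. It uses the `filtered` query of the ES 1.x query DSL, which later ES versions replaced with the `bool` query. The field names `user` and `text` and the function `build_notes_search` are assumptions.

```python
def build_notes_search(username, text=None, size=20):
    """Build an ES 1.x search body for one user's notes (sketch)."""
    # Exact match on the owner; assumes the `user` field is mapped as
    # not_analyzed so the term filter compares the whole username.
    owner_filter = {"term": {"user": username}}
    # Full-text match on the note body, or everything when no text is given.
    text_query = {"match": {"text": text}} if text else {"match_all": {}}
    return {
        "query": {"filtered": {"query": text_query, "filter": owner_filter}},
        "size": size,
    }


all_notes = build_notes_search("alice")
search = build_notes_search("alice", text="entropy")
```

Listing a student's notes and searching them then differ only in the inner query, which is one reason a single store can serve both views.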
This selection carries some risk, on account of the following:
- ES's primary use case is search indexing alongside an external source-of-truth database; durability was not an original design priority.
- In a somewhat well-publicized analysis on aphyr.com, ES was shown to be prone to losing (acknowledged) writes under certain network partitioning conditions.
- The list of companies presently using ES as a primary data store, at scale, is not long.
ES In The Wild
(WIP)
Testing:
- edx-notes, being part of the LMS, will be tested with the regular Jenkins setup
- edx-notes-api, being standalone and new, can be tested with Travis
Todo
- edx-notes-api needs ElasticSearch 1.0+, which we do not have deployed yet
- need to upgrade ElasticSearch in production