TL;DR:
We can get the same things we currently get from Elasticsearch from MySQL. We do not make full use of the Elasticsearch product. We likely get a performance benefit from being able to perform quick searches, but I do not see a strong argument for continuing to use Elasticsearch given its limited use by Insights and the analytics data API.
- Parameter validation can be performed by Python Django code.
- Elasticsearch queries can be replaced by MySQL queries.
How Do We Use Elasticsearch in Insights?
Key Concepts
Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents. When you have multiple Elasticsearch nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node.
An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure.
(source)
High Level
Insights does not make heavy use of Elasticsearch. It relies on two Django views in the analytics-data-api that are backed by Elasticsearch: LearnerView and LearnerListView. These views back the Learner view in Insights. See an example in Insights here for the demo course.
Low Level
The edx-analytics-data-api/analytics_data_api/v0/documents.py file defines two classes, RosterUpdate and RosterEntry, that inherit from the Document class provided by the elasticsearch-dsl library. The Document class is a model-like wrapper around the Elasticsearch document. It allows us to define Elasticsearch mappings, which are associations between a field in a document and the field's "type".
RosterUpdate
RosterUpdate is a Document that stores when the index was last updated.
RosterEntry
RosterEntry is a Document that stores information about a learner with respect to a course, including fields like course_id, user_id, problems_attempted, problems_completed, etc. RosterEntry has two class methods that implement Elasticsearch search queries, get_course_user and get_users_in_course. These will be discussed below.
The LearnerView and the LearnerListView Django views use these Document classes, particularly the above class methods, when fetching data from Elasticsearch.
Nitty Gritty Details
get_course_user does a query for a given course_id and username.
...
How can we write equivalent MySQL?
Let us assume that we have a MySQL table learner_activity with the following fields, which are fields on the existing RosterEntry document.
...
- I have not looked into how to parameterize the SQL or make it dynamically generated given a set of parameters. I’m assuming Django has this functionality even without Django models.
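On the parameterization question: Django exposes a database cursor through `django.db.connection.cursor()`, and `cursor.execute(sql, params)` accepts bound parameters using `%s` placeholders, so no models are required. The sketch below shows the same idea in a form that runs on its own. It uses Python's built-in `sqlite3` as a stand-in for MySQL (so the placeholder is `?` rather than `%s`). The column subset, the `SORTABLE_COLUMNS` whitelist, and the two builder functions are assumptions made for illustration, not existing code in the analytics data API.

```python
import sqlite3

# Hypothetical subset of learner_activity columns, taken from the
# RosterEntry fields named in this document.
SCHEMA = """
CREATE TABLE learner_activity (
    course_id TEXT NOT NULL,
    username TEXT NOT NULL,
    problems_attempted INTEGER,
    problems_completed INTEGER
)
"""

# Column names cannot be bound as parameters, so sort keys are checked
# against a whitelist before being interpolated into the SQL text.
SORTABLE_COLUMNS = {"username", "problems_attempted", "problems_completed"}


def get_course_user_sql(course_id, username):
    """Build a lookup for one learner in one course."""
    sql = (
        "SELECT * FROM learner_activity "
        "WHERE course_id = ? AND username = ?"
    )
    return sql, [course_id, username]


def get_users_in_course_sql(course_id, sort_key="username",
                            descending=False, limit=25, offset=0):
    """Build a sorted, paginated listing of learners in a course."""
    if sort_key not in SORTABLE_COLUMNS:
        raise ValueError("unsupported sort key: %r" % sort_key)
    direction = "DESC" if descending else "ASC"
    sql = (
        "SELECT username, problems_attempted, problems_completed "
        "FROM learner_activity WHERE course_id = ? "
        "ORDER BY {col} {direction} LIMIT ? OFFSET ?"
    ).format(col=sort_key, direction=direction)
    return sql, [course_id, limit, offset]


def demo():
    """Run both queries against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO learner_activity VALUES (?, ?, ?, ?)",
        [
            ("course-v1:demo", "alice", 10, 8),
            ("course-v1:demo", "bob", 3, 1),
            ("course-v1:other", "carol", 7, 7),
        ],
    )
    one = conn.execute(
        *get_course_user_sql("course-v1:demo", "bob")
    ).fetchall()
    listing = conn.execute(
        *get_users_in_course_sql("course-v1:demo", "problems_attempted", True)
    ).fetchall()
    conn.close()
    return one, listing
```

Values always travel as bound parameters, while the only text interpolated into the statement (the sort column and direction) comes from a fixed set. That split also covers part of the parameter validation that would move into Python code.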
Odds and Ends
The Elasticsearch RosterEntry document contains a field attempt_ratio_order. It’s used to make ordering by problem_attempts_per_completed more correct. problem_attempts_per_completed can be infinite if no attempts were completed. My understanding is that this is stored in the database as null. The comments say the following.
...
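The null storage matters for sorting in MySQL, which orders NULL before any non-NULL value in an ascending sort, so learners with no completions would be listed first. Below is a minimal sketch of one way to handle this in SQL without a helper column. It assumes the goal is to sort NULL ratios last, consistent with treating an infinite ratio as the largest value; that goal is an assumption, not a restatement of the comments above. It runs against Python's built-in `sqlite3` as a stand-in for MySQL, which orders NULL the same way and accepts the same `IS NULL` sort expression. The function name and sample rows are hypothetical.

```python
import sqlite3


def order_by_ratio(rows, nulls_last=True):
    """Return usernames sorted by ratio ascending.

    rows is a list of (username, problem_attempts_per_completed) tuples,
    where the ratio may be None (stored as NULL).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE learner_activity ("
        "username TEXT, problem_attempts_per_completed REAL)"
    )
    conn.executemany("INSERT INTO learner_activity VALUES (?, ?)", rows)
    if nulls_last:
        # `... IS NULL` evaluates to 0 for real ratios and 1 for NULL,
        # so sorting on it first moves the NULL rows to the end.
        order = (
            "problem_attempts_per_completed IS NULL, "
            "problem_attempts_per_completed"
        )
    else:
        # Default behaviour: NULL sorts before every non-NULL value.
        order = "problem_attempts_per_completed"
    result = conn.execute(
        "SELECT username FROM learner_activity ORDER BY " + order
    ).fetchall()
    conn.close()
    return [name for (name,) in result]


ROWS = [("alice", 1.5), ("bob", None), ("carol", 3.0)]
```

With `ROWS`, the default ordering puts `bob` first, while the `IS NULL` form returns `alice`, `carol`, `bob`.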