Courseware Student Module (CSM) Replacement

The courseware_studentmodule table and courseware_studentmodulehistory table are monolithic MySQL tables in edx-platform. Their data currently grows without bound - and queries to them can be slow. We're embarking on a project to offer the platform option to use an alternate DB in which to store this data. The new backend should be able to handle the large data size along with all the read/write requirements.

Initial Discovery Work

Dave Ormsbee performed the exploratory work on this topic - those details are in this page: Discovery: Transitioning XBlock User State Away from CSM

Also, an initial CSM plan was put together here: https://docs.google.com/document/d/1FwPc9VnQKrXRjzAGDTbRlBnpwE9-KMELkvnOfyT-Was/edit

Call Catalog

Julia Eskew (Deactivated) documented all CSM calls in edx-platform on this page: Courseware StudentModule (CSM) Call Catalog

Courseware StudentModule History

In the courseware_studentmodulehistory table, a new row is added for each update and insert to the courseware_studentmodule table which is for a CAPA problem. At the moment, it keeps the history forever. The table's data is used by the course team for support purposes. The data is viewable as a "submission history" button attached to the problem in the courseware. The table data has also been used for development purposes to diagnose bugs - and, on rare occasions, to correct state-corrupting issues in production.

DevOps Parallel Work

DevOps is a partner in this work. They've created an Epic under which their CSM work is tracked - it's here:  OPS-477 - Getting issue details... STATUS

edx.org CSM Requirements

Current

The CSM/CSMH solution that's to be implemented needs to meet several performance requirements. The read/write requirements are determined by viewing New Relic data on query volume. Here's an example of that data:

https://rpm.newrelic.com/accounts/88178/applications/3343327/datastores#/table/MySQL

The relevant "courseware_" table queries can be seen on that page. Click through each query and peak throughputs can be seen.

Searching through recent New Relic data, the SELECT/INSERT/UPDATE peaks in this note were found:

https://rpm.newrelic.com/accounts/88178/notes/4590

and were:

QueryRate (calls/min)Rate (call/sec)
SELECT (before progress optimization)43.6K727
SELECT (after progress optimization) (9/15/15)11.6K193
UPDATE (6/17/15)5.41K90
INSERT (6/2/15)2.5K42

<Insert information on data size requirements>

Projected (Future)

<Insert information on future query rate and data size requirements>