Courseware Student Module (CSM) Replacement
The courseware_studentmodule
table and courseware_studentmodulehistory
table are monolithic MySQL tables in edx-platform. Their data currently grows without bound - and queries to them can be slow. We're embarking on a project to offer the platform option to use an alternate DB in which to store this data. The new backend should be able to handle the large data size along with all the read/write requirements.
Initial Discovery Work
Dave Ormsbee (Deactivated) performed the exploratory work on this topic - those details are in this page: Discovery: Transitioning XBlock User State Away from CSM
Also, an initial CSM plan was put together here: https://docs.google.com/document/d/1FwPc9VnQKrXRjzAGDTbRlBnpwE9-KMELkvnOfyT-Was/edit
Call Catalog
Julia Eskew (Deactivated) documented all CSM calls in edx-platform on this page: Courseware StudentModule (CSM) Call Catalog
Courseware StudentModule History
In the courseware_studentmodulehistory
table, a new row is added for each update and insert to the courseware_studentmodule
table which is for a CAPA problem. At the moment, it keeps the history forever. The table's data is used by the course team for support purposes. The data is viewable as a "submission history" button attached to the problem in the courseware. The table data has also been used for development purposes to diagnose bugs - and, on rare occasions, to correct state-corrupting issues in production.
DevOps Parallel Work
DevOps is a partner in this work. They've created an Epic under which their CSM work is tracked - it's here: - OPS-477Getting issue details... STATUS
edx.org CSM Requirements
Current
The CSM/CSMH solution that's to be implemented needs to meet several performance requirements. The read/write requirements are determined by viewing New Relic data on query volume. Here's an example of that data:
https://rpm.newrelic.com/accounts/88178/applications/3343327/datastores#/table/MySQL
The relevant "courseware_" table queries can be seen on that page. Click through each query and peak throughputs can be seen.
Searching through recent New Relic data, the SELECT/INSERT/UPDATE peaks in this note were found:
https://rpm.newrelic.com/accounts/88178/notes/4590
and were:
Query | Rate (calls/min) | Rate (call/sec) |
---|---|---|
SELECT (before progress optimization) | 43.6K | 727 |
SELECT (after progress optimization) (9/15/15) | 11.6K | 193 |
UPDATE (6/17/15) | 5.41K | 90 |
INSERT (6/2/15) | 2.5K | 42 |
<Insert information on data size requirements>
Projected (Future)
<Insert information on future query rate and data size requirements>