CSM - gh-ost Runbook
See PLAT-2437. This is a runbook for running gh-ost to alter the id field of courseware_studenthistory in edx-platform as safely and with as little downtime as possible.
Questions we need to answer before running this:
What changes do we need to make to support both versions of CSMH? None that I can tell.
Removing the FKs will likely mean that we need to use the integer IDs directly instead of using StudentModule objects, for example. We don't need to actually remove the FKs on the Django models, we can just set them to db_constraint=False, db_index=True and remove the actual FKs in the database. Django seems to handle to this just fine.
Also manually handling deletes if they're cascading right now. It seems like Django handles the cascading deletes so nothing needs to be done here.
Before or after we upgrade CSM we will also need to migrate both CSMH's to have an appropriately sized column for CSM fake foreign keys. Does needing to perform maintenance that more than doubles the size of that table cause us to re-evaluate this plan? This is a question for @Joe Mulloy (Deactivated).
What do we need to do to ensure that Django migrations continue to work throughout this process? Especially after we switch over to the new table?
Will Django notice that the column type has changed? No.
Can we fake a migration that gets Django's state back in sync with reality? Yes, though it seems like if we have already dropped the FKs it's a SQL no-op anyway.
At what point do we do that for each of the tables that matter (courseware_studentmodule, courseware_studentmodulehistory, courseware_studentmodulehistoryextended)? Adding this to the runbook.
Per-environment runbook (in progress)
Have we pruned
coursewarehistoryextended_studentmodulehistoryextended? Should we?In MySQL make sure we have dropped the FK from
courseware_studentmoduletocourseware_studentmodulehistoryThis is the unused "local" table, not the extended one in it's own database.
It is confirmed to be empty on stage, edge, and prod read replicas. Not sure if they are actually empty on masters.
In MySQL alter
courseware_studentmodulehistoryto have a BIGINTstudent_module_idAgain this should be empty and unused
Do test gh-ost runs for
courseware_studentmoduleUse the gh-ost cheat sheet for "Connect to replica, migrate on master": https://github.com/github/gh-ost/blob/master/doc/cheatsheet.md#a-connect-to-replica-migrate-on-master
Confirm the environment is set up correctly
Is using Statement Based Replication
Replica we want to use is configured with binary logs enabled (
log_bin,log_slave_updates)binlog_format=ROW(gh-ostcan apply the latter for you)
Confirm the RDS environment has enough resources for two copies of the table!
Perform a no-op test run to make sure everything is set up correctly
The "alter" flag we want to run is: "
change id bigint unsigned not null auto_increment"
Test run the migration on a replica only
Monitor and see if the throttling variables need to be tweaked
Confirm that the data all looks good and makes sense
Do test gh-ost runs for
coursewarehistoryextended_studentmodulehistoryextended(the one in its own database)This can be done at the same time as the studentmodule tests to save time
The playbook is the same as studentmodule except for the "alter" flag, database, and table to run on):
The "alter" flag we want to run is: "
change student_module_id bigint unsigned not null"
Run the
courseware_studentmodulemigration on the masterMonitor and tweak throttling as needed
We should use the cut over flag file to make sure we are in the office and ready when it comes time to flip the switch: https://github.com/github/gh-ost/blob/master/doc/command-line-flags.md#postpone-cut-over-flag-file
Confirm that the data looks good
Cut over to the new table and monitor
Run the
coursewarehistoryextended_studentmodulehistoryextendedmigration on the masterThis can also be done at the same time as the studentmodule migration to save time
Play is the same as studentmodule
Trickle delete the rows from the old
coursewar e_studentmoduletable, wait some time, then drop it: https://github.com/github/gh-ost/issues/307We will likely want to create a custom script / management command for the deletes
Trickle delete the rows from the old
coursewarehistoryextended_studentmodulehistoryextendedtable, wait some time, then drop it: https://github.com/github/gh-ost/issues/307Again this can be done at the same time as the deletes on studentmodule
We will likely want to create a custom script / management command for the deletes
Create Django migrations and fake them to get Django up to date with the changes:
StudentModule model:
ADD
id = UnsignedBigIntAutoField(primary_key=True) # pylint: disable=invalid-name
This replaces the default Django id
This also includes the ALTER for StudentModuleHistory
StudentModuleHistory model:
CHANGE
student_module = models.ForeignKey(StudentModule, db_index=True, db_constraint=False, on_delete=models.CASCADE)
StudentModuleHistoryExtended model:
Does not seem to need / create a migration or changes to the model definition
Once these migrations are merged we should follow the usual procedure for faking migrations. We have one of those, right? It looks like we did some for Ginkgo.
Profit