Note |
---|
Adding a foreign key to an existing large table is a complex process that needs to be done very carefully to avoid downtime. |
...
Some of the operations that incur downtime can be problematic even for smaller existing tables (adding primary keys etc).
Online DDL Operations, Instant DDL (Aurora), Fast DDL (these cover roughly the same concepts) are key for performance and avoiding downtime.
Directly adding a foreign key incurs downtime.
Making a foreign key non-nullable incurs downtime.
Adding a new nullable column is fine.
Adding an index to an existing column where all values are NULL can be done without downtime using
ALGORITHM=INPLACE, LOCK=NONE
. Make sure there are no non-null values.While the database may add these parameters by default, add them explicitly, to make sure they are not silently ignored if they are not supported.
When doing bulk updates, like backfilling values in management commands, you need to avoid reading or updating all records at once to avoid loading everything into memory or performing a slow query.
Use batching for reading and writing
MySQL does not support streaming. Thus, if you use
<queryset>.iterator()
, it will ignore it and still load all rows into memory without batching. Instead, use something like thebatch_by_pk
method in the edx-enterprise repo.Querysets are evaluated at some point, loading the contents into memory / performing the SQL query. Understand when they are evaluated and make sure they’re not evaluated prematurely.
Use
transaction.atomic
.Check generated SQL queries. With bulk_update, for example, you generate a pretty big query with a lot of WHEN/CASE statements. That is fine, but make sure that the query size does not exceed permitted query size for the db.
Coordinate with 2U SRE when releasing any tricky migrations to edx-platform.
Recommended steps
Panel | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
This work has not been tested on stage+prod yet, so follow at your own risk. I’ll update this soon. |
...
Add new INT field first, all NULL values
Create index on top of new field with
ALGORITHM=INPLACE, LOCK=NONE
Backfill values in batches for field via management command
Adjust save() method, django signals, or something like that to make sure that new records will not receive NULL values before next step
Ensure there are no more NULL values in the db field
Add foreign key to field with
ALGORITHM=INPLACE, LOCK=NONE
. It still needs to be nullable.Add any desired constraints to prevent null values to the django code, but not to the database.
Adding the foreign key constraint itself for nullable values requires wrapping the SQL in
SET FOREIGN_KEY_CHECKS = 0;
andSET FOREIGN_KEY_CHECKS = 1;
to temporarily disable checks on foreign key links to referenced model so that the inplace algorithm is supported.
...