
If you have non-trivial migrations to apply, or if two non-local environments (e.g. stage and production) have different migration states, describe your situation in the #django Slack channel and talk to the SRE team before doing anything else.  Similarly, migrations can become complicated when two different people create two different migrations around the same time.  When in doubt, post in #django and talk to SRE.

It is often useful to review the SQL generated by a migration. See the sqlmigrate documentation for details.

Don't revert code that includes migrations, and don't change old migrations.

...

If you still think you need to change old migrations, and you want to verify that there isn't an alternative, see the "When in doubt" section.


If you absolutely must change old migrations (this includes things like squashing), then after merging and verifying your migration changes, you should update the SQL that is used to populate devstack during provisioning. This prevents conflicts during future provisioning. To update the SQL, run ./update-dbs-init-sql-scripts.sh in the devstack repository. This should update edxapp.sql, edxapp_csmh.sql, and ecommerce.sql. Create a PR with the updated SQL and merge it to devstack as usual.

Don't change the parent of a migration

Along the same lines of a migration being considered "applied" once merged into master, you should never change the dependencies of a migration once it has landed on master.  Doing so will cause real problems, and probably downtime, in whatever environment it is deployed to.  When you create new migrations in a feature branch, you want those to be the most recent migrations when you merge into master.  Using an analogy to git, you always want your new migrations to be at the "HEAD" of the migration history in your app.

Deployment and backward-compatible migrations

Here at edX, we use the blue-green deployment method. The important detail about this deployment method is that, for some period of time, traffic is going to both the old code and new code. That detail is especially important when deploying database migrations that alter database columns and tables in a manner that is not backward-compatible with the previous release.

Let's go through a couple examples with our user table, auth_user. It has a few different columns, but we'll use the full_name column for the examples.

...

  1. Release:
    • Add the new field to the model.
      • If the old field has null=False, blank=False, and no default:
        • If the model is used in forms (Django admin, or other forms):
          • Create the new field with null=True, editable=False.
          • Setting editable=False removes the field from Django admin and other forms, so users can't update the new field directly while values are still being copied from the old one.
        • else:
          • Create the new field with null=True.
      • else if the old field is a BooleanField:
        • You might need to change the old field type to NullBooleanField so that unit tests in release 2 will be happy when the old field is removed from code but not from sqlite3.
        • Create the new field with BooleanField and the same signature, assuming there's a default set.
      • else if the old field has null=True:
        • Create the new field with the same field signature as the old.
    • Update any place where there are creates or updates on the field
      • Write the same value into both fields
      • If there is a Django admin page or other form and it is used regularly to create/update rows:
        • Register a signal handler to the model to update the new field whenever the old field changes or a new row is created.
  2. Release:
    • Create a data migration to copy the values from the old field into the new field.
      • If the table is large, consider disabling atomicity and batching the copy.
    • Remove all references to the old field in the code.
      • Including removing the old field from the model in the code.
      • If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
      • DO NOT include the migration for removing the old column (yet).
    • If you create the new field with a different field signature than the old, then update it now to be the same as the old.
      • e.g. change null back to False and editable back to True (the default).
        • CAUTION: Changing to null=False will cause a table rebuild during the ALTER. When performing this migration on a table with a large number of rows, degraded performance/downtime will likely result.
        • See 158766629 below.
      • Include the migration that goes with this, but NOT the migration to remove the old field.
  3. Release:
    • Run makemigrations; this should pick up the field removal from the previous release.
    • If this change is in the edx-platform codebase, remove the skip from the test_migrations_are_in_sync unit test.
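When the table is large, the release-2 data copy above can be batched by primary-key range rather than saving every row inside one transaction. A minimal pure-Python sketch of the range computation (the model and field names in the comment are hypothetical):

```python
def pk_ranges(min_pk, max_pk, batch_size):
    """Yield inclusive (start, end) primary-key ranges for batched copies.

    Inside a RunPython data migration (with atomic = False), each range
    could then be copied in its own short statement, e.g.:
        Model.objects.filter(pk__gte=start, pk__lte=end).update(
            new_field=F('old_field'))
    (Model, new_field, and old_field are hypothetical names.)
    """
    start = min_pk
    while start <= max_pk:
        end = min(start + batch_size - 1, max_pk)
        yield start, end
        start = end + 1
```

Copying in small batches keeps lock times short on large tables; the trade-off is that a failure mid-copy leaves the copy partially applied, which is acceptable here because the copy is idempotent and can simply be re-run.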


How to delete a Django app containing tables

See Removing a Djangoapp from an existing project

How to add a nullable column to an existing table (AWS Aurora)

When using AWS Aurora, a nullable column can be added to an existing large (100k rows+?) table without causing downtime. However, the migration may still time out in GoCD, so please coordinate the release with SRE.

  1. Make a new field addition in model with null=True.
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly-generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the nullable column to the table.

NOTES:

  • The column must be nullable! (null=True)
    • When adding a column with null=False, these instructions do not apply and you'll need to plan for downtime.
  • The Django ORM default for a column's null value is False.
    • So when adding new model columns which do not specify the null parameter, these instructions do not apply and you'll need to plan for downtime.
  • If you first add a nullable column (null=True) and then change the constraint to non-nullable (null=False) in a later PR, the table will be re-built just as if you added a non-nullable column.
  • If you change an existing nullable column (null=True) to become a non-nullable column (null=False), the table will be re-built just as if you added a non-nullable column.

AWS Aurora Docs: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FastDDL.html

How to add index to existing table (AWS Aurora)

On AWS Aurora, indexes can be built on large tables without causing downtime, but this requires SRE coordination, as the migration may time out in GoCD.

  1. Add the new index on model fields.
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly-generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the index to the table.

How to drop a table

Warning
Due to the current workings of the Open edX ecosystem, some 2U-specific required steps are part of this process, to avoid unnecessary problems for 2U and the community as a whole.

Pre-requirements:

  • A 2U employee must create a Data Platform one-off request to ensure that there will be no complications from losing the table (e.g. financial data, or otherwise), and to determine the required follow-up.
    • This must happen before removing the actual tables in 2U Production.

TWO releases (after pre-requirements):

  1. Remove all references to the table by removing references to the model and the model itself
    1. If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
  2. Remove the table with a migration
    1. Important: If 2U has determined that their data can't be lost during pre-requirements above, the merge should be timed with a pipeline pause so the delete migration can be faked.
      1. This should be a rare occurrence, but it has happened.
    2. Remove the skip if you added one

Once a table is removed:

  • A 2U employee should:
    • Follow up with the Data Platform ticket to clean up the table from downstream consumers of data. 
    • If there is an associated app-permissions group then a 2U employee must make an app-permissions PR to remove that group.


Mathematical perspective: Database Expansion/Contraction

A good way to think of this is that migrations can "expand" and "contract" the database. Adding fields is an expansion, and removing them is a contraction. If you're feeling a bit more mathematical today, there's a partial ordering relation on (db, code) where your database and code are in the relation iff the set of fields in the DB is a (non-strict) superset of the fields in the code (... well, this isn't quite right, since changing fields is OK in circumstances like extending the length of a CharField. Defining the relation precisely is left as an exercise to the reader).  Under this model, changing a field (say, from a plain CharField to an EmailField) would consist of an expansion (adding the EmailField) followed later (potentially much later, but mainly not in the same release) by a contraction (deleting the CharField). Code can also expand and contract in similar ways, by changing which fields are declared in your Django models.
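Setting aside the field-change caveats, the relation can be sketched as a simple superset check:

```python
def compatible(db_fields, code_fields):
    # A (db, code) pair is in the relation iff the DB's columns are a
    # (non-strict) superset of the fields the code declares.
    return set(code_fields) <= set(db_fields)


# Expansion first: old code keeps working against the expanded DB,
# which is what blue-green deployment requires.
old_code = {'id', 'name'}
new_code = {'id', 'name', 'email'}
expanded_db = {'id', 'name', 'email'}
```

Under this check, `compatible(expanded_db, old_code)` and `compatible(expanded_db, new_code)` both hold, while `compatible({'id', 'name'}, new_code)` does not: the new code must not deploy before the expansion migration runs.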

...

Sample code from the Django docs:

Code Block
def combine_names(apps, schema_editor):
    # We can't import the Person model directly as it may be a newer
    # version than this migration expects. We use the historical version.
    Person = apps.get_model('yourappname', 'Person')
    for person in Person.objects.all():
        person.name = '%s %s' % (person.first_name, person.last_name)
        person.save()

Useful Checklists

Checklist for structural migrations

Existing Tables

  •  Will the migration cause data loss?
  •  Will this have a performance impact? NB: We do not take maintenance windows for migrations; we vastly prefer to re-engineer the migration rather than schedule a maintenance window, and virtually all migrations can be engineered to avoid downtime by being additive-only.
    •  Is the migration against a large table (see section below)?
    •  Is the migration against a busy/highly contentious table (many writes/deletes/etc. - see section below)?
      •  For the community, if we anticipate a potentially significant migration, make a note on the Open edX Release page for the next release
    •  How long do you expect the migration to take to run? Options include naive local testing, using 158766629 or some other synthetic method that might give you a reasonable framework to guess / extrapolate from
    •  Do you expect the migration to block queries, particularly frequent / user-facing ones?
      •  For example, adding/modifying indexes on non-Aurora databases
  •  Is the migration backward compatible? Does it remove or edit a schema that the previous version of the code expects to be there?

New Tables

  •  Is there a primary key? How fast will the table grow? Should the primary key be a bigint (more than 4b rows)? In general, use the Django default auto-incrementing id for the primary key.
  •  Is there a process for identifying and trimming expired rows? Is there an appropriate index that will prevent a full table scan during cleanup?
  •  What are the most common queries? Are there indexes to support them?
  •  Should there be unique constraints?

Checklist for data migrations

  •  Is there a rollback migration? Does it correctly roll back to the previous state?
  •  Is there a migration test? Django migration testing
  •  How long does a rollback take to run?
  •  Is data being loaded into this table? Is it static or dynamic? How long will it take to load?
  •  Have we double-checked that models are not being imported directly (are we allowing historical models to be used)?

Checklist for adding indexes

  •  How long will it take? How big is the table?
  •  What is the read/write ratio on the table? What is the impact to writes? 
  •  Should we alter an existing index rather than add a new one (leftmost columns, etc.)?

Testing migrations

Unit testing

Migrations are currently not run in unit tests.

Acceptance tests

The paver commands that kick off the Lettuce and bokchoy tests run migrations. However, because running every migration from scratch would take a long time, we cache the latest state of the database at intervals (every couple of months, when someone checks in a new cache), so only the migrations added since the last time the database state was cached are run.

Load tests

There are caveats about using our existing load test environment as part of your migration testing. The existing load tests have limitations about what kinds of queries they generate and the data that is used in the load test environment is not representative. 

TODO: Write up best practices for testing migrations in the load test environment. This requires coordination with SRE: work with them to run your migration while you're generating load against it, and monitor the error rate, throughput, etc.

Known large and/or problematic tables

Large tables

Top ones (ordered by descending size):

Code Block
courseware_studentmodulehistory, courseware_studentmodule
student_historicalcourseenrollment, student_courseenrollment
student_anonymoususerid, user_api_userorgtag
django_comment_client_role_users, certificates_generatedcertificate
auth_userprofile, user_api_userpreference, auth_user, oauth2_accesstoken

See this spreadsheet (edX only): https://docs.google.com/spreadsheets/d/1rrRGsjYYNV41rHYLmDluQw74a8HRegnbveXaMk95gBI/edit#gid=0

...

Contentious tables

Top ones (ordered by descending calls-per-minute):

Code Block
user_api_userpreference
auth_user
student_understanding
theming_sitetheme
django_site
course_modes_coursemode
courseware_studentmodule
course_overviews_courseoverview
waffle_utils_waffleflagcourseoverridemodel
edxval_videoimage
edxval_profile
completion_blockcompletion
student_anonymoususerid

...

Deployment and migrations for large or problematic tables

First read "Deployment and backward-compatible migrations" for general information about handling blue-green deployments. Then see this section for special consideration for large or problematic tables.


Consider Making a New Model with a OneToOneField

Adding fields to large tables can cause operational issues. What is safe varies by database version (e.g. MySQL 5.7 vs. 8.0) and specialized backend (like Aurora). Also, even if the database supports adding things in a non-locking way, Django's migrations framework may not know how to formulate the right SQL to do so.

A lower risk alternative is to create a new model and link it together with a OneToOneField. You can use the primary_key=True option in order to have the new table's primary key match the values of the parent table.

This does complicate the code somewhat, and you should be careful about avoiding n+1 queries by calling select_related. The benefit is not having to hold your breath when the migration rolls out, for fear that you just froze a heavily used table and brought down the site.



Common migration tasks

Making a migration to create a new table

...

Code Block
./manage.py [lms|cms] --settings=devstack_docker makemigrations --initial name_of_app


Making a migration to modify an existing table 

When you make changes to your model, create a migration file and check it in:

Code Block
./manage.py [lms|cms] --settings=devstack_docker makemigrations name_of_app

...

_app --pythonpath=.

Make sure you are pointing to the correct environment file.

Checking SQL for a migration

After creating your migration file, if you are running Open edX via the DevStack configuration, it is sometimes useful to review the SQL for your new migration:

Code Block
./manage.py [lms|cms] --settings=devstack_docker sqlmigrate name_of_app --pythonpath=.

...

number

Performing a migration

After creating your migration file, if you are running Open edX via the DevStack configuration, you can perform the migration using the following command:

...

Code Block
# Remove the migration file
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# rm /edx/app/edxapp/edx-platform/lms/djangoapps/courseware/migrations/0011_csm_id_bigint.py

Squashing Migrations

See Django's Documentation for Squashing Migrations. Some useful tips for squashing:

  • The primary benefit of squashing migrations is the speed-up of running migrations from scratch.  If you are not running migrations from scratch, this may not help you.


    Pros:

    • Processing time for running actual migrations is greatly improved, but we are almost never building from scratch (in edx-platform); probably only new Open edX instances benefit from this. It is unclear which IDAs may be running migrations before unit testing.

    • Ultimately, when we remove old files and unnecessary migrations, we may have less maintenance on the old migrations.

    Cons:

    • Processing speeds seem to be unchanged (or worse) for showmigrations, or determining migrations to run in GoCD when there are no new migrations.

    • To get the maximum benefit of faster from-scratch migration times, a lot of careful and potentially error-prone work is required.

    Conclusion:

    • I don’t recommend squashing unless you are starting with a clear problem to be solved, that isn’t already handled through cached databases containing earlier migrations. For example, if you happen to run migrations before unit tests, rather than running based on the models.

    See ARCHBOM-1148 for more details.


  • Managing your squashed-migrations PR:
    • Keep migration squashing in its own PR.  Introducing a new migration in the same PR as the squash can cause issues.
    • Keep the auto-generated squash migration file as its own initial commit on the PR.  This will help your PR reviewers.
  • You can sometimes get an improved squash by removing the data migrations or removing all old migrations to create fresh migrations.
    • Note: commit this separately from the initial auto-generated commit to help with review.
    • You may need to remove all migrations for apps that depend on your migrations as well, to get this to run.
    • If you use this method, ensure makemigrations shows that there is nothing missing from your squash.
  • Testing your squashed migrations.

    • Try to run all the migrations locally.
    • For pytest, you can use -vvv to show if the migrations are running, and a combination of --create-db and/or --enable-migrations should work.
    • For edx-platform:
      • To test locally, try:

        Code Block
        # Note: unit tests don't currently run using migrations, but this will ensure the migrations complete.
        paver test_system -s lms --enable-migrations --verbose --disable_capture
        
        # UPDATE: This command may be deprecated/removed soon.
        # Or try the following, which you can use to run the bokchoy smoke tests against:
        # Note: the mysqldump command may fail locally with 'Unknown table 'COLUMN_STATISTICS' in information_schema (1109)', but
        #   you should at least have seen all the migrations run successfully first.
        paver update_bokchoy_db_cache


      • Note that almost everywhere, edx-platform has optimizations to skip migrations or run minimal migrations, so squashing doesn't provide much benefit.


  • Important: Squashing migrations is a two-part process, and each part needs to live in a separate Open edX Named Release so that the community can catch up before the second part is released.  From Django's docs:

...