Everything About Database Migrations

This document covers what you need to know about Django migrations in the Open edX platform. If you are unfamiliar with database migrations in general, or Django migrations specifically, please read the reference documentation. Django does a good job of abstracting away what happens behind the scenes when you run "./manage.py makemigrations" and "./manage.py migrate", but understanding what happens during these operations is extremely useful.

Reference documentation

When in doubt

If you have non-trivial migrations to apply, or if two non-local environments (e.g. stage and production) have different migration states, describe your situation in the #django Slack channel and go talk to the Ops team before doing anything else.  Similarly, migrations can become complicated when two different people create two different migrations around the same time.  When in doubt, post in #django and talk to Ops.

Don't revert code that includes migrations, don't change old migrations.

Django migrations should be considered "applied" as soon as they land on master of a repo. Missing ("ghost") migrations cause problems for Django and require manual intervention to fix. Fix forward on migrations. Be sure to properly consider all the points below so that you're less likely to want to delete or change a migration. If you do delete a migration, or revert a commit that contains a migration, follow this guide for communicating with the organization and the community: How to revert a migration from master. Also, you should never roll back migrations (manually) without rolling back the code that relies on them.

If you still think you need to change old migrations, and you want to verify that there isn't an alternative, see the "When in doubt" section.

Don't change the parent of a migration

Along the same lines as a migration being considered "applied" once merged into master, you should never change the dependencies of a migration once it has landed on master. Doing so will cause real problems, and probably downtime, in whichever environment it is deployed to. When you create new migrations in a feature branch, you want them to be the most recent migrations when you merge into master. Using an analogy to git, you always want your new migrations to be at the "HEAD" of the migration history in your app.

Deployment and backward-compatible migrations

Here at edX, we use the blue-green deployment method. The important detail about this deployment method is that, for some period of time, traffic is going to both the old code and new code. That detail is especially important when deploying database migrations that alter database columns and tables in a manner that is not backward-compatible with the previous release.

Let's go through a couple of examples with our user table, auth_user. It has a few different columns, but we'll use the full_name column for the examples.

Say we decide to change the column's name from full_name (with an underscore) to fullname (no underscore). Our code in production is using full_name. When it's time to deploy this new release, we simply generate a migration and deploy it. Since we are using blue-green deployments, our old code is still looking for the original column name, full_name. However, the new deployment changed the name to fullname, so the original code starts failing.

Instead of renaming the column, say we delete it completely. Again, the database is modified when we deploy, and the original code that is still running will fail.

Because we operate in an environment where new and old code are running simultaneously against the same database, new code must always be compatible with the older database schema. Newer deployments can add tables and columns, but neither can be deleted unless the old code is no longer referencing the deleted tables or columns.
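To make the failure mode concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the production database (an assumption for illustration; our environments run MySQL). It reuses the hypothetical auth_user.full_name example: an additive migration leaves the old release's query working, while a rename breaks it. RENAME COLUMN needs SQLite 3.25 or later.

```python
import sqlite3

# The query the *old* release keeps running during the blue-green overlap.
OLD_CODE_QUERY = "SELECT full_name FROM auth_user"


def fresh_db() -> sqlite3.Connection:
    """In-memory stand-in for production, matching what the old code expects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO auth_user (full_name) VALUES ('Ada Lovelace')")
    return conn


def old_code_still_works(conn: sqlite3.Connection) -> bool:
    """Return True if the old release's query still succeeds against conn."""
    try:
        conn.execute(OLD_CODE_QUERY).fetchall()
        return True
    except sqlite3.OperationalError:  # e.g. "no such column: full_name"
        return False


# Additive (backward-compatible) migration: add a column, leave full_name alone.
additive = fresh_db()
additive.execute("ALTER TABLE auth_user ADD COLUMN fullname TEXT")

# Non-backward-compatible migration: rename the column in place.
renamed = fresh_db()
renamed.execute("ALTER TABLE auth_user RENAME COLUMN full_name TO fullname")

print(old_code_still_works(additive))  # True
print(old_code_still_works(renamed))   # False
```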

Migration Unit Test

In the edx-platform codebase, the unit test test_migrations_are_in_sync in test_db.py ensures that Django migrations and models are in sync. Migrations that drop columns or tables generally require at least two releases: one that removes references and one that contains the drop migration. The first release will fail the unit test, so you will need to skip that test during your release sequence and restore it when you are done. This also applies to libraries used by edx-platform, such as edx-proctoring: the test will fail when edx-platform receives the interim version of the library.

The skip should include a ticket number and brief info on what it's for:

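    import unittest  # if test_db.py doesn't import it already

    @unittest.skip(
        "Temporary skip for TICKET-1234 while the fnord column is removed from the snood table"
    )
    def test_migrations_are_in_sync(self):
        ...  # existing test body, unchanged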

How to drop a column

Nullable/Non-Nullable Columns

For either a nullable or non-nullable column, first make sure there are no other models or code that actually use the column. If there are, make adjustments to those first before working through the deletion flow.

For NULLABLE columns, this involves TWO releases:

  1. Remove all usages of the column, including updating the model so it no longer refers to the field/column (i.e., the model field must be removed in this step).
    1. If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
  2. Drop the column (with a migration).
    1. If this change is in the edx-platform codebase, remove the skip from the test_migrations_are_in_sync unit test.

For NOT-NULLABLE columns, this involves THREE releases:

  1. Update the model and generate a migration making the column nullable (`null=True`)
  2. Remove all usages of the column, including updating the model so it no longer refers to the field/column (i.e., the model field must be removed in this step).
    1. If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
  3. Drop the column (with a migration).
    1. If this change is in the edx-platform codebase, remove the skip from the test_migrations_are_in_sync unit test.
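Why does a NOT-NULLABLE column need the extra release? In release 2 the field is gone from the model, so the ORM's INSERT statements no longer mention the column at all, and if the column is still NOT NULL with no database-level default, the database rejects them (MySQL does so in strict mode). The sketch below shows this with sqlite3 standing in for MySQL; snood and fnord are the placeholder names from the skip message above, and the helper name is hypothetical.

```python
import sqlite3


def release_two_insert_succeeds(fnord_ddl: str) -> bool:
    """Simulate release 2 of a column drop against a table that still has fnord.

    The model no longer declares fnord, so the ORM's INSERT simply omits it;
    whether that works depends only on the column's constraint in the database.
    """
    conn = sqlite3.connect(":memory:")
    # fnord_ddl is a trusted literal in this sketch, so string formatting is fine.
    conn.execute(
        f"CREATE TABLE snood (id INTEGER PRIMARY KEY, name TEXT NOT NULL, {fnord_ddl})"
    )
    try:
        conn.execute("INSERT INTO snood (name) VALUES (?)", ("example",))
        return True
    except sqlite3.IntegrityError:  # "NOT NULL constraint failed: snood.fnord"
        return False
    finally:
        conn.close()


# Skipping release 1: the column is still NOT NULL when the field leaves the model.
print(release_two_insert_succeeds("fnord TEXT NOT NULL"))  # False
# Following the recipe: release 1 already made the column nullable.
print(release_two_insert_succeeds("fnord TEXT"))           # True
```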

Returning to our auth_user example: if we still want to drop the full_name column, we should do the following:

  1. Remove every usage of the full_name column in our codebase and skip the unit test. Release that change to production, and ensure older code is no longer running. (We once had a stale auto scaling group (ASG) in production a few hours after a release, and it caused a few issues when we dropped a column.)
  2. Create a database migration to drop the column, restore the unit test, and release it.
  3. (This step intentionally left blank... because nothing broke in production!)

ManyToManyField Columns

When dropping ManyToManyField columns, keep in mind that the Django ORM automatically creates an intermediary model (backed by a separate join table) and related managers for the field. So, unlike other columns, where it's easy to remove all usages, unexpected usages can still occur through Django ORM managers (such as django.db.models.fields.related_descriptors.create_reverse_many_to_one_manager). This is why the field removal and the migration should be two separate steps.

So, for dropping ManyToManyField columns, use at least TWO releases:

  1. Remove the ManyToManyField field from the model, while skipping the test_migrations_are_in_sync unit test. Deploy.
  2. Add the migration which actually removes the ManyToManyField from the DB (which is actually implemented via a separate table) and unskip the test_migrations_are_in_sync unit test. Deploy.

Failing to separate the two steps may result in the field being used by the old code during the blue-green deployment after the migration has been performed, resulting in production errors.
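It can help to know which table release 2 actually drops. By default, Django names a ManyToManyField's join table after the owning model's table plus the field name. The helper below is a hypothetical sketch of that naming rule for illustration, not a Django API; it ignores custom db_table or through= settings, as well as the truncation Django applies to names longer than the database backend allows.

```python
def default_m2m_table(app_label: str, model_name: str, field_name: str) -> str:
    """Approximate the default join table name for a ManyToManyField.

    The owning model's table defaults to "<app_label>_<lowercased model name>",
    and the join table appends "_<field name>" to that.
    """
    return f"{app_label}_{model_name.lower()}_{field_name}"


# Hypothetical: a `tags` ManyToManyField on a `Course` model in a `catalog` app.
print(default_m2m_table("catalog", "Course", "tags"))  # catalog_course_tags
```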

How to rename a column

Renaming a column while keeping the business logic fully functional and without taking any down time is a very delicate and complex process.  Some things to keep in mind before you start:

  1. Do not allow downtime or alter business logic between releases.
  2. Do not allow downtime or alter business logic during a release, i.e. after migrations and before code deployment.
  3. Do not allow any data to be permanently dropped, even if only a subset of the data.
  4. Every release must have a functional rollback plan.
  5. As far as possible, avoid releases that must be immediately followed up by another release. It should be safe to walk away from the rollout halfway through (e.g. code freezes, vacations, etc. might stop work).

THREE releases:

  1. Release:
    • Add the new field to the model.
      • If the old field has null=False, blank=False, and no default:
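        • If the model is used in forms (the Django admin or other forms):
          • Create the new field with null=True, editable=False.
          • Disabling editable removes the field from the Django admin and other ModelForms, so they won't require a value for the new, still-empty field.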
        • else:
          • Create the new field with null=True.
      • else if the old field is a BooleanField:
        • You might need to change the old field type to NullBooleanField (or BooleanField(null=True) on Django 2.1 and later) so that unit tests in release 2 pass when the old field has been removed from the code but not from the sqlite3 test database.
        • Create the new field with BooleanField and the same signature, assuming there's a default set.
      • else if the old field has null=True:
        • Create the new field with the same field signature as the old.
    • Update every place that creates or updates the field:
      • Write the same value into both fields
      • If there is a Django admin page or other form and it is used regularly to create/update rows:
        • Register a signal handler to the model to update the new field whenever the old field changes or a new row is created.
  2. Release:
    • Create a data migration to copy the values from the old field into the new field.
      • If the table is large, consider disabling atomicity and batching the copy.
    • Remove all references to the old field in the code.
      • Including removing the old field from the model in the code.
      • If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
      • DO NOT include the migration for removing the old column (yet).
    • If you create the new field with a different field signature than the old, then update it now to be the same as the old.
      • e.g. change null back to False and editable back to True (the default).
      • Include the migration that goes with this, but NOT the migration to remove the old field.
  3. Release:
    • Run makemigrations; it should pick up the field removal from the previous release.
    • If this change is in the edx-platform codebase, remove the skip from the test_migrations_are_in_sync unit test.
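The batched copy suggested for release 2 might look like the sketch below. It again uses sqlite3 as a stand-in and the hypothetical full_name to fullname rename; the helper name and the batch size of 1,000 are assumptions. In a real Django data migration you would put equivalent logic in a RunPython function that gets the model via apps.get_model, and set atomic = False on the Migration class so that each batch commits on its own instead of inside one huge transaction.

```python
import sqlite3


def copy_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy full_name into fullname batch_size rows at a time; return the batch count.

    Walks the primary key in order (keyset pagination) and commits after every
    batch, so no single transaction locks the whole table.
    """
    last_id = 0
    batches = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM auth_user WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            return batches
        conn.execute(
            "UPDATE auth_user SET fullname = full_name WHERE id BETWEEN ? AND ?",
            (ids[0], ids[-1]),
        )
        conn.commit()
        last_id = ids[-1]
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT, fullname TEXT)")
conn.executemany("INSERT INTO auth_user (full_name) VALUES (?)",
                 [(f"user {i}",) for i in range(2500)])
conn.commit()

print(copy_in_batches(conn))  # 3 batches: 1000 + 1000 + 500 rows
```

Batching on the primary key rather than with OFFSET keeps each batch's lookup cheap even late in a large table.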

How to drop a table

TWO releases:

  1. Remove all references to the table.
  2. Remove the table's model (with a migration).

How to delete a Django app containing tables

See Removing a Djangoapp from an existing project

How to add a nullable column to an existing table (AWS Aurora)

When using AWS Aurora, a nullable column can be added to existing large (100k+ rows?) tables without causing downtime. However, the migration may still time out in GoCD, so please coordinate the release with DevOps.

  1. Add the new field to the model with null=True.
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly-generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the nullable column to the table.

NOTES:

  • The column must be nullable! (null=True)
    • When adding a column with null=False, these instructions do not apply and you'll need to plan for downtime.
  • The Django ORM default for a column's null value is False.
    • So when adding new model columns which do not specify the null parameter, these instructions do not apply and you'll need to plan for downtime.
  • If you first add a nullable column (null=True) and then change the constraint to non-nullable (null=False) in a later PR, the table will be re-built just as if you added a non-nullable column.
  • If you change an existing nullable column (null=True) to become a non-nullable column (null=False), the table will be re-built just as if you added a non-nullable column.

AWS Aurora Docs: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FastDDL.html

How to add index to existing table (AWS Aurora)

On AWS Aurora, indexes can be built on large tables without causing downtime, but this requires DevOps coordination, as the migration may time out in GoCD.

  1. Add the new index to the model fields (for example, with db_index=True or Meta.indexes).
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly-generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the index to the table.

Mathematical perspective: Database Expansion/Contraction

A good way to think of this is that migrations can "expand" and "contract" the database. Adding fields is an expansion, and removing them is a contraction. If you're feeling a bit more mathematical today, there's a partial ordering relation on (db, code) where your database and code are in the relation iff the set of fields in the DB is a (non-strict) superset of the fields in the code (... well, this isn't quite right, since changing fields is OK in circumstances like extending the length of a CharField. Defining the relation precisely is left as an exercise to the reader). Under this model, changing a field (say, from a plain CharField to an EmailField) would consist of an expansion (adding the EmailField) followed later (potentially much later, but crucially not in the same release) by a contraction (deleting the CharField). Code can also expand and contract in similar ways, by changing which fields are declared in your Django models.

For a migration to be backwards compatible, the database must always be at least as "large" as the code. It can be larger (contain a field not referenced by the code), but not smaller.
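The expand/contract rule can be checked mechanically. The sketch below is a toy model under stated assumptions: each release is reduced to two sets of field names (what the database has after that release's migrations, and what the code references), and during a blue-green deploy both the previous and the new code run against the migrated database. Like the relation above, it only compares field names and ignores types and nullability. The function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# One entry per release: (fields in the DB after its migrations, fields the code uses).
Release = Tuple[FrozenSet[str], FrozenSet[str]]


def compatible(db_fields: FrozenSet[str], code_fields: FrozenSet[str]) -> bool:
    """The database must be at least as "large" as the code."""
    return db_fields >= code_fields


def find_violations(releases: List[Release]) -> List[Tuple[int, str]]:
    """List every (release index, which code) pair that breaks during a deploy.

    Assumes release N's migrations run first, after which both the old code
    (release N-1) and the new code (release N) serve traffic against the
    migrated database, as in a blue-green deployment.
    """
    violations = []
    for n in range(1, len(releases)):
        migrated_db, new_code = releases[n]
        _, old_code = releases[n - 1]
        if not compatible(migrated_db, old_code):
            violations.append((n, "old code"))
        if not compatible(migrated_db, new_code):
            violations.append((n, "new code"))
    return violations


f = frozenset  # short alias to keep the release tables readable
# Hypothetical full_name -> fullname rename, following the three-release recipe.
three_release_rename = [
    (f({"full_name"}), f({"full_name"})),                          # starting point
    (f({"full_name", "fullname"}), f({"full_name", "fullname"})),  # 1: expand DB and code
    (f({"full_name", "fullname"}), f({"fullname"})),               # 2: contract code only
    (f({"fullname"}), f({"fullname"})),                            # 3: contract DB
]
# The same rename squeezed into a single release.
one_shot_rename = [
    (f({"full_name"}), f({"full_name"})),
    (f({"fullname"}), f({"fullname"})),
]

print(find_violations(three_release_rename))  # []
print(find_violations(one_shot_rename))       # [(1, 'old code')]
```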

Data migrations

If you're writing a data migration, don't import the model directly. Instead, let Django supply the historical version of your model. This allows your migration step to use the old (historical) version of your model, even if the model is later changed by a subsequent database migration.

Sample code from the Django docs
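from django.db import migrations

def combine_names(apps, schema_editor):
    # We can't import the Person model directly as it may be a newer
    # version than this migration expects. We use the historical version.
    Person = apps.get_model('yourappname', 'Person')
    for person in Person.objects.all():
        person.name = '%s %s' % (person.first_name, person.last_name)
        person.save()

class Migration(migrations.Migration):
    dependencies = [('yourappname', '0001_initial')]
    operations = [migrations.RunPython(combine_names)]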

Useful Checklists

Checklist for structural migrations

Existing Tables

  • Will the migration cause data loss?
  • Will this have a performance impact? NB: We do not take maintenance windows for migrations. We vastly prefer to re-engineer the migration rather than schedule a maintenance window, and it should be possible to engineer virtually all migrations to avoid downtime by being additive-only.
    • Is the migration against a large table (see the section below)?
    • Is the migration against a busy/highly contentious table (many writes/deletes/etc.; see the section below)?
      • For the community, if we anticipate a potentially significant migration, make a note on the Open edX Release page for the next release.
    • How long do you expect the migration to take to run? Options include naive local testing, the load test environment (see the Load tests section below), or some other synthetic method that might give you a reasonable framework to guess or extrapolate from.
    • Do you expect the migration to block queries, particularly frequent / user facing ones?
  • Is the migration backward compatible? Does it remove or edit a schema that the previous version of the code expects to be there?

New Tables

  • Is there a primary key? How fast will the table grow? Should the primary key be a bigint? (Django's default AutoField is a signed 32-bit integer, so it runs out at 2,147,483,647, roughly 2.1 billion ids.)
  • Is there a process for identifying and trimming expired rows? Is there an appropriate index that will prevent a full table scan during cleanup?
  • What are the most common queries? Are there indexes to support them?
  • Should there be unique constraints?
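A quick way to answer "should the primary key be a bigint?" is to estimate how long the id space lasts. This hypothetical helper assumes ids are consumed at a steady rate; the limits are those of a signed 32-bit integer (what Django's AutoField maps to on MySQL) and a signed 64-bit integer (BigAutoField).

```python
# Maximum values of the signed integer types behind AutoField and BigAutoField.
INT_MAX = 2**31 - 1     # 2,147,483,647
BIGINT_MAX = 2**63 - 1  # 9,223,372,036,854,775,807


def years_until_exhausted(current_max_id: int, new_ids_per_day: float,
                          type_max: int = INT_MAX) -> float:
    """Rough number of years before the primary key type runs out of values.

    Deleted rows (and, in MySQL, rolled-back inserts) still use up ids, so
    measure the rate from the id sequence rather than from the row count.
    """
    return (type_max - current_max_id) / new_ids_per_day / 365.25


# Hypothetical new table that starts empty and gains a million rows a day.
print(round(years_until_exhausted(0, 1_000_000), 1))  # 5.9
```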

Checklist for data migrations

  • Is there a rollback migration?  Does it correctly rollback to the previous state?
  • Is there a migration test? Django migration testing
  • How long does a rollback take to run?
  • Is data being loaded into this table? Is it static or dynamic? How long will it take to load?
  • Have we double-checked that models are not being imported directly (are we allowing historical models to be used)?

Checklist for adding indexes

  • How long will it take? How big is the table?
  • What is the read/write ratio on the table? What is the impact on writes?
  • Should we alter an existing index rather than add a new one (leftmost columns, etc.)?
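The "leftmost columns" point refers to how composite B-tree indexes work, in MySQL's InnoDB as elsewhere: an index on (a, b) can serve filters on a, or on a and b, but generally not on b alone. The sketch below demonstrates this with sqlite3's EXPLAIN QUERY PLAN as a stand-in (the table and index names are hypothetical, and the exact plan text varies between SQLite versions); against MySQL you would use EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (id INTEGER PRIMARY KEY, user_id INTEGER, course_id TEXT, mode TEXT)"
)
conn.execute("CREATE INDEX idx_user_course ON enrollment (user_id, course_id)")


def plan(sql: str) -> str:
    """Return SQLite's query-plan description for sql (last column of each row)."""
    return " / ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Filtering on the leftmost indexed column can use the composite index ...
print(plan("SELECT * FROM enrollment WHERE user_id = 1"))
# ... but filtering on the second column alone falls back to a full table scan.
print(plan("SELECT * FROM enrollment WHERE course_id = 'x'"))
```

If a new query filters only on course_id, extending or reordering an existing index may serve it better than adding another one, which is the trade-off the checklist item asks about.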

Testing migrations

Unit testing

Migrations are currently not run in unit tests.

Acceptance tests

The paver commands that kick off the Lettuce and bokchoy tests run migrations. Running every migration from scratch would take a long time, so we periodically cache the database state (every couple of months, when someone checks in a new cache). Only the migrations added since the cache was last updated are run.

Load tests

There are caveats to using our existing load test environment as part of your migration testing: the existing load tests are limited in the kinds of queries they generate, and the data in the load test environment is not representative.

TODO: Write up best practices for testing migrations in the load test environment. Requires coordination with devops - work with them to run your migration while you're generating load against it. Monitor the error rate, throughput, etc.

Known large and/or problematic tables

Large tables

  • Top ones (ordered by descending size):

    courseware_studentmodulehistory, courseware_studentmodule
    student_historicalcourseenrollment, student_courseenrollment
    student_anonymoususerid, user_api_userorgtag
    django_comment_client_role_users, certificates_generatedcertificate
    auth_userprofile, user_api_userpreference, auth_user, oauth2_accesstoken
  • See this spreadsheet (edX only): https://docs.google.com/spreadsheets/d/1rrRGsjYYNV41rHYLmDluQw74a8HRegnbveXaMk95gBI/edit#gid=0

Contentious tables

Common migration tasks

Making a migration to create a new table

  1. Create a new directory under djangoapps and create a models.py file within it describing your model fields (example: common/djangoapps/track/models.py).
  2. Add the name of your module to INSTALLED_APPS in the appropriate environment file. Do NOT run manage.py syncdb, even though older Django documentation recommends it (syncdb was removed in Django 1.9).

Once you are happy with how your fields are defined in models.py, run the following command. For an app with no migrations yet, Django generates 0001_initial.py automatically (makemigrations has no --initial option). The resulting file will be checked in with your PR.

./manage.py [lms|cms] --settings=devstack_docker makemigrations name_of_app


Making a migration to modify an existing table

When you make changes to your model, create a migration file and check it in:

./manage.py [lms|cms] --settings=devstack_docker makemigrations name_of_app --pythonpath=.

Make sure you are pointing to the correct environment file.

Performing a migration

After creating your migration file, if you are running Open edX via the DevStack configuration, you can perform the migration using the following command:

./manage.py [lms|cms] --settings=devstack_docker migrate name_of_app

Rolling back a migration

./manage.py [lms|cms] --settings=devstack_docker migrate name_of_app <number>

where <number> is the prefix of the migration file that you want to roll back to.

Unapply all migrations

There's a special "zero" migration name to unapply all migrations, including the initial migration.

./manage.py [lms|cms] --settings=devstack_docker migrate name_of_app zero

Rare migration tasks

Faking migrations

Example: the CSM (courseware_studentmodule) primary key to bigint migration.

Do the following before merging/deploying the code; otherwise, the pipeline will try to run the migration.

Copy the migration file onto a machine (this can be a worker if the app has worker machines).

# Copy file contents into /edx/app/edxapp/edx-platform/lms/djangoapps/courseware/migrations/0011_csm_id_bigint.py on worker machine
# Check that the migration shows up in the list as unapplied

root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# /edx/bin/edxapp-migrate-cms --noinput --list courseware
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)

2019-08-30 18:08:47,084 WARNING 11141 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:08:47,084 WARNING 11141 [enterprise.utils] [user None] utils.py:56 - cannot import name _LTI_BACKENDS
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [ ] 0011_csm_id_bigint
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)

2019-08-30 18:08:52,683 WARNING 11392 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:08:52,684 WARNING 11392 [enterprise.utils] [user None] utils.py:56 - cannot import name _LTI_BACKENDS
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [ ] 0011_csm_id_bigint
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform#

Run Migration Fake

# Add an extra space in front of the command to keep bash from writing the password to history (relies on HISTCONTROL including ignorespace or ignoreboth)
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform#  DB_MIGRATION_USER=migrate001 DB_MIGRATION_PASS=redacted /edx/bin/edxapp-migrate-cms --fake courseware 0011_csm_id_bigint

# Confirm the migration was applied
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# /edx/bin/edxapp-migrate-cms --list courseware
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)

2019-08-30 18:55:31,993 WARNING 19468 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:55:31,993 WARNING 19468 [enterprise.utils] [user None] utils.py:56 - cannot import name EnterpriseCustomerIdentityProvider
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [X] 0011_csm_id_bigint
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)

2019-08-30 18:55:37,151 WARNING 19583 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:55:37,152 WARNING 19583 [enterprise.utils] [user None] utils.py:56 - cannot import name EnterpriseCustomerIdentityProvider
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [X] 0011_csm_id_bigint

Cleanup

# Remove the migration file
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# rm /edx/app/edxapp/edx-platform/lms/djangoapps/courseware/migrations/0011_csm_id_bigint.py

Squashing Migrations

See Django's Documentation for Squashing Migrations. Some useful tips for squashing:

  • The primary benefit of squashing migrations is the speed-up of running migrations from scratch.  If you are not running migrations from scratch, this may not help you.

    Pros:

    • Processing time for running actual migrations is greatly improved, but we are almost never building from scratch (in edx-platform). Only new instances of Open edX are probably benefiting from this.  It is unclear what IDAs may be running migrations before unit testing.

    • Ultimately, once we remove old files and unnecessary migrations, there may be less maintenance burden from old migrations.

    Cons:

    • Processing speeds seem to be unchanged (or worse) for showmigrations, or determining migrations to run in GoCD when there are no new migrations.

    • To get the maximum benefit of faster from-scratch migration times, a lot of careful and potentially error-prone work is required.

    Conclusion:

    • I don’t recommend squashing unless you are starting with a clear problem to be solved that isn’t already handled by cached databases containing earlier migrations; for example, if you run migrations before unit tests rather than building the test database from the models.

    See ARCHBOM-1148 for more details.

  • Managing your squash migrations PR:
    • Keep migration squashing to its own PR.  Introducing a migration in the same PR that you squash can cause issues.
    • Keep the auto-generated squash migration file as its own initial commit on PRs.  This will help your PR reviewers.
  • You can sometimes get an improved squash by removing the data migrations or removing all old migrations to create fresh migrations.
    • Note: commit this separately from the initial auto-generated commit to help with review.
    • You may need to remove all migrations for apps that depend on your migrations as well, to get this to run.
    • If you use this method, ensure makemigrations shows that there is nothing missing from your squash.
  • Testing your squashed migrations.

    • Try to run all the migrations locally.
    • For pytest, you can use -vvv to show if the migrations are running, and a combination of --create-db and/or --enable-migrations should work.
    • For edx-platform:
      • To test locally, try:

        # Note: unit tests don't currently run using migrations, but this will ensure the migrations complete.
        paver test_system -s lms --enable-migrations --verbose --disable_capture
        
        # Or try the following, which you can use to run the bokchoy smoke tests against:
        # Note: the mysqldump command may fail locally with 'Unknown table 'COLUMN_STATISTICS' in information_schema (1109)', but
        #   you should at least have seen all the migrations run successfully first.
        paver update_bokchoy_db_cache
      • Note that almost everywhere, edx-platform has optimizations to skip migrations or run minimal migrations, so squashing doesn't provide much benefit.
  • Important: Squashing Migrations is a two-part process, and each part needs to live in a separate Open edX Named Release in order for the community to get caught up before the second part is released. From Django's docs:
This enables you to squash and not mess up systems currently in production that aren’t fully up-to-date yet. The recommended process is to squash, keeping the old files, commit and release, wait until all systems are upgraded with the new release (or if you’re a third-party project, ensure your users upgrade releases in order without skipping any), and then remove the old files, commit and do a second release.