Everything About Database Migrations

This document covers what you need to know about Django migrations in the Open edX platform. If you are unfamiliar with database migrations, or with Django migrations specifically, please read the reference documentation. Django does a good job of abstracting away what happens behind the scenes when you run "./manage.py makemigrations" and "./manage.py migrate", but understanding what happens during these operations is extremely useful.

Reference documentation

When in doubt

If you have non-trivial migrations to apply, or if two non-local environments (e.g. stage and production) have different migration states, describe your situation in the #django Slack channel and go talk to the Ops team before doing anything else.  Similarly, migrations can become complicated when two different people create two different migrations around the same time.  When in doubt, post in #django and talk to Ops.

Don't revert code that includes migrations; don't change old migrations.

Django migrations should be considered "applied" as soon as they land on master of a repo. Missing ("ghost") migrations cause problems for Django and require manual intervention to fix. Fix forward on migrations. Be sure to properly consider all the points below so that you're less likely to want to delete or change a migration. If you do delete a migration, or revert a commit that contains a migration, follow this guide for communicating with the organization and the community: How to revert a migration from master. Also, never roll back migrations manually without rolling back the code that relies on them.

If you still think you need to change old migrations, and you want to verify that there isn't an alternative, see the "When in doubt" section.

Don't change the parent of a migration

Along the same lines as a migration being considered "applied" once merged into master, you should never change the dependencies of a migration once it has landed on master. Doing so will cause real problems, and probably downtime, in whichever environment it is deployed to. When you create new migrations in a feature branch, you want them to be the most recent migrations when you merge into master. To use a git analogy, you always want your new migrations to be at the "HEAD" of your app's migration history.

Deployment and backward-compatible migrations

Here at edX, we use the blue-green deployment method. The important detail about this deployment method is that, for some period of time, traffic is going to both the old code and new code. That detail is especially important when deploying database migrations that alter database columns and tables in a manner that is not backward-compatible with the previous release.

Let's go through a couple of examples with our user table, auth_user. It has several columns; for these examples, assume one of them is full_name.

Say we decide to change the column's name from full_name (with an underscore) to fullname (no underscore). Our code in production is using full_name. When it's time to deploy this new release, we simply generate a migration and deploy it. Since we are using blue-green deployments, our old code is still looking for the original column name, full_name. However, the new deployment changed the name to fullname, so the original code starts failing.

Instead of renaming the column, say we delete it completely. Again, the database is modified when we deploy, and the original code that is still running will fail.

Because we operate in an environment where new and old code run simultaneously against the same database, every migration must leave the database compatible with the code from the previous release. New deployments can add tables and columns, but a table or column cannot be deleted until no running code references it.
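To make the rename failure concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the production database (an assumption: production runs MySQL/Aurora, and the table here is a toy version of auth_user with the hypothetical full_name column). The "old code" query keeps working until the migration renames the column, then fails exactly as described above.

```python
import sqlite3

# Illustrative only: in-memory SQLite stands in for the production database,
# and auth_user is a toy table with a hypothetical full_name column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO auth_user (full_name) VALUES ('Ada Lovelace')")


def old_code_lookup(db):
    """Query issued by the previous release, which still expects full_name."""
    return db.execute("SELECT full_name FROM auth_user WHERE id = 1").fetchone()[0]


print(old_code_lookup(conn))  # works before the migration

# The new release's migration runs while old-code servers still take traffic.
# (RENAME COLUMN needs SQLite 3.25 or later.)
conn.execute("ALTER TABLE auth_user RENAME COLUMN full_name TO fullname")

try:
    old_code_lookup(conn)
except sqlite3.OperationalError as exc:
    print("old code now fails:", exc)  # no such column: full_name
```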

How to drop a column

TWO releases:

  1. Remove all usages of the column, including updating the model to no longer refer to the field/column (i.e., the model field must be removed in this step).
  2. Drop the column (with a migration).

Returning to our auth_user example: if we still want to drop the full_name column, we should do the following:

  1. Remove every usage of the full_name column in our codebase. Release that change to production, and ensure older code is no longer running. (We once had a stale ASG in production a few hours after a release, and it caused a few issues when we dropped a column.)
  2. Create a database migration to drop the column. Release it.
  3. (This step intentionally left blank... because nothing broke in production!)
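The two-release rule can be checked mechanically. The sketch below is a toy model, not an existing edX tool: each release is described by the columns its code reads and the columns the database has after that release's migration runs, and the function assumes (per the blue-green description above) that both the previous and the new code serve traffic against the migrated schema.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical model: (columns the code reads, columns in the DB after the migration).
Release = Tuple[FrozenSet[str], FrozenSet[str]]


def unsafe_windows(releases: List[Release]) -> List[str]:
    """Describe every deploy window in which running code needs a missing column."""
    problems = []
    for i in range(1, len(releases)):
        old_code, _ = releases[i - 1]
        new_code, db_after = releases[i]
        # During the deploy, old and new code both run against the migrated schema.
        for label, code in (("old", old_code), ("new", new_code)):
            missing = code - db_after
            if missing:
                problems.append(f"release {i}: {label} code needs {sorted(missing)}")
    return problems


baseline = (frozenset({"id", "full_name"}), frozenset({"id", "full_name"}))
one_release = [baseline, (frozenset({"id"}), frozenset({"id"}))]
two_releases = [
    baseline,
    (frozenset({"id"}), frozenset({"id", "full_name"})),  # release 1: stop using the column
    (frozenset({"id"}), frozenset({"id"})),  # release 2: drop it
]
print(unsafe_windows(one_release))  # ["release 1: old code needs ['full_name']"]
print(unsafe_windows(two_releases))  # []
```

Dropping the column in the same release that stops using it leaves the previous release's code pointing at a missing column; splitting it into two releases closes that window.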

How to rename a column

Renaming a column while keeping the business logic fully functional and without taking any downtime is a very delicate and complex process. Some things to keep in mind before you start:

  1. Do not allow downtime or alter business logic between releases.
  2. Do not allow downtime or alter business logic during a release, i.e. after migrations and before code deployment.
  3. Do not allow any data to be permanently dropped, even if only a subset of the data.
  4. Every release must have a functional rollback plan.
  5. As much as possible, avoid releases that must be immediately followed up by another release. It should be safe to walk away from the rollout halfway through (e.g., code freezes or vacations might stop work).

THREE releases:

  1. Release:
    • Add the new field to the model.
      • If the old field has null=False, blank=False, and no default:
        • If the model is used in forms (Django admin or other forms):
          • Create the new field with null=True, editable=False.
          • Disabling editable removes the field from the Django admin and other ModelForms, so they won't require a value for it.
        • else:
          • Create the new field with null=True.
      • else if the old field is a BooleanField:
        • You might need to change the old field type to NullBooleanField (BooleanField(null=True) in newer Django versions) so that unit tests in release 2 still pass when the old field has been removed from the code but not yet from the sqlite3 database.
        • Create the new field with BooleanField and the same signature, assuming there's a default set.
      • else if the old field has null=True:
        • Create the new field with the same field signature as the old.
    • Update every place that creates or updates the field:
      • Write the same value into both fields
      • If there is a Django admin page or other form and it is used regularly to create/update rows:
        • Register a signal handler to the model to update the new field whenever the old field changes or a new row is created.
  2. Release:
    • Create a data migration to copy the values from the old field into the new field.
      • If the table is large, consider disabling atomicity and batching the copy.
    • Remove all references to the old field in the code.
      • Including removing the old field from the model in the code.
      • If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
      • DO NOT include the migration for removing the old column (yet).
    • If you created the new field with a different field signature than the old one, update it now to match the old one.
      • e.g. change null back to False and editable back to True (the default).
      • include the migration that goes with this, but NOT the migration to remove the old field.
  3. Release:
    • Run makemigrations; this should pick up the field removal from the previous release.
    • If this change is in the edx-platform codebase, remove the skip from the test_migrations_are_in_sync unit test.
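For the release 2 data copy on a large table, batching keeps each transaction (and its locks) short. This sketch shows the batching pattern with sqlite3 on a toy auth_user table that has both the old full_name and the new fullname columns; in a real Django data migration the same loop would live in a RunPython function on a Migration with atomic = False. The batch size and table layout are assumptions for illustration.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Copy full_name into fullname one primary-key range at a time.

    Each batch commits separately, so locks are held only briefly.
    Rows already dual-written by release 1 code (fullname set) are skipped.
    """
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM auth_user").fetchone()
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        conn.execute(
            "UPDATE auth_user SET fullname = full_name "
            "WHERE id >= ? AND id < ? AND fullname IS NULL",
            (start, start + batch_size),
        )
        conn.commit()
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
# State after release 1: both columns exist, and the new one is nullable.
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT, fullname TEXT)")
conn.executemany(
    "INSERT INTO auth_user (full_name) VALUES (?)", [(f"user {n}",) for n in range(2500)]
)
# A row written by release 1 code is already dual-written.
conn.execute("UPDATE auth_user SET fullname = full_name WHERE id = 1")
conn.commit()
print(backfill_in_batches(conn, batch_size=1000))  # 3
```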

How to drop a table

TWO releases:

  1. Remove all references to the table.
  2. Remove the table's model (with a migration).

How to delete a Django app containing tables

See Removing a Djangoapp from an existing project

How to add a nullable column to an existing table (AWS Aurora)

When using AWS Aurora, a nullable column can be added to large existing tables (100k+ rows?) without causing downtime. However, the migration may still time out in GoCD, so please coordinate the release with DevOps.

  1. Add the new field to the model with null=True.
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly-generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the nullable column to the table.

NOTES:

  • The column must be nullable! (null=True)
    • When adding a column with null=False, these instructions do not apply and you'll need to plan for downtime.
  • The Django ORM default for a column's null value is False.
    • So when adding new model columns which do not specify the null parameter, these instructions do not apply and you'll need to plan for downtime.
  • If you first add a nullable column (null=True) and then change the constraint to non-nullable (null=False) in a later PR, the table will be re-built just as if you added a non-nullable column.
  • If you change an existing nullable column (null=True) to become a non-nullable column (null=False), the table will be re-built just as if you added a non-nullable column.
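SQLite is not Aurora and handles this case differently (it refuses the change outright rather than rebuilding the table), but the hypothetical example below shows the underlying reason the null setting matters: a nullable column needs no value for existing rows, whereas a NOT NULL column without a default needs one for every row. The nickname and display_name columns are made up for illustration.

```python
import sqlite3

# Toy table in an in-memory SQLite database (not Aurora).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany("INSERT INTO auth_user (full_name) VALUES (?)", [("Ada",), ("Grace",)])

# Nullable column: existing rows just read back NULL, so no row needs a new value.
conn.execute("ALTER TABLE auth_user ADD COLUMN nickname TEXT")

# NOT NULL with no default: every existing row would violate the constraint,
# so the engine must either supply a value for each row or refuse.
try:
    conn.execute("ALTER TABLE auth_user ADD COLUMN display_name TEXT NOT NULL")
except sqlite3.OperationalError as exc:
    print("rejected:", exc)
```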

AWS Aurora Docs: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FastDDL.html

How to add index to existing table (AWS Aurora)

On AWS Aurora, indexes can be built on large tables without causing downtime, but this requires DevOps coordination because the migration may time out in GoCD.

  1. Add the new index on model fields.
  2. Generate the model-change migrations locally.
  3. Make a pull request containing the model change and newly-generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the index to the table.
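Before and after the release, it's worth confirming that the query the index was meant for actually uses it. As an illustration only, this sketch does that check with SQLite's EXPLAIN QUERY PLAN on a toy table; on Aurora/MySQL you would use that database's own EXPLAIN, whose output looks different. The table, query, and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, email TEXT, full_name TEXT)")

QUERY = "SELECT id, full_name FROM auth_user WHERE email = ?"


def plan(db):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN for QUERY."""
    rows = db.execute("EXPLAIN QUERY PLAN " + QUERY, ("a@example.com",))
    return [row[3] for row in rows]


before = plan(conn)  # expect a full table SCAN
conn.execute("CREATE INDEX auth_user_email_idx ON auth_user (email)")
after = plan(conn)  # expect a SEARCH using auth_user_email_idx
print(before)
print(after)
```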

Mathematical perspective: Database Expansion/Contraction

A good way to think of this is that migrations can "expand" and "contract" the database. Adding fields is an expansion, and removing them is a contraction. If you're feeling a bit more mathematical today, there's a partial ordering relation on (db, code) where your database and code are in the relation iff the set of fields in the DB is a (non-strict) superset of the fields in the code (... well, this isn't quite right, since changing fields is OK in some circumstances, such as extending the length of a CharField. Defining the relation precisely is left as an exercise to the reader). Under this model, changing a field (say, from a plain CharField to an EmailField) would consist of an expansion (adding the EmailField) followed later (potentially much later, but not in the same release) by a contraction (deleting the CharField). Code can also expand and contract in similar ways, by changing which fields are declared in your Django models.

For a migration to be backwards compatible, the database must always be at least as "large" as the code. It can be larger (contain a field not referenced by the code), but not smaller.
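One way to make the relation a little more precise is the toy check below. It is an assumption-laden sketch, not a complete definition: schemas are plain dicts of field name to (field type, max_length), and widening a CharField is the only field change it treats as compatible.

```python
from typing import Dict, Optional, Tuple

# Hypothetical schema representation: field name -> (field type, max_length or None).
Schema = Dict[str, Tuple[str, Optional[int]]]


def db_covers_code(db: Schema, code: Schema) -> bool:
    """Return True if the database is at least as "large" as the code.

    Every field the code declares must exist in the DB with the same type.
    A DB column whose max_length is at least the code's max_length is also
    accepted (e.g. after widening a CharField).
    """
    for name, (field_type, max_length) in code.items():
        if name not in db:
            return False  # code references a column the DB doesn't have
        db_type, db_max_length = db[name]
        if db_type != field_type:
            return False
        if max_length is not None and db_max_length is not None and db_max_length < max_length:
            return False  # code may write values the column can't hold
    return True


db = {"id": ("AutoField", None), "full_name": ("CharField", 255)}
print(db_covers_code(db, {"id": ("AutoField", None)}))  # True: DB may be larger than the code
print(db_covers_code(db, {"full_name": ("CharField", 100)}))  # True: column is wider than needed
print(db_covers_code(db, {"full_name": ("CharField", 300)}))  # False: column is too narrow
print(db_covers_code(db, {"email": ("EmailField", 254)}))  # False: field not in the DB yet
```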

Data migrations

If you're writing a data migration, don't import the model directly. Instead, let Django supply the historical version of your model through the apps registry passed to your migration function (apps.get_model). This lets your migration step use the old (historical) version of your model, even if the model is later changed by a subsequent database migration.
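A minimal sketch of such a data migration, assuming a hypothetical model Profile in a hypothetical app my_app with an old full_name field and a new nullable fullname field (mirroring the rename example above). Both functions receive the historical app registry as apps and never import from models.py; the reverse function gives the migration a rollback path, as the data-migration checklist below asks for.

```python
def populate_fullname(apps, schema_editor):
    """Forward step for migrations.RunPython: copy full_name into fullname.

    `apps` is the historical app registry Django passes in, so get_model
    returns the model as it existed at this point in the migration history,
    not the current class in models.py.
    """
    Profile = apps.get_model("my_app", "Profile")  # hypothetical app label and model
    for profile in Profile.objects.filter(fullname__isnull=True).iterator():
        profile.fullname = profile.full_name
        profile.save(update_fields=["fullname"])


def unpopulate_fullname(apps, schema_editor):
    """Reverse step, so the migration can be rolled back."""
    Profile = apps.get_model("my_app", "Profile")
    Profile.objects.update(fullname=None)


# In the migration module these would be wired up as:
#     operations = [migrations.RunPython(populate_fullname, unpopulate_fullname)]
```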

Useful Checklists

Checklist for structural migrations

Existing Tables

  • Will the migration cause data loss?
  • Will this have a performance impact? NB: We do not take maintenance windows for migrations. We vastly prefer re-engineering the migration to scheduling a maintenance window, and virtually all migrations can be engineered to avoid downtime by being additive-only.
    • Is the migration against a large table (see section below)?
    • Is the migration against a busy/highly contentious table (many writes/deletes/etc - see section below)?
      • For the community, if we anticipate a potentially significant migration, make a note on the Open edX Release page for the next release
    • How long do you expect the migration to take to run? Options include naive local testing, load testing (see Load tests below), or some other synthetic method that gives you a reasonable basis to estimate or extrapolate from.
    • Do you expect the migration to block queries, particularly frequent / user facing ones?
  • Is the migration backward compatible? Does it remove or edit a schema that the previous version of the code expects to be there?

New Tables

  • Is there a primary key? How fast will the table grow? Should the primary key be a bigint (i.e., could it exceed about 2.1 billion IDs, the ceiling of Django's default signed-integer AutoField)?
  • Is there a process for identifying and trimming expired rows? Is there an appropriate index that will prevent a full table scan during cleanup?
  • What are the most common queries? Are there indexes to support them?
  • Should there be unique constraints?
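For the bigint question, a back-of-the-envelope runway estimate helps. This sketch assumes the table uses Django's default AutoField, which is a signed 32-bit integer column on MySQL; note that auto-increment IDs are not reused, so growth should be measured by the current maximum ID rather than the row count. The input numbers are made up.

```python
# Largest value of Django's default AutoField (a signed 32-bit integer column).
INT_MAX = 2**31 - 1  # 2,147,483,647


def days_of_id_headroom(current_max_id: int, rows_per_day: int) -> float:
    """Rough number of days until new auto-increment IDs would exceed INT_MAX."""
    return (INT_MAX - current_max_id) / rows_per_day


# Hypothetical table: max ID already at 2.0 billion, growing by 1 million IDs per day.
print(round(days_of_id_headroom(2_000_000_000, 1_000_000), 1))  # 147.5
```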

Checklist for data migrations

  • Is there a rollback migration?  Does it correctly rollback to the previous state?
  • Is there a migration test? (See Django migration testing.)
  • How long does a rollback take to run?
  • Is data being loaded into this table? Is it static or dynamic? How long will it take to load?
  • Have we double-checked that models are not being imported directly (are we allowing historical models to be used)?

Checklist for adding indexes

  • How long will it take? How big is the table?
  • What is the read/write ratio on the table? What is the impact to writes? 
  • Should we alter an existing index rather than add a new one (leftmost columns, etc.)?
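To illustrate the leftmost-column rule, the sketch below builds a composite index on (user_id, course_id) in a toy SQLite table and asks SQLite how it would run two queries. This is SQLite rather than MySQL, and the table and index names are hypothetical, but the principle that a composite index serves queries on its leftmost column(s) holds for MySQL's B-tree indexes as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (id INTEGER PRIMARY KEY, user_id INTEGER, course_id TEXT, mode TEXT)"
)
conn.execute("CREATE INDEX enrollment_user_course ON enrollment (user_id, course_id)")


def plan(sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


by_user = plan("SELECT * FROM enrollment WHERE user_id = 1")
by_course = plan("SELECT * FROM enrollment WHERE course_id = 'x'")
print(by_user)  # expect a SEARCH using enrollment_user_course
print(by_course)  # expect a SCAN: course_id alone is not a leftmost prefix
```

If queries filtering on course_id alone are common, the checklist question applies: reorder or extend an existing index rather than adding yet another one.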

Testing migrations

Unit testing

Migrations are currently not run in unit tests.

Acceptance tests

The paver commands that kick off the Lettuce and bokchoy tests run migrations. Running every migration from scratch would take a long time, so we cache the database state at intervals (every couple of months, when someone checks in a new cache); only the migrations added since the last cached state are run.

Load tests

There are caveats about using our existing load test environment as part of your migration testing. The existing load tests have limitations about what kinds of queries they generate and the data that is used in the load test environment is not representative. 

TODO: Write up best practices for testing migrations in the load test environment. Requires coordination with devops - work with them to run your migration while you're generating load against it. Monitor the error rate, throughput, etc.

Known large and/or problematic tables

Large tables

Contentious tables

Common migration tasks

Making a migration to create a new table

  1. Create a new directory under djangoapps and create a models.py file within it describing your model fields (example: common/djangoapps/track/models.py).
  2. Add the name of your module to INSTALLED_APPS in the appropriate environment file. Do NOT run manage.py syncdb, even though older Django documentation recommends it (syncdb was removed in Django 1.9).

Once you are happy with how your fields are defined in models.py, run the following command. The resulting file will be checked in with your PR.


Making a migration to modify an existing table 

When you make changes to your model, create a migration file and check it in:

Make sure you are pointing to the correct environment file.

Performing a migration

After creating your migration file, if you are running Open edX via the DevStack configuration, you can perform the migration using the following command:

Rolling back a migration

where <number> is the prefix of the migration file that you want to roll back to.

Unapply all migrations

There's a special "zero" migration name to unapply all migrations, including the initial migration.

Rare migration tasks

Faking migrations

Example for CSM primary key to bigint migration.

Do the following before merging/deploying the code; otherwise, the pipeline will try to run the migrations.

Copy the migration file onto a machine (this can be a worker machine, if the app has them).

Run Migration Fake

Cleanup

Squashing Migrations

See Django's Documentation for Squashing Migrations. Some useful tips for squashing:

  • The primary benefit of squashing migrations is the speed-up of running migrations from scratch.  If you are not running migrations from scratch, this may not help you.

  • Managing your squash-migrations PR:
    • Keep migration squashing to its own PR.  Introducing a migration in the same PR that you squash can cause issues.
    • Keep the auto-generated squash migration file as its own initial commit on PRs.  This will help your PR reviewers.
  • You can sometimes get an improved squash by removing the data migrations or removing all old migrations to create fresh migrations.
    • Note: commit this separately from the initial auto-generated commit to help with review.
    • You may need to remove all migrations for apps that depend on your migrations as well, to get this to run.
    • If you use this method, ensure makemigrations shows that there is nothing missing from your squash.
  • Testing your squashed migrations.

  • Important: Squashing Migrations is a two part process, and each part needs to live in a separate Open edX Named Release in order for the community to get caught up before the second part is released.  From Django's docs:
This enables you to squash and not mess up systems currently in production that aren’t fully up-to-date yet. The recommended process is to squash, keeping the old files, commit and release, wait until all systems are upgraded with the new release (or if you’re a third-party project, ensure your users upgrade releases in order without skipping any), and then remove the old files, commit and do a second release.