If you have non-trivial migrations to apply, or if two non-local environments (e.g. stage and production) have different migration states, describe your situation in the #django Slack channel and talk to the SRE team before doing anything else.  Similarly, migrations can become complicated when two different people create two different migrations around the same time.  When in doubt, post in #django and talk to SRE.

It is often useful to review (or share with reviewers) the SQL generated by a migration. See the sqlmigrate documentation for details.
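
For example, to print the SQL for a single migration without applying it (the app label and migration number here are hypothetical):

Code Block
python manage.py sqlmigrate yourappname 0002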

...

If you still think you need to change old migrations, and you want to verify that there isn't an alternative, see the "When in doubt" section.

Also, if you absolutely must change old migrations (this includes things like squashing), then after merging and verifying your migration changes, you should update the SQL that is used to populate devstack during provisioning. This prevents conflicts during future provisioning. To update the SQL, run ./update-dbs-init-sql-scripts.sh in the devstack repository; this should update edxapp.sql, edxapp_csmh.sql, and ecommerce.sql. Create a PR with the updated SQL and merge it to devstack as usual.

Don't change the parent of a migration

Along the same lines of a migration being considered "applied" once merged into master, you should never change the dependencies of a migration once it has landed on master.  Doing so will cause real problems, and probably downtime, in whichever environment it is deployed to.  When you create new migrations in a feature branch, you want those to be the most recent migrations when you merge into master.  Using an analogy to git, you always want your new migrations to be at the "HEAD" of the migration history in your app.
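
For reference, the "parent" is the dependencies list inside the generated migration file. A minimal sketch (the app and migration names are hypothetical); once this file is on master, the list should never change:

Code Block
from django.db import migrations

class Migration(migrations.Migration):

    # The parent of this migration. Editing this after the file has
    # landed on master rewrites migration history; don't.
    dependencies = [
        ('yourappname', '0007_previous_migration'),
    ]

    operations = []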

Deployment and backward-compatible migrations

Here at edX, we use the blue-green deployment method. The important detail about this deployment method is that, for some period of time, traffic is going to both the old code and new code. That detail is especially important when deploying database migrations that alter database columns and tables in a manner that is not backward-compatible with the previous release.

Let's go through a couple examples with our user table, auth_user. It has a few different columns, but we'll use the full_name column for the examples.

Say we decide to change the column's name from full_name (with an underscore) to fullname (no underscore). Our code in production is using full_name. When it's time to deploy this new release, we simply generate a migration and deploy it. Since we are using blue-green deployments, our old code is still looking for the original column name, full_name. However, the new deployment changed the name to fullname, so the original code starts failing.

Instead of renaming the column, say we delete it completely. Again, the database is modified when we deploy, and the original code that is still running will fail.

Because we operate in an environment where new and old code are running simultaneously against the same database, new code must always be compatible with the older database schema. Newer deployments can add tables and columns, but neither can be deleted unless the old code is no longer referencing the deleted tables or columns.
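
For contrast, the unsafe one-shot rename from the example above would look like the following generated migration (a sketch; the app and model names are hypothetical). With blue-green deployments, the still-running old release reads full_name and starts failing the moment this is applied:

Code Block
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', '0011_previous_migration'),  # hypothetical
    ]

    operations = [
        # Renames the column in one step: not backward-compatible,
        # because old code still queries full_name.
        migrations.RenameField(
            model_name='user',
            old_name='full_name',
            new_name='fullname',
        ),
    ]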

...

  1. Release:
    • Add the new field to the model.
      • If the old field has null=False, blank=False, and no default:
        • If the model is used in forms (Django admin, or other forms):
          • Create the new field with null=True, editable=False.
          • Disabling editable removes the field from forms (including the admin), so forms won't require a value for the new field.
        • else:
          • Create the new field with null=True.
      • else if the old field is a BooleanField:
        • You might need to change the old field type to NullBooleanField so that unit tests in release 2 will be happy when the old field is removed from code but not from sqlite3.
        • Create the new field with BooleanField and the same signature, assuming there's a default set.
      • else if the old field has null=True:
        • Create the new field with the same field signature as the old.
    • Update any place where the field is created or updated:
      • Write the same value into both fields (see the sketch after this list).
      • If there is a Django admin page or other form and it is used regularly to create/update rows:
        • Register a signal handler to the model to update the new field whenever the old field changes or a new row is created.
  2. Release:
    • Create a data migration to copy the values from the old field into the new field.
      • If the table is large, consider disabling atomicity and batching the copy (see the batching sketch in the Data migrations section below).
    • Remove all references to the old field in the code.
      • Including removing the old field from the model in the code.
      • If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
      • DO NOT include the migration for removing the old column (yet).
    • If you create the new field with a different field signature than the old, then update it now to be the same as the old.
      • e.g. change null back to False and editable back to True (the default).
        • CAUTION: Changing to null=False will cause a table rebuild during the ALTER. When performing this migration on a table with a large number of rows, degraded performance/downtime will likely result.
        • See "How to add a nullable column to an existing table (AWS Aurora)" below.
      • Include the migration that goes with this, but NOT the migration to remove the old field.
  3. Release:
    • Run makemigrations; this should pick up the field removal from the previous release.
    • If this change is in the edx-platform codebase, remove the skip from the test_migrations_are_in_sync unit test.
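
A minimal sketch of the Release 1 change for the simplest case above (the model and field names are hypothetical; the old full_name field is being replaced by fullname):

Code Block
from django.db import models
from django.db.models.signals import pre_save
from django.dispatch import receiver

class Profile(models.Model):
    full_name = models.CharField(max_length=255)  # old field, still used by old code
    # New field: nullable so the ALTER is backward-compatible, and
    # non-editable so the admin and other forms ignore it.
    fullname = models.CharField(max_length=255, null=True, editable=False)

@receiver(pre_save, sender=Profile)
def sync_fullname(sender, instance, **kwargs):
    # Dual-write: keep the new column in sync on every create/update,
    # including saves that come through the admin or other forms.
    instance.fullname = instance.full_name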

How to drop a table

Check in with Data Engineering / Data Science & Analytics to understand the downstream impacts of removing a table. When you get an answer about the process, update this documentation.

TWO releases:

...


Warning
Due to the current workings of the Open edX ecosystem, some 2U-specific required steps are part of this process, to avoid unnecessary problems for both 2U and the community as a whole.

Pre-requirements:

  • A 2U employee must create a Data Platform one-off request to ensure that there will be no complications from losing the table (e.g. financial data or otherwise), and to determine the follow-up required.
    • This must happen before removing the actual tables in 2U Production.

TWO releases (after pre-requirements):

  1. Remove all references to the table by removing references to the model and the model itself
    1. If this change is in the edx-platform codebase, add a skip to the test_migrations_are_in_sync unit test.
  2. Remove the table with a migration (see the sketch below this list)
    1. Remove the skip if you added one
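
The step 2 migration is typically just the generated DeleteModel operation. A sketch (all names hypothetical):

Code Block
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', '0012_remove_legacything_references'),  # hypothetical
    ]

    operations = [
        # Drops the table. Merge only after no deployed code references it.
        migrations.DeleteModel(name='LegacyThing'),
    ]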

...

    1. Important: If 2U determined during the pre-requirements above that its data can't be lost, the merge should be timed with a pipeline pause so the delete migration can be faked.
      1. This should be a rare occurrence, but it has happened.
    2. Remove the skip if you added one

Once a table is removed:

  • A 2U employee should:
    • Follow up with the Data Platform ticket to clean up the table from downstream consumers of data. 
    • If there is an associated app-permissions group then a 2U employee must make an app-permissions PR to remove that group.

How to delete a Django app containing tables

See Removing a Djangoapp from an existing project.

Mathematical perspective: Database Expansion/Contraction

A good way to think of this is that migrations can "expand" and "contract" the database. Adding fields is an expansion, and removing them is a contraction. If you're feeling a bit more mathematical today, there's a partial ordering relation on (db, code) where your database and code are in the relation iff the set of fields in the DB is a (non-strict) superset of the fields in the code (... well, this isn't quite right, since changing fields is OK in circumstances like extending the length of a CharField. Defining the relation precisely is left as an exercise to the reader).  Under this model, changing a field (say, from a plain CharField to an EmailField) would consist of an expansion (adding the EmailField) followed later (potentially much later, but mainly not in the same release) by a contraction (deleting the CharField). Code can also expand and contract in similar ways, by changing which fields are declared in your Django models.

For a migration to be backwards compatible, the database must always be at least as "large" as the code. It can be larger (contain a field not referenced by the code), but not smaller.
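
Stated compactly (a rough formalization that ignores the field-modification caveat above):

Code Block
compatible(code, db)  iff  fields(code) ⊆ fields(db)

expansion:    fields(db) grows    (run the migration before deploying code that needs it)
contraction:  fields(db) shrinks  (run the migration only after no running code references the field)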

Data migrations

If you're writing a data migration, don't import the model directly. Instead, allow Django to use the historical version of your model. This allows your migration step to use the old (historical) version of your model, even if the model will later be changed by a subsequent migration.

Code Block (Sample code from the Django docs)
def combine_names(apps, schema_editor):
    # We can't import the Person model directly as it may be a newer
    # version than this migration expects. We use the historical version.
    Person = apps.get_model('yourappname', 'Person')
    for person in Person.objects.all():
        person.name = '%s %s' % (person.first_name, person.last_name)
        person.save()
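
The field-rename walkthrough above suggests disabling atomicity and batching the copy on large tables. A sketch of such a data migration, reusing the hypothetical full_name/fullname fields and following the historical-model rule above:

Code Block
from django.db import migrations

def copy_names(apps, schema_editor):
    # Historical model, not a direct import.
    Profile = apps.get_model('yourappname', 'Profile')
    batch_size = 1000
    while True:
        # Each pass copies one batch of not-yet-copied rows.
        pks = list(
            Profile.objects.filter(fullname__isnull=True)
            .values_list('pk', flat=True)[:batch_size]
        )
        if not pks:
            break
        for profile in Profile.objects.filter(pk__in=pks):
            profile.fullname = profile.full_name
            profile.save(update_fields=['fullname'])

class Migration(migrations.Migration):
    # Run outside one wrapping transaction so each batch commits on its own.
    atomic = False

    dependencies = [
        ('yourappname', '0008_add_fullname'),  # hypothetical
    ]

    operations = [
        migrations.RunPython(copy_names, migrations.RunPython.noop),
    ]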

Deployment and migrations for large or problematic tables

First read "Deployment and backward-compatible migrations" for general information about handling blue-green deployments. Then see this section for special consideration for large or problematic tables.

How to add a nullable column to an existing table (AWS Aurora)

When using AWS Aurora, a nullable column can be added to an existing large table (100k+ rows?) without causing downtime. However, the migration may still time out in GoCD, so please coordinate the release with SRE.

  1. Add the new field to the model with null=True (see the sketch below this list).
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the nullable column to the table.
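
Step 1 as model code, a sketch with hypothetical names:

Code Block
from django.db import models

class CourseEnrollmentExtra(models.Model):
    # Existing fields elided. null=True is what makes this a fast,
    # online ALTER on Aurora.
    external_id = models.CharField(max_length=64, null=True, blank=True)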

NOTES:

  • The column must be nullable! (null=True)
    • When adding a column with null=False, these instructions do not apply and you'll need to plan for downtime.
  • The Django ORM default for a column's null value is False.
    • So when adding new model columns which do not specify the null parameter, these instructions do not apply and you'll need to plan for downtime.
  • If you first add a nullable column (null=True) and then change the constraint to non-nullable (null=False) in a later PR, the table will be re-built just as if you added a non-nullable column.
  • If you change an existing nullable column (null=True) to become a non-nullable column (null=False), the table will be re-built just as if you added a non-nullable column.

AWS Aurora Docs: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FastDDL.html

How to add index to existing table (AWS Aurora)

On AWS Aurora, indexes can be built on large tables without causing downtime, but this requires SRE coordination, as the migration may time out in GoCD.

  1. Add the new index on the model fields (see the sketch below this list).
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and newly generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the index to the table.
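
Step 1 as model code, a sketch with hypothetical names (using the Meta.indexes API, Django 1.11+):

Code Block
from django.db import models

class CourseEnrollmentExtra(models.Model):
    created = models.DateTimeField()

    class Meta:
        indexes = [
            # makemigrations will emit an AddIndex operation for this.
            models.Index(fields=['created'], name='enrollmentextra_created_idx'),
        ]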

Consider Making a New Model with a OneToOneField

Adding fields to large tables can cause operational issues. What is safe varies by database version (e.g. MySQL 5.7 vs. 8.0) and specialized backend (like Aurora). Also, even if the database supports adding things in a non-locking way, Django's migrations framework may not understand how to formulate the right SQL to do so.

A lower risk alternative is to create a new model and link it together with a OneToOneField. You can use the primary_key=True option in order to have the new table's primary key match the values of the parent table.

This does complicate the code somewhat, and you should be careful to avoid N+1 queries by using select_related. The benefit is not having to hold your breath when the migration rolls out, for fear that you just locked a heavily used table and brought down the site.
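
A sketch of the pattern (hypothetical model, hanging new data off Django's auth_user without altering it):

Code Block
from django.contrib.auth.models import User
from django.db import models

class UserExtra(models.Model):
    # primary_key=True: the new table shares auth_user's primary key
    # values, so there is no separate id column or sequence.
    user = models.OneToOneField(User, on_delete=models.CASCADE, primary_key=True)
    nickname = models.CharField(max_length=64, blank=True)

Reads that need both rows can then use User.objects.select_related('userextra') to avoid one extra query per user.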

Known large and/or problematic tables

Large tables

  • Top ones (ordered by descending size):

    Code Block
    courseware_studentmodulehistory, courseware_studentmodule
    student_historicalcourseenrollment, student_courseenrollment
    student_anonymoususerid, user_api_userorgtag
    django_comment_client_role_users, certificates_generatedcertificate
    auth_userprofile, user_api_userpreference, auth_user, oauth2_accesstoken


  • See this spreadsheet (edX only): https://docs.google.com/spreadsheets/d/1rrRGsjYYNV41rHYLmDluQw74a8HRegnbveXaMk95gBI/edit#gid=0

Contentious tables

Top ones (ordered by descending calls-per-minute):

Code Block
user_api_userpreference
auth_user
student_understanding
theming_sitetheme
django_site
course_modes_coursemode
courseware_studentmodule
course_overviews_courseoverview
waffle_utils_waffleflagcourseoverridemodel
edxval_videoimage
edxval_profile
completion_blockcompletion
student_anonymoususerid

Useful Checklists

Checklist for structural migrations

...

  •  Is there a primary key? How fast will the table grow? Should the primary key be a bigint (more than 4 billion rows)? In general, use the Django default auto-incrementing id as the primary key.
  •  Is there a process for identifying and trimming expired rows? Is there an appropriate index that will prevent a full table scan during cleanup?
  •  What are the most common queries? Are there indexes to support them?
  •  Should there be unique constraints?

Checklist for data migrations

  •  Is there a rollback migration? Does it correctly roll back to the previous state?
  •  Is there a migration test? Django migration testing
  •  How long does a rollback take to run?
  •  Is data being loaded into this table? Is it static or dynamic? How long will it take to load?
  •  Have we double-checked that models are not being imported directly (are we allowing historical models to be used)?

Checklist for adding indexes

  •  How long will it take? How big is the table?
  •  What is the read/write ratio on the table? What is the impact to writes? 
  •  Should we alter an existing index rather than add a new one (left-most columns, etc.)?

Testing migrations

Unit testing

Migrations are currently not run in unit tests.

Acceptance tests

The paver commands that kick off the Lettuce and bok-choy tests run migrations. However, because running every migration from scratch would take a long time, we cache the state of the database at intervals (every couple of months, when someone checks in a new cache), so only the migrations added since the last cached database state are run.

Common migration tasks

Making a migration to create a new table

...