Table of Contents

...

See Removing a Djangoapp from an existing project.


Mathematical perspective: Database Expansion/Contraction

A good way to think of this is that migrations can "expand" and "contract" the database. Adding fields is an expansion, and removing them is a contraction. If you're feeling a bit more mathematical today, there's a partial ordering on (db, code) pairs, where your database and code are in the relation iff the set of fields in the DB is a (non-strict) superset of the fields in the code. (This isn't quite right, since changing fields is OK in circumstances like extending the length of a CharField; defining the relation precisely is left as an exercise to the reader.)

Under this model, changing a field (say, from a plain CharField to an EmailField) would consist of an expansion (adding the EmailField) followed later (potentially much later, but crucially not in the same release) by a contraction (deleting the CharField). Code can also expand and contract in similar ways, by changing which fields are declared in your Django models.

For a migration to be backwards compatible, the database must always be at least as "large" as the code. It can be larger (contain a field not referenced by the code), but not smaller.
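As a concrete illustration of the CharField-to-EmailField example above, the two model snapshots might look like this. This is a hypothetical sketch; the Person model and its fields are invented:

Code Block
from django.db import models

# Release 1 (expansion): add the new field alongside the legacy one.
class Person(models.Model):
    contact = models.CharField(max_length=254)    # legacy field, still used by old code
    contact_email = models.EmailField(null=True)  # new field; nullable, so the DB stays "larger" than the code

# Intermediate releases backfill contact_email and move all reads/writes over to it.
# Release N (contraction), much later and never in the same release, deletes the
# legacy field:
#
# class Person(models.Model):
#     contact_email = models.EmailField(null=True)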

Data migrations

If you're writing a data migration, don't import the model directly. Instead, let Django give you the historical version of your model, so that the migration step operates on the model as it existed at that point in the migration history, even if the model is later changed by a subsequent migration.

Code Block: Sample code from the Django docs
def combine_names(apps, schema_editor):
    # We can't import the Person model directly as it may be a newer
    # version than this migration expects. We use the historical version.
    Person = apps.get_model('yourappname', 'Person')
    for person in Person.objects.all():
        person.name = '%s %s' % (person.first_name, person.last_name)
        person.save()
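For reference, a generated data-migration file wires such a function in with migrations.RunPython. A minimal sketch, assuming combine_names is defined in the same file as above; the app label and dependency are placeholders:

Code Block
from django.db import migrations

# combine_names is the function from the block above, defined in this same file.

class Migration(migrations.Migration):
    dependencies = [
        ('yourappname', '0001_initial'),  # placeholder: whatever migration precedes this one
    ]
    operations = [
        # reverse_code=noop lets the migration be unapplied without undoing the data change
        migrations.RunPython(combine_names, reverse_code=migrations.RunPython.noop),
    ]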


Deployment and migrations for large or problematic tables

First read "Deployment and backward-compatible migrations" for general information about handling blue-green deployments. Then see this section for special considerations for large or problematic tables.

How to add a nullable column to an existing table (AWS Aurora)

When using AWS Aurora, a nullable column can be added to an existing large table (roughly 100k+ rows) without causing downtime. However, the migration may still time out in GoCD, so please coordinate the release with DevOps.

  1. Add the new field to the model with null=True (see the sketch after the notes below).
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and the newly generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the nullable column to the table.

NOTES:

  • The column must be nullable! (null=True)
    • When adding a column with null=False, these instructions do not apply and you'll need to plan for downtime.
  • Django's default for a field's null option is False.
    • So when adding new model fields that do not specify the null parameter, these instructions do not apply and you'll need to plan for downtime.
  • If you first add a nullable column (null=True) and then change the constraint to non-nullable (null=False) in a later PR, the table will be rebuilt just as if you had added a non-nullable column.
  • Likewise, changing an existing nullable column (null=True) to a non-nullable column (null=False) will rebuild the table just as if you had added a non-nullable column.
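A minimal sketch of step 1, using an invented model and field (all names here are illustrative only):

Code Block
from django.db import models

class ExampleModel(models.Model):
    name = models.CharField(max_length=255)  # existing field
    # The new column: null=True is required for a non-blocking ADD COLUMN on Aurora.
    # Remember that Django's null option defaults to False if omitted.
    external_id = models.CharField(max_length=64, null=True, blank=True)

Running makemigrations (step 2) should then generate a migration containing a single AddField operation for external_id.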

AWS Aurora Docs: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FastDDL.html

How to add an index to an existing table (AWS Aurora)

On AWS Aurora, indexes can be built on large tables without causing downtime, but this requires DevOps coordination, as the migration may time out in GoCD.

  1. Add the new index on the model fields (see the sketch after this list).
  2. Generate the model-change migrations locally.
  3. Create a pull request containing the model change and the newly generated migrations.
  4. Merge the migration pull request.
  5. The release process will run the migration and add the index to the table.
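A minimal sketch of step 1, again with invented names; declaring the index in Meta.indexes lets makemigrations emit the corresponding AddIndex operation:

Code Block
from django.db import models

class ExampleModel(models.Model):
    user_id = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # New composite index; makemigrations will generate AddIndex for it.
            models.Index(fields=['user_id', 'created'], name='example_user_created_idx'),
        ]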

Consider Making a New Model with a OneToOneField

Adding fields to large tables can cause operational issues. What is safe varies by database version (MySQL 5.6 vs. 5.7 vs. 8.0) and by specialized backends (like Aurora). Also, even if the database supports adding things in a non-locking way, Django's migrations framework may not know how to formulate the right SQL to do so.

A lower-risk alternative is to create a new model and link it to the existing one with a OneToOneField. You can use the primary_key=True option to have the new table's primary key match the values of the parent table.

This does complicate the code somewhat, and you should be careful to avoid N+1 queries by using select_related; see the sketch below. The benefit is not having to hold your breath when the migration rolls out, for fear that you just locked a heavily used table and brought down the site.
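A minimal sketch of the pattern, with invented model names:

Code Block
from django.db import models

class Profile(models.Model):
    # The existing, heavily used model; we avoid ALTERing its table.
    name = models.CharField(max_length=255)

class ProfileExtension(models.Model):
    # New table instead of new columns. primary_key=True makes this table's
    # primary key reuse the parent table's primary-key values.
    profile = models.OneToOneField(
        Profile,
        on_delete=models.CASCADE,
        primary_key=True,
        related_name='extension',
    )
    preferred_language = models.CharField(max_length=16, null=True)

# Fetch the parent and extension rows in one query to avoid N+1:
# profiles = Profile.objects.select_related('extension')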

Known large and/or problematic tables

Large tables

  • Top ones (ordered by descending size):

    Code Block
    courseware_studentmodulehistory, courseware_studentmodule
    student_historicalcourseenrollment, student_courseenrollment
    student_anonymoususerid, user_api_userorgtag
    django_comment_client_role_users, certificates_generatedcertificate
    auth_userprofile, user_api_userpreference, auth_user, oauth2_accesstoken


  • See this spreadsheet (edX only): https://docs.google.com/spreadsheets/d/1rrRGsjYYNV41rHYLmDluQw74a8HRegnbveXaMk95gBI/edit#gid=0

Contentious tables

  • Top ones (ordered by descending calls-per-minute):

    Code Block
    user_api_userpreference
    auth_user
    student_understanding
    theming_sitetheme
    django_site
    course_modes_coursemode
    courseware_studentmodule
    course_overviews_courseoverview
    waffle_utils_waffleflagcourseoverridemodel
    edxval_videoimage
    edxval_profile
    completion_blockcompletion
    student_anonymoususerid

Useful Checklists

Checklist for structural migrations

...

The paver commands that kick off the Lettuce and bokchoy tests run migrations. However, because running every migration from scratch would take a long time, we cache the state of the database at intervals (every couple of months, when someone checks in a new cache), so only the migrations added since the last cached database state need to be run.

Common migration tasks

Making a migration to create a new table

...