This document covers what you need to know about Django migrations in the Open edX platform. If you are unfamiliar with database migrations, or, specifically, Django migrations, please read the reference documentation. Django does a good job of abstracting away what happens behind the scenes when you run "./manage.py makemigrations" and "./manage.py migrate", but understanding what happens during these operations is extremely useful.
If you have non-trivial migrations to apply, or if two non-local environments (e.g. stage and production) have different migration states, describe your situation in the #django Slack channel and go talk to the Ops team before doing anything else. Similarly, migrations can become complicated when two different people create two different migrations around the same time. When in doubt, post in #django and talk to Ops.
Django migrations should be considered "applied" as soon as they land on master of a repo. Missing (ghost) migrations cause problems for Django and require manual intervention to fix. Fix forward on migrations. Be sure to properly consider all the points below so that you're less likely to want to delete or change a migration. If you do delete a migration, or revert a commit that contains a migration, follow this guide on communicating with the organization and the community: How to revert a migration from master. Also, you should never roll back migrations (manually) without rolling back the code that relies on them.
If you still think you need to change old migrations, and you want to verify that there isn't an alternative, see the "When in doubt" section.
Just as a migration is considered "applied" once it is merged into master, you should never change the dependencies of a migration once it has landed on master. Doing so will cause real problems, and probably downtime, for whichever environment it is deployed to. When you create new migrations in a feature branch, you want those to be the most recent migrations when you merge into master. Using an analogy to git, you always want your new migrations to be at the "HEAD" of the migration history in your app.
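To make the "HEAD" analogy concrete, here is a minimal sketch that finds the leaf migrations of an app from a mapping of migration name to dependencies. The function name and the migration names are hypothetical, and the history is modeled as a plain dictionary rather than Django's own migration graph. More than one leaf means two migrations both claim to be the head, which Django reports as conflicting migrations.

```python
from typing import Dict, List


def find_leaf_migrations(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return the migrations that no other migration depends on, sorted by name."""
    depended_on = {parent for parents in dependencies.values() for parent in parents}
    return sorted(name for name in dependencies if name not in depended_on)


# Hypothetical history: two people each added a migration on top of 0002.
history = {
    "0001_initial": [],
    "0002_add_email": ["0001_initial"],
    "0003_add_bio": ["0002_add_email"],     # already on master
    "0003_add_avatar": ["0002_add_email"],  # still on a feature branch
}
print(find_leaf_migrations(history))

# Regenerating the unmerged feature-branch migration on top of master
# leaves a single head. The migration already on master is not touched.
fixed_history = {
    "0001_initial": [],
    "0002_add_email": ["0001_initial"],
    "0003_add_bio": ["0002_add_email"],
    "0004_add_avatar": ["0003_add_bio"],
}
print(find_leaf_migrations(fixed_history))
```

Only the migration that has not yet landed on master is renumbered and re-pointed here, which is consistent with the rule above.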
Here at edX, we use the blue-green deployment method. The important detail about this deployment method is that, for some period of time, traffic is going to both the old code and new code. That detail is especially important when deploying database migrations that alter database columns and tables in a manner that is not backward-compatible with the previous release.
Let's go through a couple examples with our user table, auth_user. It has a few different columns, but we'll use the full_name column for the examples.
Say we decide to change the column's name from full_name (with an underscore) to fullname (no underscore). Our code in production is using full_name. When it's time to deploy this new release, we simply generate a migration and deploy it. Since we are using blue-green deployments, our old code is still looking for the original column name, full_name. However, the new deployment changed the name to fullname, so the original code starts failing.
Instead of renaming the column, say we delete it completely. Again, the database is modified when we deploy, and the original code that is still running will fail.
Because we operate in an environment where new and old code are running simultaneously against the same database, new code must always be compatible with the older database schema. Newer deployments can add tables and columns, but neither can be deleted unless the old code is no longer referencing the deleted tables or columns.
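The two failure cases above can be reproduced in miniature with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the in-memory table stands in for auth_user, the nickname column is hypothetical, and the rename is emulated by rebuilding the table so that the example runs on any SQLite version. Production databases and Django-generated SQL will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO auth_user (full_name) VALUES ('Ada Lovelace')")


def old_code_query(connection):
    """Query issued by the previous release, which still expects full_name."""
    return connection.execute("SELECT full_name FROM auth_user").fetchall()


# Adding a column: the old code keeps working against the new schema.
conn.execute("ALTER TABLE auth_user ADD COLUMN nickname TEXT")
rows_after_add = old_code_query(conn)

# Renaming full_name to fullname, emulated by rebuilding the table.
conn.executescript(
    """
    CREATE TABLE auth_user_new (id INTEGER PRIMARY KEY, fullname TEXT);
    INSERT INTO auth_user_new (id, fullname) SELECT id, full_name FROM auth_user;
    DROP TABLE auth_user;
    ALTER TABLE auth_user_new RENAME TO auth_user;
    """
)

try:
    old_code_query(conn)
    old_code_failed = False
except sqlite3.OperationalError as error:
    # The old release still asks for a column that no longer exists.
    old_code_failed = True
    print("Old code failed:", error)
```

During a blue-green deployment the old release plays the role of old_code_query: it keeps running unchanged while the schema underneath it moves.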
For either a nullable or non-nullable column, first make sure there are no other models or code that actually use the column. If there are, make adjustments to those first before working through the deletion flow.
For NULLABLE columns, this involves TWO releases:
For NOT-NULLABLE columns, this involves THREE releases:
Returning to our example with the auth_user table: if we still want to drop the full_name column, we should do the following:
Renaming a column while keeping the business logic fully functional and without taking any downtime is a very delicate and complex process. Some things to keep in mind before you start:
THREE releases:
test_migrations_are_in_sync unit test.
makemigrations, this should pick up the field removal from the previous stage.
test_migrations_are_in_sync unit test.
TWO releases:
See Removing a Djangoapp from an existing project
When using AWS Aurora, a nullable column can be added to existing large (100k rows+?) tables without causing downtime. However, the migration may still time out in GoCD, so please coordinate the release with DevOps.
NOTES:
AWS Aurora Docs: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FastDDL.html
On AWS Aurora, indexes can be built on large tables without causing downtime, but this requires DevOps coordination because the migration may time out in GoCD.
A good way to think of this is that migrations can "expand" and "contract" the database. Adding fields is an expansion, and removing them is a contraction. If you're feeling a bit more mathematical today, there's a partial ordering relation on (db, code) where your database and code are in the relation iff the set of fields in the DB is a (non-strict) superset of the fields in the code (... well, this isn't quite right, since changing fields is OK in circumstances like extending the length of a CharField. Defining the relation precisely is left as an exercise to the reader). Under this model, changing a field (say, from a plain CharField to an EmailField) would consist of an expansion (adding the EmailField) followed later (potentially much later, but mainly not in the same release) by a contraction (deleting the CharField). Code can also expand and contract in similar ways, by changing which fields are declared in your Django models.
For a migration to be backwards compatible, the database must always be at least as "large" as the code. It can be larger (contain a field not referenced by the code), but not smaller.
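The superset rule can be written down as a one-line check. The sketch below is an illustration only: the function name and the field names (contact, contact_email) are hypothetical, and it compares field names alone, so it ignores type changes such as lengthening a CharField, which the text above already notes the simple relation does not capture.

```python
def is_backward_compatible(db_fields, code_fields):
    """True when every field the code declares exists in the database."""
    return set(code_fields) <= set(db_fields)


# Hypothetical CharField -> EmailField change, split into expand and contract.
old_code = {"id", "contact"}        # still reads the CharField
new_code = {"id", "contact_email"}  # reads only the new EmailField

db_expanded = {"id", "contact", "contact_email"}  # after the expansion
db_contracted = {"id", "contact_email"}           # after the later contraction

checks = {
    "expansion, old code": is_backward_compatible(db_expanded, old_code),
    "expansion, new code": is_backward_compatible(db_expanded, new_code),
    "contraction, new code": is_backward_compatible(db_contracted, new_code),
    # Contracting in the same release leaves the old code without its field.
    "contraction, old code": is_backward_compatible(db_contracted, old_code),
}
for label, ok in checks.items():
    print(label, "->", "ok" if ok else "BREAKS")
```

Only the last combination breaks, which is why the contraction has to wait until no running release still declares the old field.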
If you're writing a data migration, don't import the model directly. Instead, allow Django to use the historical version of your model. This will allow your migration step to use the old (historical) version of your model, even if the model is later changed by a subsequent database migration.
def combine_names(apps, schema_editor):
    # We can't import the Person model directly as it may be a newer
    # version than this migration expects. We use the historical version.
    Person = apps.get_model('yourappname', 'Person')
    for person in Person.objects.all():
        person.name = '%s %s' % (person.first_name, person.last_name)
        person.save()
Migrations are currently not run in unit tests.
The paver commands that kick off the Lettuce and bokchoy tests run migrations. However, because this would take a long time if we started from scratch, we cache the latest state of the database after certain intervals (every couple months when someone checks in a new cache) so all the migrations are not run, but only the ones added since the last time the database state was cached.
There are caveats about using our existing load test environment as part of your migration testing. The existing load tests are limited in the kinds of queries they generate, and the data used in the load test environment is not representative.
TODO: Write up best practices for testing migrations in the load test environment. Requires coordination with devops - work with them to run your migration while you're generating load against it. Monitor the error rate, throughput, etc.
Top ones (ordered by descending size):
courseware_studentmodulehistory, courseware_studentmodule
student_historicalcourseenrollment, student_courseenrollment
student_anonymoususerid, user_api_userorgtag
django_comment_client_role_users, certificates_generatedcertificate
auth_userprofile, user_api_userpreference, auth_user, oauth2_accesstoken
Top ones (ordered by descending calls-per-minute):
user_api_userpreference
auth_user
student_understanding
theming_sitetheme
django_site
course_modes_coursemode
courseware_studentmodule
course_overviews_courseoverview
waffle_utils_waffleflagcourseoverridemodel
edxval_videoimage
edxval_profile
completion_blockcompletion
student_anonymoususerid
Once you are happy with how your fields are defined in models.py, run the following command. The resulting file will be checked in with your PR.
./manage.py [lms|cms] --settings=devstack_docker makemigrations --initial name_of_app
Making a migration to modify an existing table
When you make changes to your model, create a migration file and check it in:
./manage.py [lms|cms] --settings=devstack_docker makemigrations name_of_app --pythonpath=.
Make sure you are pointing to the correct environment file.
After creating your migration file, if you are running Open edX via the DevStack configuration, you can perform the migration using the following command:
./manage.py [lms|cms] --settings=devstack_docker migrate name_of_app
./manage.py [lms|cms] --settings=devstack_docker migrate name_of_app <number>
where <number> is the prefix of the migration file that you want to roll back to.
There's a special "zero" migration name to unapply all migrations, including the initial migration.
./manage.py [lms|cms] --settings=devstack_docker migrate name_of_app zero
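As a rough mental model of what a rollback target means, the sketch below lists which migrations would be unapplied for a given target. It assumes a strictly linear history within one app, and the function and migration names are hypothetical. Django itself resolves the target through its migration graph and will also unapply dependent migrations in other apps, which this sketch does not model.

```python
def migrations_to_unapply(applied, target):
    """List the migrations a rollback would unapply, newest first.

    applied: applied migration names for one app, oldest first.
    target:  a unique migration name prefix such as "0002", or "zero".
    """
    if target == "zero":
        return list(reversed(applied))
    matches = [i for i, name in enumerate(applied) if name.startswith(target)]
    if len(matches) != 1:
        raise ValueError("target must match exactly one migration: %r" % target)
    # Everything after the target is unapplied; the target itself stays applied.
    return list(reversed(applied[matches[0] + 1:]))


applied = ["0001_initial", "0002_add_email", "0003_add_bio"]
print(migrations_to_unapply(applied, "0001"))
print(migrations_to_unapply(applied, "zero"))
```

Note that the target migration remains applied, while "zero" removes everything including the initial migration, as described above.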
Example for CSM primary key to bigint migration.
Do the following before merging/deploying the code; otherwise the pipeline will try to run the migrations.
# Copy file contents into /edx/app/edxapp/edx-platform/lms/djangoapps/courseware/migrations/0011_csm_id_bigint.py on worker machine
# Check that the migration shows up in the list as unapplied
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# /edx/bin/edxapp-migrate-cms --noinput --list courseware
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)
2019-08-30 18:08:47,084 WARNING 11141 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:08:47,084 WARNING 11141 [enterprise.utils] [user None] utils.py:56 - cannot import name _LTI_BACKENDS
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [ ] 0011_csm_id_bigint
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)
2019-08-30 18:08:52,683 WARNING 11392 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:08:52,684 WARNING 11392 [enterprise.utils] [user None] utils.py:56 - cannot import name _LTI_BACKENDS
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [ ] 0011_csm_id_bigint
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform#
# Add extra space in front to prevent bash from writing password to history
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# DB_MIGRATION_USER=migrate001 DB_MIGRATION_PASS=redacted /edx/bin/edxapp-migrate-cms --fake courseware 0011_csm_id_bigint
# Confirm the migration was applied
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# /edx/bin/edxapp-migrate-cms --list courseware
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)
2019-08-30 18:55:31,993 WARNING 19468 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:55:31,993 WARNING 19468 [enterprise.utils] [user None] utils.py:56 - cannot import name EnterpriseCustomerIdentityProvider
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [X] 0011_csm_id_bigint
sudo: unable to resolve host ip-10-3-71-92
WARNING:py.warnings:/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/__init__.py:7: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
  warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)
2019-08-30 18:55:37,151 WARNING 19583 [enterprise.utils] [user None] utils.py:55 - Could not import Registry from third_party_auth.provider
2019-08-30 18:55:37,152 WARNING 19583 [enterprise.utils] [user None] utils.py:56 - cannot import name EnterpriseCustomerIdentityProvider
courseware
 [X] 0001_initial
 [X] 0002_coursedynamicupgradedeadlineconfiguration_dynamicupgradedeadlineconfiguration
 [X] 0003_auto_20170825_0935
 [X] 0004_auto_20171010_1639
 [X] 0005_orgdynamicupgradedeadlineconfiguration
 [X] 0006_remove_module_id_index
 [X] 0007_remove_done_index
 [X] 0008_move_idde_to_edx_when
 [X] 0009_auto_20190703_1955
 [X] 0010_auto_20190709_1559
 [X] 0011_csm_id_bigint
# Remove the migration file
root@ip-10-3-71-92:/edx/app/edxapp/edx-platform# rm /edx/app/edxapp/edx-platform/lms/djangoapps/courseware/migrations/0011_csm_id_bigint.py
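The transcript above relies on the fact that faking a migration only records it as applied and does not run any schema changes. Here is a minimal sketch of that idea using sqlite3. The django_migrations table and its app, name and applied columns are what Django uses for bookkeeping, but the helper functions and the one-column stand-in for courseware_studentmodule are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations "
    "(id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
# Simplified stand-in for the real table; the actual schema is much larger.
conn.execute("CREATE TABLE courseware_studentmodule (id INTEGER PRIMARY KEY)")


def table_schema(connection, table):
    """Return the column definitions SQLite reports for a table."""
    return connection.execute("PRAGMA table_info(%s)" % table).fetchall()


def fake_apply(connection, app, name):
    """Record a migration as applied without touching any other table."""
    connection.execute(
        "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
        (app, name, datetime.now(timezone.utc).isoformat()),
    )


def is_applied(connection, app, name):
    """True when the bookkeeping table has a row for this migration."""
    row = connection.execute(
        "SELECT 1 FROM django_migrations WHERE app = ? AND name = ?", (app, name)
    ).fetchone()
    return row is not None


schema_before = table_schema(conn, "courseware_studentmodule")
fake_apply(conn, "courseware", "0011_csm_id_bigint")
schema_after = table_schema(conn, "courseware_studentmodule")

print(is_applied(conn, "courseware", "0011_csm_id_bigint"))
print(schema_before == schema_after)
```

Because only the bookkeeping row changes, a later pipeline run sees the migration as already applied and skips it, so the actual schema change has to be carried out separately.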
See Django's Documentation for Squashing Migrations. Some useful tips for squashing:
The primary benefit of squashing migrations is the speed-up of running migrations from scratch. If you are not running migrations from scratch, this may not help you.
Pros:
Cons:
Conclusion:
See ARCHBOM-1148 for more details.
Testing your squashed migrations.
This enables you to squash and not mess up systems currently in production that aren’t fully up-to-date yet. The recommended process is to squash, keeping the old files, commit and release, wait until all systems are upgraded with the new release (or if you’re a third-party project, ensure your users upgrade releases in order without skipping any), and then remove the old files, commit and do a second release.