edx-platform Django 1.11 Upgrade Plan


Overview

Edx-platform currently uses Django v1.8, which reaches EOL in April 2018. While the Ginkgo edx-platform release will use Django v1.8, we plan to move edx-platform to Django v1.11 (D1.11) before the Hawthorn Open edX release, planned for December 2017. This document details the plan we’ll follow to upgrade the platform to Django v1.11.

To correlate the timeline of support windows with Open edX releases, see this spreadsheet.

The plan leverages lessons we learned in doing the Django 1.4->1.8 upgrade.

Benefits

The main benefit of this upgrade is to keep edx-platform on a supported Django version that receives any needed security fixes. But there are other added benefits of moving from Django 1.8 to 1.11, including:

  • faster migrations
  • a new transaction.on_commit() hook, which allows actions to be performed only after a DB transaction is successfully committed
  • password validation
  • support for running tests in parallel

Plan

General Strategy

Merge-As-We-Go

For the Django 1.4->1.8 upgrade, we maintained a single long-running branch in the edx-platform repository that was rebased every week onto the current master branch. For the D1.11 upgrade, we will instead merge-as-we-go, merging backward-compatible changes into edx-platform whenever possible. In some cases, this strategy might require wrapping some code in a compatibility layer - and writing a ticket to remove the layer later. It’s unclear whether this strategy will be possible for all changes - we’ll see.
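As an illustration of what such a compatibility layer might look like, here is a minimal sketch. The helper name import_first_available is hypothetical and not an existing edx-platform utility. It tries a list of import locations in order, which suits APIs that moved between Django versions (for example, django.core.urlresolvers moved to django.urls in Django 1.10).

```python
import importlib


def import_first_available(*dotted_paths):
    """Return the first object importable from the given dotted paths.

    Each path has the form "package.module.attribute". Paths are tried in
    order, so list the newest location first and the legacy location last.
    """
    errors = []
    for path in dotted_paths:
        module_name, _, attr_name = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError, ValueError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append("{}: {}".format(path, exc))
    raise ImportError("No candidate could be imported: " + "; ".join(errors))


# Intended use in a compat module (not executed here, since it needs Django):
#   reverse = import_first_available(
#       "django.urls.reverse",               # Django 1.10 and later
#       "django.core.urlresolvers.reverse",  # Django 1.8 / 1.9
#   )
```

Keeping every such shim in one module would make the follow-up ticket simple: once Django 1.8 support is dropped, the module can be deleted and its callers pointed at the new location.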

Consistent Branch Naming

Branches that are created in dependent repos in order to perform D1.11 compatibility work will all have a consistent name, likely: django1.11_upgrade

Branches that are created in edx-platform for incremental backward-compatibility work will have a name that ideally incorporates the ticket name for which the work is being performed, such as: PLAT_1012_add_app_labels

Leap From Django 1.8 to Django 1.11

We plan to upgrade the platform from Django 1.8 directly to Django 1.11. The main reason is to avoid the overhead of upgrading each dependency three times - once for each of the 1.8->1.9, 1.9->1.10, and 1.10->1.11 steps. In making this leap, we’ll lose the benefit of the deprecation warnings that each intermediate version would have emitted. But a close read of the release notes of each version should provide the same information, upon which we can act.

No Remaining Deprecation Warnings

At the end of the Django 1.8 upgrade, many RemovedInDjango19Warning warnings still existed and were not fixed. Django 1.11 is the last version in the 1.x series - the next version will be 2.0. Any deprecation warnings that remain under Django 1.11, including those about functionality removed in 2.0 (RemovedInDjango20Warning), will be fixed before the upgrade is considered complete.

When Are We Done? Maintain A Decreasing Number

In many projects, it’s difficult to know how close to done a project is. (“Almost” is a common answer.) This project will design and maintain a single number (possibly multiple numbers) that will go to zero when complete - with a transparent calculation.
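One possible shape for such a number is sketched below. This is only an assumption about what might be counted; the four inputs (open tickets, failing tests, unsupported dependencies, remaining deprecation warnings) are hypothetical and the real calculation is still to be designed.

```python
def remaining_work(open_tickets, failing_tests, unsupported_dependencies,
                   deprecation_warnings):
    """Return (total, breakdown) for a hypothetical "distance to done" number.

    Every input is a non-negative count. The total is a plain sum, so anyone
    can see how it was calculated, and it reaches zero only when every
    component reaches zero.
    """
    breakdown = {
        "open_tickets": open_tickets,
        "failing_tests": failing_tests,
        "unsupported_dependencies": unsupported_dependencies,
        "deprecation_warnings": deprecation_warnings,
    }
    for name, value in breakdown.items():
        if value < 0:
            raise ValueError("{} must not be negative".format(name))
    return sum(breakdown.values()), breakdown


total, breakdown = remaining_work(3, 10, 2, 0)
print(total)  # 15
```

Publishing the breakdown alongside the total keeps the calculation transparent.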

Sequenced Work

1) Read, study, and ticket the Django 1.9, 1.10, and 1.11 release notes.

The new Django versions will deprecate some functionality and add other functionality. We’ll need to search the codebase and dependencies for any usage of deprecated and removed functionality. And we’ll also want to make explicit decisions about whether to use new functionality *and* how to deal with any breaking changes.
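A search like this could be scripted. The sketch below is illustrative only: the two patterns are examples of changes in this version range (django.core.urlresolvers moved to django.urls in Django 1.10, and django.conf.urls.patterns() was removed in Django 1.10), and the real list would come from the release-note tickets. The function name find_deprecated_usages is hypothetical.

```python
import io
import os
import re

# Example patterns only; extend from the release-note tickets.
DEPRECATED_PATTERNS = {
    "django.core.urlresolvers (moved to django.urls in 1.10)":
        re.compile(r"django\.core\.urlresolvers"),
    "django.conf.urls.patterns() (removed in 1.10)":
        re.compile(r"from django\.conf\.urls import .*\bpatterns\b"),
}


def find_deprecated_usages(root, patterns=DEPRECATED_PATTERNS):
    """Walk `root` and return (path, line_number, label) for each match in .py files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with io.open(path, encoding="utf-8", errors="replace") as source:
                for lineno, line in enumerate(source, start=1):
                    for label, pattern in patterns.items():
                        if pattern.search(line):
                            hits.append((path, lineno, label))
    return hits
```

The same function could be pointed at checked-out dependency repos as well as edx-platform itself. A line-based regex search will miss dynamic imports and can produce false positives, so its output is a starting point for tickets rather than a definitive list.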

For example, the Django 1.4->1.8 upgrade had to deal with a change in the way database transactions were handled, which led the team to build a new decorator to deal with existing transactions, @outer_atomic.

There’s an existing epic with the Django 1.9 release notes broken down into tickets:

https://openedx.atlassian.net/browse/EV-94

This phase will break down the relevant Django 1.10 and 1.11 release notes into tickets as well. It’s important to look at the release notes first before looking at the external dependencies, as any changes we make for D1.11 compatibility might want to leverage new Django functionality introduced in one of the new versions.

2) Fix Existing Django Deprecation Warnings

Edx-platform has generated lots of Django deprecation warnings since the previous Django 1.8 upgrade. Those logs can be seen with this Splunk query:

https://splunk.edx.org/en-US/app/search/search?q=search%20index%3D%22prod-edx%22%20RemovedInDjango19Warning&display.page.search.mode=smart&dispatch.sample_ratio=1&earliest=&latest=&sid=1496760947.1377356

All these warnings are relevant to the D1.11 upgrade and will need to be dealt with. So we’ll make tickets for each of these warnings that aren’t covered by any other tickets.
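To turn raw log lines into a ticket list, the warnings could be de-duplicated and counted. The sketch below assumes that each log line contains the warning class name followed by a colon and the message, which is how Python formats warnings by default; the actual Splunk export format may differ.

```python
import re
from collections import Counter

# Matches e.g. "RemovedInDjango19Warning: <message>" anywhere in a line.
WARNING_RE = re.compile(r"(RemovedInDjango\d+Warning):\s*(.+)")


def summarize_warnings(log_lines):
    """Return [((warning_class, message), count), ...], most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts.most_common()
```

Each distinct (class, message) pair would then map to at most one ticket, with the count giving a rough sense of priority.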

3) External Dependencies

Edx-platform has many external dependencies. Some number of those dependencies are Django-dependent and support a particular version of Django.

  • Enumerate all edx-owned satellite repository dependencies.

  • Determine the Django version supported by each edx-owned dependency.

    • Capture the full-range of Django support (if possible).

    • Use a tool (off-the-shelf or authored) if at all possible.

    • Capture the state of Python 3 support (if possible).

  • Enumerate all non-edx-owned dependencies.

  • Determine the Django version supported by each non-edx-owned dependency.

    • Capture the full-range of Django support (if possible).

    • Use a tool (off-the-shelf or authored) if at all possible.

    • Capture the state of Python 3 support (if possible).

  • For all the dependencies that don’t support D1.11, create JIRA tickets.
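One way an authored tool could determine Django support is by reading each package's trove classifiers. The sketch below assumes that dependencies declare classifiers such as "Framework :: Django :: 1.11"; many packages do not, so a missing classifier means "unknown", not "unsupported". The function names are hypothetical.

```python
import re

CLASSIFIER_RE = re.compile(r"^Framework :: Django :: (\d+\.\d+)$")


def django_versions_from_classifiers(classifiers):
    """Return the Django versions named in a package's classifiers, oldest first."""
    versions = set()
    for classifier in classifiers:
        match = CLASSIFIER_RE.match(classifier.strip())
        if match:
            versions.add(match.group(1))
    # Sort numerically so that "1.8" comes before "1.11".
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


def supports_django(classifiers, version="1.11"):
    """True if the classifiers explicitly declare support for `version`."""
    return version in django_versions_from_classifiers(classifiers)


example = [
    "Framework :: Django",
    "Framework :: Django :: 1.8",
    "Framework :: Django :: 1.11",
    "Programming Language :: Python :: 2.7",
]
print(django_versions_from_classifiers(example))  # ['1.8', '1.11']
```

The same classifier list also carries "Programming Language :: Python :: 3" entries, so the Python 3 state could be captured in the same pass.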

Edx-Owned Dependencies

These repos will fall generally in a few categories:

  • Forks of other externally-controlled repos

  • Edx-controlled repos in the edx GitHub org

  • Non-forked, externally-controlled repos

Forks

For forked repos that don’t support D1.11:

  • Determine if D1.11 support has been added in the upstream repo.

  • Check if the upstream repo is still under development *or* has been abandoned.

  • Check for Python 3 support as well (optional).

If D1.11 support has been added in the upstream repo, then a decision must be made to do one of the following options:

  • Integrate the upstream support into the edx repo fork.

  • Add D1.11 support to the existing fork without integrating upstream changes.

  • Move off the edx repo fork and back onto the upstream repo.

The JIRA ticket will be assigned to a person or persons whose functional ownership is appropriate to make the decision. In the case of performing work in the forked repo itself, the work will be done on a branch with the consistent name. If the work can be performed in a backward-compatible manner:

  • Create a branch and a PR in the dependent repo.

  • Pass repo CI testing, possibly writing any appropriate new tests.

  • Create an edx-platform branch/PR which requires the repo branch.

  • Pass edx-platform CI.

  • Get review approvals for the dependent repo PR.

  • Merge the dependent repo PR.

  • Change the edx-platform PR to use the committed dependency version.

  • Pass edx-platform CI.

  • Get review approvals for the edx-platform PR.

  • Merge the edx-platform PR.

  • If the edx-platform PR has migrations, ensure they are safe to release!

Edx-Controlled Repos

For edx-controlled repos that don’t support D1.11, we’ll follow the steps described above to add D1.11 compatibility that is also backwards-compatible. And we'll make every attempt to ensure compatibility across any supported versions of Django and Python using tox - for an example, see: https://github.com/edx/edx-drf-extensions/blob/master/tox.ini

Non-Forked, Externally Controlled Repos

Edx-platform depends on some repos that are not in the edx GitHub organization - but whose sole purpose is to provide edx functionality. For many examples of this type of repo, check this file:

https://github.com/edx/edx-platform/blob/master/requirements/edx/github.txt#L54

For any of these repos that are Django-dependent and require changes to be D1.11-compatible, we’ll need to either fork the repos or submit upstream changes.

Also, some externally-controlled repos will be abandoned entirely. If so, we'll need to choose one of these options:

  • fork the repo into the edx organization and implement D1.11 support
  • shift to another dependency that provides the same functionality - and drop the abandoned repo

Minimum Django Versions

For all edx-owned repositories, we will no longer support Django 1.8.x. During the upgrade, any fork or edx-owned repository will have its minimum version updated to Django 1.11.x.

4) Pass all edx-platform CI in Django 1.11

After dealing with all external dependencies and getting all CI to run using Django 1.8, it’s time to get CI to run for edx-platform under both Django 1.8 and Django 1.11. By switching over to Django 1.11 on a branch, all the edx-platform failing tests will become apparent. Those tests will be broken down into common failures by parsing the nose output with a script to group and display the failures. Those failures will be added as tickets and fixed in a backwards-compatible way, supporting both Django 1.8 and 1.11.
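A grouping script of this kind might look like the sketch below. It assumes the standard unittest-style report that nose prints, in which each failure starts with an "ERROR:" or "FAIL:" header followed by a traceback, and it groups tests by the exception type on the first unindented line after the traceback frames. The function name group_failures is hypothetical.

```python
import re
from collections import defaultdict

HEADER_RE = re.compile(r"^(ERROR|FAIL): (.+)$")


def group_failures(report_text):
    """Return {exception_type: [test names]} parsed from a nose failure report."""
    groups = defaultdict(list)
    test_name = None
    in_traceback = False
    for line in report_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # Start of a new failure section.
            test_name = header.group(2)
            in_traceback = False
        elif line.startswith("Traceback (most recent call last):"):
            in_traceback = test_name is not None
        elif in_traceback and line and not line[0].isspace():
            # First unindented line after the frames: "ExceptionType: message".
            exception_type = line.split(":", 1)[0]
            groups[exception_type].append(test_name)
            test_name = None
            in_traceback = False
    return dict(groups)
```

Grouping by exception type alone is coarse; grouping by the full exception line, or by the deepest frame in the traceback, might produce buckets that correspond more closely to individual tickets.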

Why Not Test Both? (Protecting Against Regression)

This year’s PyCon featured an excellent Instagram talk about their upgrade from Python 2 to Python 3. In their process, they ran their CI against both Python 2 *and* Python 3. To get the test suite to pass in both virtualenvs, they first whitelisted certain tests (“run only these tests”) that were known to pass in Python 3 and worked to get the failures to pass, adding those tests to the Python 3 whitelist upon success. Towards the end, they moved to a blacklist (“run all tests but these”) - until all tests passed in both virtualenvs.

The Django upgrade to 1.11 would attempt to use the strategy above to prevent regressions as normal development occurs simultaneously with the upgrade. We’d run the relevant CI suite against both Django 1.8 and Django 1.11 while at first only whitelisting/running the tests that are known to pass. We’d then work on getting all the failed tests to pass, adding each to the whitelist upon success.

Some discovery work is required on how to make this work:

  • Is it feasible for Jenkins to support this type of multi-version testing?

  • Can we make nose support test whitelisting/blacklisting?

  • What are the challenges in adding more testing suites to the existing ones?

    • Computing resources?

    • Jenkins/GitHub configuration?
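Independent of the test runner, the selection logic itself is small. The sketch below shows the whitelist and blacklist modes as plain list filtering; how the resulting list would be handed to nose is exactly the open question above, and the function name select_tests is hypothetical.

```python
def select_tests(all_tests, listed, mode):
    """Return the tests to run under Django 1.11, preserving the original order.

    mode="whitelist": run only the tests in `listed` (known to pass).
    mode="blacklist": run every test except those in `listed` (known to fail).
    """
    listed = set(listed)
    if mode == "whitelist":
        return [test for test in all_tests if test in listed]
    if mode == "blacklist":
        return [test for test in all_tests if test not in listed]
    raise ValueError("mode must be 'whitelist' or 'blacklist', got {!r}".format(mode))


all_tests = ["test_login", "test_enroll", "test_grades"]
print(select_tests(all_tests, ["test_login"], "whitelist"))   # ['test_login']
print(select_tests(all_tests, ["test_grades"], "blacklist"))  # ['test_login', 'test_enroll']
```

Switching from whitelist to blacklist late in the project means newly written tests run under Django 1.11 by default, which is what protects against regressions.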

It’s worth noting that this type of parallel, multi-environment testing would likely be applicable to the eventual Python 3 effort as well, making any required configuration work less costly over the long run.

5) Pre-Release Testing - Performance, Automated, and Manual

Performance

Checking for any performance regressions seems like a good idea before switching the entire platform over to D1.11. A regular Django 1.8 edxapp AMI and a Django 1.11 edxapp AMI will be built and deployed to the loadtest environment. The existing suite of locust-based load tests will be run against each AMI and the response times will be compared against each other.
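The comparison step could be as simple as the sketch below. It assumes that the load-test results have already been reduced to one response time per endpoint (for example, the median in milliseconds) for each AMI; the 10% tolerance and the endpoint names are made-up examples, not agreed thresholds.

```python
def find_regressions(baseline_ms, candidate_ms, tolerance=0.10):
    """Return {endpoint: relative_slowdown} for endpoints slower than `tolerance`.

    `baseline_ms` holds Django 1.8 response times and `candidate_ms` holds
    Django 1.11 response times, both keyed by endpoint. Endpoints missing
    from either run are skipped.
    """
    regressions = {}
    for endpoint, base in baseline_ms.items():
        if endpoint not in candidate_ms or base <= 0:
            continue
        change = (candidate_ms[endpoint] - base) / base
        if change > tolerance:
            regressions[endpoint] = change
    return regressions


django18 = {"/dashboard": 200.0, "/courseware": 400.0}
django111 = {"/dashboard": 210.0, "/courseware": 500.0}
print(find_regressions(django18, django111))  # {'/courseware': 0.25}
```

Because load-test timings are noisy, each AMI would probably need several runs before a flagged endpoint is treated as a real regression.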


Automated

As detailed above, the Django 1.11 CI tests will continue to be run in parallel during this period of final testing, in order to prevent regressions regarding any D1.11-incompatible code.

Manual

The project would gain control over either loadtest or, if loadtest is insufficient, the stage environment in order to enable manual testing by teams. Teams would be notified and encouraged to manually test the features in their ownership areas. Also, we could again do an engineering (or company-wide) bug bash, in which we gather at particular times/locations and test edxapp running the new Django version.

Also, if there are any areas of specific concern, manual testing of those areas could be commissioned and performed.

6) Rollout/Rollback Plan

See Django 1.11 rollout/rollback plan.

It’s difficult to know in advance what a rollout plan for edxapp with Django 1.11 will include. We’ll need to complete most of the upgrade work before we can write a rollout plan in which we have confidence. The ideal rollout would be to simply switch the Django version to 1.11 and release - but that is unlikely to be possible. For example, we already know of a Django database migration that modifies the auth_user table and will need special handling.

We’ll write the rollout plan continually as we know further details about the special cases the final release will need to handle.

An important part of the rollout plan will be an accompanying rollback plan, which covers the actions needed to roll back at any stage during the rollout process. We'll write a detailed rollback plan as well.

7) Rollout

Ship it. a.k.a. Perform the rollout according to the written plan.

Backup Plan

It's unclear whether we'll be able to make a jump from Django 1.8 to 1.11 cleanly. Also, it's less risky to release smaller increments of software. So there's another plan to which we'll consider shifting after some initial discovery work is performed. This plan is a two-phase upgrade plan:

  • First, upgrade edx-platform and all dependencies from Django 1.8 to 1.10.
    • Upgrade dependencies directly to Django 1.11 wherever possible.
  • Release edx-platform with Django 1.10 (merged to master and released to production).
  • Then, upgrade edx-platform and all dependencies from Django 1.10 to 1.11.
  • Release edx-platform with Django 1.11.

This plan will incur more overhead in more pre-release testing, two rollout/rollback plans, and two rollouts - with the benefit of less risk at each release point.

Known Issues

edx-oauth2-provider/django-oauth2-provider

Edx-platform currently has two different OAuth libraries - neither of which has any further external support.

Notes from ClintonB:

https://openedx.atlassian.net/browse/LEARNER-724 covers the work needed to get us off of django-oauth2-provider (DOP) and onto django-oauth-toolkit (DOT).

https://github.com/edx/edx-platform/pull/15054 is meant to update DOT to use a custom Application model to support the client credentials grant for all Application instances. However, the transition to the new model is easier said than done.

Doing things purely in Django may not work. We may need to back up the data, migrate to zero for DOT, use our own models, and restore the data. Ugh!

auth_user migration in Django 1.10

In migrating other IDAs to Django 1.10, a migration was discovered that changed a field in the auth_user table. As the existing edxapp auth_user table contains over 10 million rows, we'll need to make a special accommodation for this migration in the rollout plan - likely faking the migration so that it doesn't occur.

Increased memory usage and open database connections/file handles in Django 1.10/11

The LEARNER team has upgraded the discovery IDA to Django 1.11 and has experienced problems with increased memory usage and open DB handles:

RCA: LEARNER-1942 - Connections to DB never getting closed and recycled

LEARNER-1895

The LEARNER team recommends that we execute performance testing before releasing an upgraded edx-platform to production because of these discovered issues. They will do so for the discovery IDA in this ticket: LEARNER-2021