Upgrade Project Runbook
Overview
Open edX is a large software project with many dependencies that periodically need to be upgraded. This runbook outlines the steps that should be taken each time we embark on a large software upgrade or maintenance project, to minimize the total amount of effort required to get it done and maximize the likelihood of completing the project on schedule.
Identify the Need Early
The first step is to realize that an upgrade is needed at all. We track the major Open edX dependencies that trigger recurring upgrade projects in this spreadsheet (which includes a link at the bottom to the code that generates it). The data is reviewed quarterly by 2U’s Arbi-BOM squad to identify upcoming upgrade needs, but anyone is welcome to create an issue or pull request to flag a missing dependency or stale data. The more complete this is, the better we can plan ahead. The official websites of each software project are the authoritative source of information about support window end dates, although endoflife.date is also a very useful resource.
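For a quick cross-check of the spreadsheet, endoflife.date also publishes its data as JSON. The Python sketch below assumes that data arrives as a list of release cycles, each with a `cycle` string and an `eol` field holding either an ISO `YYYY-MM-DD` date or a boolean (our understanding of its API; verify against the site's own documentation). It works on already-parsed data so it runs offline; the `support_end_date` helper and the sample values are hypothetical.

```python
from datetime import date
from typing import Optional, Union

# Assumed shape of one cycle entry, e.g. {"cycle": "2.0", "eol": "2026-04-30"}.
# "eol" may also be a boolean when no concrete date is published.
Cycle = dict[str, Union[str, bool]]


def support_end_date(cycles: list[Cycle], version: str) -> Optional[date]:
    """Return the end-of-support date for `version`, or None if unknown or undated."""
    for cycle in cycles:
        if cycle.get("cycle") == version:
            eol = cycle.get("eol")
            # Only a string is a concrete date; a boolean carries no date to plan around.
            if isinstance(eol, str):
                return date.fromisoformat(eol)
            return None
    return None  # version not listed at all


# Hypothetical sample data in the assumed format (not real support dates).
sample = [
    {"cycle": "1.0", "eol": "2025-04-01"},
    {"cycle": "2.0", "eol": "2026-04-30"},
    {"cycle": "3.0", "eol": False},
]
print(support_end_date(sample, "2.0"))  # 2026-04-30
```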
It’s a good idea to create a GitHub Issue for the upgrade as soon as the need for a major upgrade is identified, before even finalizing the target completion date; then it can serve as a home for discussion of the upgrade and links to relevant documentation as it gets written. To do this:
Create the issue in the platform-roadmap repository, and give it the “maintenance” label.
After the issue creation form is submitted, edit the issue’s description to add the checklist at the bottom of this page.
Add the new issue to the Backlog column of the Open edX Roadmap • openedx project. Set any of the project’s custom fields whose values are known; common choices may include:
“Proposed by” - your organization’s name
“Platform map - Super Level” - Architecture/Platform
“Strategy” - Platform
“Type” - Maintenance
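Steps 1 and 2 can also be done from the command line with the GitHub CLI, putting the checklist straight into the issue body. This is only a sketch: it assumes the repository slug is `openedx/platform-roadmap`, and `roadmap_issue_command` is a hypothetical helper. Creating an issue this way bypasses the repository's web issue form, and the project board and custom fields from step 3 still need to be set afterwards.

```python
import shlex


def roadmap_issue_command(title: str, body: str,
                          repo: str = "openedx/platform-roadmap") -> list[str]:
    """Build (but do not run) a `gh issue create` call for a new upgrade roadmap issue."""
    return [
        "gh", "issue", "create",
        "--repo", repo,            # assumed slug of the platform-roadmap repository
        "--title", title,
        "--body", body,            # paste the checklist from the bottom of this page here
        "--label", "maintenance",
    ]


cmd = roadmap_issue_command(
    "Upgrade to ExampleLib 5",  # hypothetical dependency name
    "Tracking issue for the ExampleLib 5 upgrade.\n\n<checklist goes here>",
)
print(shlex.join(cmd))
# To create the issue, pass `cmd` to subprocess.run(cmd, check=True) after `gh auth login`.
```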
Schedule the Completion Date
Next is to decide when to start and complete the upgrade. To decide the end date of the upgrade project:
Find the date when support of the version currently in use ends.
Find the last date prior to that when the branches will be cut for a new Open edX release, according to the Open edX Release Schedule.
If there are compelling reasons to upgrade earlier, pick the branch cut date for an earlier release as appropriate. (For example, major React versions get security patches indefinitely, but more and more related packages start requiring newer versions to work correctly.)
Set the roadmap issue’s milestone to be the corresponding release name.
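The date logic in these steps is simple enough to sketch. Assuming the support end date and the branch cut dates have already been looked up by hand (the release names and dates below are placeholders, not the real Open edX schedule), the target release is the one whose branch cut is the latest that still falls strictly before support ends. `target_release` is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def target_release(support_ends: date, branch_cuts: dict[str, date]) -> Optional[str]:
    """Return the release with the last branch cut strictly before `support_ends`."""
    eligible = {name: cut for name, cut in branch_cuts.items() if cut < support_ends}
    if not eligible:
        return None  # no scheduled release cuts in time; escalate instead
    return max(eligible, key=lambda name: eligible[name])


# Placeholder release names and dates; use the Open edX Release Schedule for real ones.
branch_cuts = {
    "Alpha": date(2025, 4, 9),
    "Bravo": date(2025, 10, 8),
    "Charlie": date(2026, 4, 8),
}
print(target_release(date(2026, 1, 31), branch_cuts))  # Bravo
```

Picking an earlier release for other reasons (step 3) is then just a matter of choosing a release before the one this returns.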
If other upgrade projects are slated for the same Open edX release, stagger the completion dates if at all possible. We don’t want to be struggling to complete 3 major upgrades concurrently with wrapping up work on the release. Add an explicit completion date to the roadmap issue, and explain why it was chosen.
Select an Orchestration Team
Once there’s a good idea of “what” and “when”, the next step is to figure out “who”. A team (not an individual) should be selected to plan for the upgrade and make sure it gets completed on time. Some selection criteria:
The orchestration team should be doing full time work related to the software being upgraded. Part-time assistance with the upgrade from other developers is actively encouraged, but the orchestration team should be able to allocate large amounts of their time to keeping the project on track to timely completion.
The team must have some spare capacity in the near future for the next few steps, and be able to dedicate a lot more time to the upgrade as the deadline approaches.
The team must have some expertise in the software being upgraded, or be able to develop that expertise at the start of the project. This is needed in order to create an efficient implementation plan and write guidance for other teams that will need to do work for the upgrade.
The team should not also be working on another major project with a similar due date.
Document in the roadmap issue which team will be orchestrating the upgrade to avoid future confusion about who is responsible for this.
Determine Scope of Impact
Now that we have a target date and people dedicated to working on the upgrade, we can determine the scope of the upgrade in order to notify those who will be impacted by it. The orchestration team should perform this step.
Find or create a repo health check to identify all the repositories using the software to be upgraded.
Create an issue for the upgrade in each impacted repository, and add it to the Todo column of the Maintenance • openedx board.
Keep the description minimal; mainly link to the upgrade issue in the roadmap. Dates, instructions, etc. are likely to be fleshed out and/or changed later, and we don’t want to have to update them in dozens of issues.
Create a “Project” field value for the upgrade project, and specify it in the Table view for each created issue.
Specify the orchestration team as the “Source” in the Table view.
Specify the owner in the Table view if you can easily identify it. (2U employees and Arbisoft contractors can already do this via a private ownership spreadsheet; for everyone else, Backstage will likely become the home of this information.)
Leave the “Owner Does” field set to “TBD” (To Be Determined) for now.
Create a task list in the roadmap issue to track all the child issues for individual repositories.
Create a page under Upgrades for documentation related to the upgrade. Add links from this page to the roadmap issue and vice versa.
Read the release notes for the version being upgraded to (and any other versions skipped over during the upgrade), and document (under the Confluence page created in the prior step) the changes believed to be most problematic and/or interesting.
Note that we want to create the per-repository issues pretty early in the process, as they serve a few important roles:
They give teams advance notice that they need to allocate some time for the work. More detailed communications about the expected level of effort and available automation will come later, but even just this heads up that “you need to leave some room in your schedule for this” is useful.
They allow teams to specify their preferred level of involvement in upgrading each repository. Some will prefer to be hands-off and let the orchestration team handle as much as practical, others will have reasons why they want to do most of the upgrade work themselves in a particular repository.
They provide an official forum for discussing repository-specific aspects of the upgrade project.
They allow the Maintenance • openedx board to serve as a status dashboard for the upgrade project, by filtering to just the issues in that project.
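Because each child issue only needs a pointer back to the roadmap issue, both its description and the roadmap issue's task list can be generated rather than typed. A minimal sketch with hypothetical helper names and placeholder issue references; the output uses GitHub's `- [ ]` task-list syntax, in which `owner/repo#number` references link to the child issues.

```python
def child_issue_body(roadmap_issue_url: str) -> str:
    """Minimal per-repository description: dates and instructions live in the roadmap issue."""
    return f"Part of the upgrade tracked in {roadmap_issue_url}; see that issue for details."


def task_list(child_issues: list[str]) -> str:
    """One unchecked GitHub task-list item per child issue reference, sorted for stable diffs."""
    return "\n".join(f"- [ ] {ref}" for ref in sorted(child_issues))


# Placeholder URL and issue references.
print(child_issue_body("https://github.com/openedx/platform-roadmap/issues/<number>"))
print(task_list(["openedx/example-repo-b#7", "openedx/example-repo-a#3"]))
```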
Automate As Much As Practical
Now that it’s clear what needs to be done, it’s time for the orchestration team to write automation for the project wherever the time savings of avoiding manual work outweigh the time to implement the automation. Good candidates include:
Find or write codemods to automatically make some of the necessary changes to source code. See Codemods and Other Upgrade Automation for guidance on tools we’ve found or written already to do this for some of our dependencies.
Write configuration file modification scripts to automatically make appropriate updates to testing matrices and other metadata in tox.ini, GitHub Actions workflows, setup.py, etc. We keep these in the repo-tools repository; you can create a new directory there and copy scripts from recent upgrades to use as starting points.
Create repo health checks and dashboard(s) that automatically detect whether key milestones in the upgrade (like Trove classifiers for appropriate Python/Django version support) have been reached in a repository, so that an auto-updated dashboard replaces manual wiki table updates. Consider also creating a dashboard that runs these checks on the repositories of other software we depend on, to track how many of them still aren’t ready for the upgrade; for some types of upgrades (like Python or Django), this is otherwise a very manual, time-consuming task.
Create a new view in the Maintenance • openedx board that filters down to just issues in this upgrade project.
Document instructions and available automation for those who will be doing work to prepare repositories for the upgrade. Link to this document in the roadmap issue.
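To illustrate the codemod item above, here is a deliberately small example for one well-known Django change: the `ugettext*`/`ungettext*` aliases in `django.utils.translation` were removed in Django 4.0. It is regex-based, so it would also rewrite matches inside strings and comments; for real upgrades, syntax-aware tooling (for example libcst-based codemods, or existing projects such as pyupgrade and django-upgrade) is usually the safer choice. `apply_codemod` is a hypothetical name.

```python
import re

# Aliases removed in Django 4.0, mapped to their replacements.
RENAMES = {
    "ugettext_lazy": "gettext_lazy",
    "ugettext_noop": "gettext_noop",
    "ungettext_lazy": "ngettext_lazy",
    "ungettext": "ngettext",
    "ugettext": "gettext",
}
# Word boundaries keep "ugettext" from matching inside "ugettext_lazy" or "my_ugettext".
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def apply_codemod(source: str) -> str:
    """Rewrite removed translation aliases in one file's source text."""
    return PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


before = "from django.utils.translation import ugettext as _, ugettext_lazy\n"
print(apply_codemod(before), end="")
# from django.utils.translation import gettext as _, gettext_lazy
```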
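A configuration file modification script can start equally small. The sketch below (a hypothetical `add_tox_factor` helper, not one of the existing repo-tools scripts) adds a new version to a brace-expanded factor in `tox.ini` text, such as the `django{...}` group in an `envlist`. It assumes the common `name{a,b}` envlist style and is idempotent; a real script would also need to update the matching `deps` lines and GitHub Actions matrices.

```python
import re


def add_tox_factor(tox_ini: str, factor: str, version: str) -> str:
    """Add `version` to every `factor{...}` group, e.g. django{32,42} -> django{32,42,52}."""

    def _extend(match: re.Match) -> str:
        values = [v.strip() for v in match.group(1).split(",")]
        if version not in values:  # keep the script idempotent
            values.append(version)
        return f"{factor}{{{','.join(values)}}}"

    # \b stops e.g. "py" from matching at the end of a longer word; [^}]* captures the values.
    pattern = rf"\b{re.escape(factor)}\{{([^}}]*)\}}"
    return re.sub(pattern, _extend, tox_ini)


tox_ini = "[tox]\nenvlist = py{38,311}-django{32,42}\n"
print(add_tox_factor(tox_ini, "django", "52"))
```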
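For the Trove classifier milestone mentioned above, the core of a health check and its dashboard is a per-repository yes/no answer. This sketch assumes classifiers appear as quoted strings (as in `setup.py` or `pyproject.toml`) and that the file contents have already been fetched; the repository names and contents are made up, and `declared_classifiers`/`readiness` are hypothetical helpers rather than part of our repo health tooling.

```python
import re

# Quoted "Framework :: ..." or "Programming Language :: ..." classifiers.
CLASSIFIER_RE = re.compile(r"[\"']((?:Framework|Programming Language) :: [^\"']+)[\"']")


def declared_classifiers(text: str) -> set[str]:
    """Return the Framework/Programming Language classifiers found in a packaging file."""
    return set(CLASSIFIER_RE.findall(text))


def readiness(repos: dict[str, str], required: str) -> dict[str, bool]:
    """Map each repository name to whether its packaging file declares `required`."""
    return {name: required in declared_classifiers(text) for name, text in repos.items()}


# Hypothetical repositories and file contents.
repos = {
    "repo-a": '''classifiers=[
        "Framework :: Django :: 4.2",
        "Programming Language :: Python :: 3.11",
    ],''',
    "repo-b": 'classifiers=["Framework :: Django :: 3.2"]',
}
print(readiness(repos, "Framework :: Django :: 4.2"))  # {'repo-a': True, 'repo-b': False}
```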
Clarify Distribution of Work
Now that we’ve done as much as we reasonably can to automate the upgrade, we need to ask the teams owning repositories impacted by the upgrade what they want their role in the upgrade to be. The orchestration team should send an announcement (as described in