Maintenance Board (GitHub Project)

The Maintenance board (https://github.com/orgs/openedx/projects/51/views/1) tracks the work needed to coordinate and execute a major technology upgrade or similar software maintenance task across many repositories. It also tracks work in an Open edX repository that is orchestrated or implemented by a team other than the repository’s owner, so that responsibility for moving the work along can be handed off clearly and efficiently when appropriate. It does not capture routine software maintenance that can be done within a single repository and team with little impact on others, or infrastructure updates that are largely internal to a single organization’s deployments.

Statuses

  • Todo - The need to perform a task has been identified, but implementation has not yet started.

  • In Progress - A team has been chosen to perform the task, and they have started work on it. A PR in this status may have already passed through one or more rounds of review, but it’s on the PR author to take the next steps.

  • Author Team Review - The author has completed implementation work on the task, has asked their teammates for review, and is waiting for their feedback. The owning team doesn’t yet need to take any action.

  • Owner Review - Implementation is complete, the implementing team has asked the owning team for review, and that review is not yet complete. The implementing team is blocked on the owning team.

  • Approved - The PR has been approved by the owning team, but the chosen service level for the task makes merging the implementing team’s responsibility, and they haven’t yet done so.

  • Done - The PR has been successfully merged.
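To make the flow between these statuses concrete, here is a minimal Python sketch of the transitions implied by the Usage section. The board does not enforce any of this. The `Status` enum, the `ALLOWED_TRANSITIONS` table, and `can_move` are hypothetical names. Two transitions are assumptions: “In Progress” may skip straight to “Owner Review”, and “Author Team Review” may return to “In Progress” when teammates request changes.

```python
from enum import Enum


class Status(Enum):
    TODO = "Todo"
    IN_PROGRESS = "In Progress"
    AUTHOR_TEAM_REVIEW = "Author Team Review"
    OWNER_REVIEW = "Owner Review"
    APPROVED = "Approved"
    DONE = "Done"


# Hypothetical transition table inferred from the Usage section (not enforced by GitHub).
ALLOWED_TRANSITIONS: dict[Status, frozenset[Status]] = {
    Status.TODO: frozenset({Status.IN_PROGRESS}),
    # Assumption: authors may skip team review and go straight to the owner.
    Status.IN_PROGRESS: frozenset({Status.AUTHOR_TEAM_REVIEW, Status.OWNER_REVIEW}),
    # Assumption: teammates may send the PR back for more work.
    Status.AUTHOR_TEAM_REVIEW: frozenset({Status.IN_PROGRESS, Status.OWNER_REVIEW}),
    # Changes requested -> In Progress; owner merges -> Done;
    # approved but the implementing team merges -> Approved.
    Status.OWNER_REVIEW: frozenset({Status.IN_PROGRESS, Status.APPROVED, Status.DONE}),
    Status.APPROVED: frozenset({Status.DONE}),
    Status.DONE: frozenset(),
}


def can_move(current: Status, target: Status) -> bool:
    """Return True if moving an item from `current` to `target` fits this workflow."""
    return target in ALLOWED_TRANSITIONS[current]
```

For example, `can_move(Status.OWNER_REVIEW, Status.APPROVED)` is true, while `can_move(Status.TODO, Status.DONE)` is false.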

Field Definitions

Prioritization

Issues are assigned priorities via the project’s “Priority” field. This is more of an art than a science, but try to prioritize roughly as follows:

  1. Significant security vulnerabilities or risks

  2. Final tasks in nearly complete projects (minimize time to delivering value, limit work in progress)

  3. Tasks for projects whose target dates are very close

  4. Tasks which are likely to involve a lot of latency (working with upstream package maintainers, etc.)

  5. Tasks which block other tasks in the “Todo” backlog

  6. Everything else

Most of the views in the board are sorted by priority, with the most urgent tasks at the top.
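The ordering above can be expressed as a sort key. This is only a sketch. Each boolean on the hypothetical `Candidate` class stands for a judgment call that a person has already made, and the numeric ranks are illustrative rather than actual values of the “Priority” field.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A task plus the human judgments that drive the rubric (all fields hypothetical)."""
    title: str
    security_risk: bool = False     # 1. significant security vulnerability or risk
    finishes_project: bool = False  # 2. final task in a nearly complete project
    target_date_near: bool = False  # 3. project target date is very close
    high_latency: bool = False      # 4. likely to involve a lot of latency
    blocks_todo: bool = False       # 5. blocks other tasks in the "Todo" backlog


def rubric_rank(c: Candidate) -> int:
    """Return 1-6: the first rubric criterion the task meets (lower = more urgent)."""
    criteria = [c.security_risk, c.finishes_project, c.target_date_near,
                c.high_latency, c.blocks_todo]
    for rank, met in enumerate(criteria, start=1):
        if met:
            return rank
    return 6  # 6. everything else


def sort_by_rubric(candidates: list[Candidate]) -> list[Candidate]:
    """Most urgent first; sorted() is stable, so ties keep their input order."""
    return sorted(candidates, key=rubric_rank)
```

With `[Candidate("docs tweak"), Candidate("CVE fix", security_risk=True), Candidate("upstream patch", high_latency=True)]`, the sorted titles are `["CVE fix", "upstream patch", "docs tweak"]`.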

Usage

Orchestrating or Implementing Team

  1. As soon as you identify that action will be needed by the owning team of a particular repository, create an issue for it in that repository and add it to the board. Try to determine this early in the planning process, not after starting to write the code. If the owner can take action now, add the “needs maintainer attention” label; otherwise, set the assignee to a team member responsible for adding enough information later to make it actionable. Set the “Source”, “Owner”, and “Project” fields as soon as possible after the issue is added to the board.

  2. Provide enough information in the description (or in links from it) and in the “Priority” field for the owning team to make an informed choice of service level. Wait for that choice before writing an implementation (the team may prefer to do it themselves).

  3. Once there is a concrete PR for an item of work, add the PR to the board and copy the values of the “Source”, “Owner”, “Owner Does”, “Project”, and “Priority” fields. Remove any prior issue for it (but keep a link to the issue in the PR).

  4. Promptly move issues and PRs between status columns when appropriate.

  5. When moving a PR to “Owner Review”, add the “waiting for eng review” and “needs maintainer attention” labels and request review from the owning team (according to the team’s preferences at the far right of the Squads tab of the ownership spreadsheet, if you’re a 2U or Arbisoft employee). Add a comment to the PR with links to any tickets, Slack threads, etc. where review was requested.

  6. Once a PR reaches the “Approved” column, take prompt action to merge it and perform any remaining implementing team responsibilities for the service level chosen for that task. The longer the PR sits idle, the greater the risk of merge conflicts and the longer it adds to the team’s WIP (work in progress).

  7. Whenever context switching, try to select your next task according to the following rubric. When there are multiple tasks with the same status, pick the one closest to the top of the board (highest priority).

    1. First, any tasks in the “Approved” column. They’re the closest to being done.

    2. Second, any tasks that you worked on which have moved back to “In Progress” after owner review. The faster you’re ready to ask for the next round of review, the more likely the reviewer will remember the context and be able to respond promptly.

    3. Third, any tasks in the “Author Team Review” column. Try to keep your teammates unblocked.

    4. Fourth, any other “In Progress” tasks that you were previously working on.

    5. Finally, pick up a new task from the “Todo” column. This increases team WIP, so only do this once you’re pretty sure you can’t nudge along something else that the team is already working on.

  8. Adjust the “Priority” field of tasks when appropriate, for example when:

    1. New security ramifications are discovered

    2. Projects get stuck waiting on completion of the last few tasks

    3. The project’s target date gets close and there’s a potential risk of not achieving it

    4. New dependencies between tasks are discovered
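The task-selection rubric in step 7 can be sketched in Python as follows. It assumes a hypothetical `Task` record whose `priority` is a number where 1 is the top of the board, plus flags recording whether you have already worked on the task and whether it came back from owner review; none of these are real fields of the board. It also assumes you would not pick up your own task from “Author Team Review”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    title: str
    status: str                               # a status column name, e.g. "Approved"
    priority: int                             # hypothetical rank: 1 = top of the board
    mine: bool = False                        # True if you have already worked on it
    returned_from_owner_review: bool = False  # moved back to "In Progress" by the owner


def rubric_tier(task: Task) -> Optional[int]:
    """Map a task to its rubric step (0 = pick first); None if not eligible."""
    if task.status == "Approved":
        return 0  # closest to being done
    if task.status == "In Progress" and task.mine and task.returned_from_owner_review:
        return 1  # respond while the reviewer still remembers the context
    if task.status == "Author Team Review" and not task.mine:
        return 2  # keep teammates unblocked
    if task.status == "In Progress" and task.mine:
        return 3  # other work you already started
    if task.status == "Todo":
        return 4  # new work increases WIP, so it comes last
    return None  # e.g. waiting in "Owner Review", "Done", or someone else's task


def next_task(tasks: list[Task]) -> Optional[Task]:
    """Pick the earliest rubric step, breaking ties by board priority."""
    eligible = [t for t in tasks if rubric_tier(t) is not None]
    if not eligible:
        return None
    return min(eligible, key=lambda t: (rubric_tier(t), t.priority))
```

Sorting on the `(tier, priority)` tuple mirrors the rubric: status decides first, and position on the board only breaks ties within a status.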

Owning Team

  1. Triage any PRs in “Owner Review” and take the following actions:

    1. Set the assignee to a team member who will be responsible for performing the review, and remove the “needs maintainer attention” label.

    2. Once the review is complete, remove the “waiting for eng review” label and do one of the following:

      • If changes are requested, add the “waiting on author” label, set the assignee back to the author, and move the PR back to “In Progress”.

      • If the PR is approved and “Owner Does” includes “merge”, merge the PR and perform the remaining owner responsibilities from the service level chosen for that PR.

      • If the PR is approved and “Owner Does” does not include “merge”, move the PR to the “Approved” column.

  2. Set the “Owner Does” field of recently added issues in “Todo” to an appropriate value.

  3. Check the board before planning the work for each sprint to determine if any tasks of types 1-2 above should be included. Also check the “Merge Forgotten” tab to see if any PRs that the team is responsible for merging were accidentally left with just an approval instead.

  4. If you have an on-call rotation for handling interruptions, please make it one of this role’s responsibilities to periodically check the team’s tab on this board and take steps to perform outstanding tasks of types 1-2 above. The shorter the latency in moving tasks along, the more efficiently we can complete maintenance work, freeing up developers to work on other priorities.
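The post-review branch of step 1 can be sketched as a small function. The label and status names come from this page, but `complete_owner_review`, `ReviewResult`, and modeling “Owner Does” as a set of strings such as {"review", "merge"} are hypothetical. Reassigning the PR to its author and actually merging it are left to the caller.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    status: str
    labels: set[str]
    owner_should_merge: bool = False


def complete_owner_review(labels: set[str], owner_does: set[str], approved: bool) -> ReviewResult:
    """Apply the owning team's post-review steps to a PR's labels and status.

    `owner_does` is a hypothetical set of responsibilities, e.g. {"review", "merge"}.
    """
    new_labels = set(labels)
    new_labels.discard("waiting for eng review")  # review is no longer pending
    if not approved:
        # Changes requested: hand the PR back to the author.
        new_labels.add("waiting on author")
        return ReviewResult(status="In Progress", labels=new_labels)
    if "merge" in owner_does:
        # The owning team merges; the item reaches "Done" once the merge lands.
        return ReviewResult(status="Done", labels=new_labels, owner_should_merge=True)
    # The implementing team is responsible for merging.
    return ReviewResult(status="Approved", labels=new_labels)
```

For example, an approved PR whose “Owner Does” lacks "merge" comes back with status "Approved" and the "waiting for eng review" label removed.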

Future Enhancements

The following improvements to the Maintenance board are under serious consideration:

  • When a PR moves to the “Owner Review” column of the Maintenance GitHub Project board, a GitHub Actions workflow looks up the owning squad’s notification preferences in the Ownership spreadsheet. It then sends the appropriate Slack message, Jira ticket, etc. to request review and links to them in a comment on the PR, saving the author from having to do all this manually. (And somewhat mitigating the problem of publicly opaque team preferences for PR review request notifications.)

  • Either update an item’s status appropriately when the “waiting on author”, “needs maintainer attention”, or “waiting for eng review” labels are added, or add the labels automatically when the status changes. (The status is important for making the board useful, the labels are important for synchronizing status in other boards containing the same items.)

  • When a PR moves back from “Owner Review” to “In Progress”, the author is notified via Slack and email. This should reduce the odds of failing to notice that the ball is back in their court.

  • Automatically add Renovate and Python dependency upgrade PRs to the “Owner Review” column with the appropriate owner set (which triggers the notifications above). Some teams don’t need the extra nudging, but others would benefit from it.

  • Periodic reminders to teams of PRs pending their review and issues pending selection of a service level. These could go out via email and/or Slack.

  • A daily workflow that updates a project field for each issue indicating the number of days since its last status change.
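For the last idea, the core calculation is simple. This sketch assumes the workflow can already obtain each item’s most recent status-change time as a timezone-aware `datetime`; how it does so, and how it writes the result back to a project field, is not shown. The function names and the staleness threshold are hypothetical. The same data could drive the periodic reminders mentioned above.

```python
from datetime import datetime, timezone
from typing import Optional


def days_since_status_change(last_change: datetime, now: Optional[datetime] = None) -> int:
    """Whole days elapsed since an item's status last changed (timezone-aware inputs)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_change).days  # timedelta.days drops partial days


def stale_items(last_changes: dict[str, datetime], threshold_days: int,
                now: Optional[datetime] = None) -> list[str]:
    """Return item IDs whose status has not changed for at least `threshold_days`."""
    return sorted(
        item for item, changed in last_changes.items()
        if days_since_status_change(changed, now) >= threshold_days
    )
```

Passing `now` explicitly keeps the calculation deterministic, which makes it easy to test and lets a daily job use a single reference time for every item.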

Rationale

The board was created to address the following problems:

  • PRs often need approval and/or deployment from the owning team, but those originating from another team are easily forgotten because they don’t appear on the owning team’s Kanban board, the author isn’t present in the owning team’s meetings, etc. In some cases it can take weeks or months to obtain even relatively low-effort reviews.

  • PR authors often lose track of when the ball is back in their court because the team’s Kanban board has a single “In Code Review” column that fails to clarify whose action is currently required.

  • Major dependency upgrades (Django, Python, Node, etc.) can involve dozens of repositories and many teams, which is a major project management challenge. Capturing all these tasks in the same board can eliminate much of the manual status tracking in Confluence tables, spreadsheets, etc.

  • Teams blocked on other teams for review gradually take on more and more projects as they wait, to the point where they have trouble keeping track of everything and making rational prioritization decisions.

  • Teams are unclear about which maintenance tasks they’re responsible for, and especially about the relative priority among those tasks.

  • It’s easy to lose track of whether review of a PR was actually requested, and even if it was, the most common venues for the requests are designed such that the request quickly vanishes into history with no scheduled reminders.