Code Maintenance Parallelization

Context

Software projects on the scale of Open edX require a lot of ongoing code maintenance effort: upgrading or replacing stale dependencies, adapting to changes in the software development ecosystem, implementing suggested small improvements, etc. Open edX has done relatively well at assigning active maintainers to all parts of the code so that truly critical updates happen when needed, but has struggled more with prioritizing small-effort maintenance tasks that collectively have a big impact on the ease of working with the code. Attempts to prioritize such work within the core owning teams have met with limited success, because those same teams are also the only ones with the context and expertise to efficiently undertake larger, high-impact projects, and time spent on maintenance comes at the expense of those projects. Previous attempts at crowdsourcing tasks related to major software upgrades have been somewhat more successful, and we believe that further iteration on this model of opening up certain categories of code changes to a broader base of developers can help accelerate the overall pace of improvements to the platform.

Defining Maintenance Tasks

Anyone who notices the need for an Open edX code maintenance task is welcome to create an issue for it, although the expectation is that initially most will come from the maintainers of the code involved. (And maintainers are free to decline submitted issues they feel are inappropriate for contribution.) Such issues should be:

  • Important, but not urgent. The task should be clearly useful to perform, but not needed so badly that it can’t wait at least a few weeks for someone to notice the request and take it on.

  • Not controversial. If the instructions in the description are followed, it should be feasible to promptly review and merge the changes without extended deliberation.

  • Immediately actionable. Each maintenance issue should be doable immediately, without waiting for another change to be made first.

  • Captured as GitHub Issues. GitHub Issues are the current issue tracker of choice for public work on Open edX code. The issue should be created in the repository containing the code to be changed, if possible. To create an issue related to a repository in the openedx organization for which GitHub Issues have not yet been enabled, please ask for them to be enabled via https://github.com/openedx/tcril-engineering/issues/new/choose (choose “GitHub Request - Access/Config”).

  • Documented clearly. The issue description doesn’t have to be a long essay, but it should have enough context for an intermediate developer who hasn’t heard about the problem in another communication channel (and perhaps doesn’t have any experience contributing to this particular repository) to effectively get started on the task.

  • Explicit about how to request review. The implementer may well lack access to your company’s internal communication channels, and direct pings on a PR to the issue author often get lost in a sea of GitHub notifications. Be sure to specify how the implementer can request review in a way that will attract reasonably timely attention.

  • Labeled with “help wanted” and “maintenance”. These GitHub issue labels identify the tasks as being available for work to anyone in the community with interest and motivation to help out (the “help wanted” label), and as code maintenance tasks that can’t be ignored indefinitely (the “maintenance” label). If you don’t have permission to assign labels to the issue directly, you can try adding two separate comments to the issue containing the text “label: help wanted” and “label: maintenance”.
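The labeling fallback in the last item can be sketched as a small helper. This is a minimal illustration only: the function name `missing_label_comments` is hypothetical, and it assumes the comment text “label: <name>” is handled as described above. It does not call GitHub; it only works out which comments would still need to be posted.

```python
# Labels that every maintenance issue should carry, per the criteria above.
REQUIRED_LABELS = ("help wanted", "maintenance")


def missing_label_comments(current_labels):
    """Return the comment bodies to post for each required label not yet applied.

    `current_labels` is any iterable of label names already on the issue.
    Matching ignores case and surrounding whitespace (an assumption made
    for convenience in this sketch).
    """
    present = {label.strip().lower() for label in current_labels}
    return [f"label: {name}" for name in REQUIRED_LABELS if name not in present]


if __name__ == "__main__":
    # An issue that so far only has the "maintenance" label.
    for comment in missing_label_comments(["maintenance"]):
        print(comment)
```

Each returned string would be posted as its own comment, matching the “two separate comments” advice.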

Recruiting Help with Maintenance Tasks

There are several pools of potential helpers for maintenance tasks, each with its own incentives to help. Note that some experience is required; the teams creating these tasks and reviewing the code contributed for them aren’t set up to mentor people with almost no relevant experience. Junior developers are welcome, as long as they have successfully made similar contributions in the past (not necessarily in Open edX). But people still learning how to create their first pull request will probably just get frustrated until they master that skill in a more beginner-oriented context.

  • Core Contributors are expected to contribute 20 hours of effort per month to Open edX, and working on maintenance tasks is a perfectly acceptable use of that time.

  • We can (and should) create forum badges for people who complete different numbers of maintenance tasks, which may motivate others in the community to help.

  • Development sprints at conferences are a good venue for recruiting interested developers from outside the current Open edX community; there are often relatively experienced developers looking for a new project to get involved in, and Open edX is a relatively popular choice when someone is present to promote it.

  • Events such as Hacktoberfest can also attract new contributors, especially if there’s a curated set of tasks (like the maintenance tasks described here) to work on. But someone has to take the time to get things set up for participation in the event.

  • Boot camps and advanced programming classes sometimes seek real-world projects for students to contribute to. This can work as long as the course instructors and staff can provide the beginner support that the Open edX teams typically aren’t fully prepared to offer.

Performing Maintenance Tasks

While we definitely want help with maintenance tasks, there are a few guidelines to keep in mind when choosing one in order to minimize the amount of frustration for all parties involved:

  • Find a task. Here is a search query for all unassigned open maintenance tasks.

  • Request assignment. When you’re ready to start work on the issue immediately, assign it to yourself if you have permission to do so; in most cases you’ll need to comment on the issue to request that it be assigned to you. Depending on the repo’s permissions, this may not be reflected in the “Assignee” field of the issue, but at a minimum the request tells other interested people that the task is already being worked on.

  • Describe the changes. The PR description doesn’t need to be an essay, but it should give context about why these particular changes are being made this way. And definitely make sure to include a link to the maintenance issue that the PR was made in response to.

  • Request unassignment when appropriate. If something comes up and you’re unable to make progress on the task for a week or more, please state that on the original issue so it doesn’t become indefinitely blocked on you. If you have an in-progress PR and followed the advice above, there should already be a link to it in the issue history from when you mentioned it in your PR’s description.

  • Get checks to pass before requesting review. With rare exceptions, a failing check on a pull request is a good indication that something needs to be fixed before it can be merged. Please resolve such failures if at all possible before requesting review, so the reviewer doesn’t have to spend time pointing out the failures and deciding if it’s even worth looking at the changes any further yet.

  • Request review when done. Once you’ve finished the task and checks are passing, ask for review via the mechanism requested in the maintenance issue. If the issue author neglected to specify one, try @-mentioning their username in a comment on the PR to get their attention.
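The “Find a task” step relies on a search for unassigned open maintenance issues. The sketch below shows one way to build such a search URL. It is an assumption about what an equivalent query looks like, not the exact link the step refers to: it uses GitHub’s standard search qualifiers (`org:`, `is:issue`, `is:open`, `no:assignee`, `label:`), the two labels named earlier, and a hypothetical helper name `maintenance_search_url`.

```python
from urllib.parse import urlencode


def maintenance_search_url(org="openedx"):
    """Build a GitHub issue-search URL for unassigned, open maintenance tasks.

    The qualifier set is an assumption based on the labels described in
    this document.
    """
    qualifiers = [
        f"org:{org}",           # restrict to one GitHub organization
        "is:issue",             # issues only, not pull requests
        "is:open",              # still open
        "no:assignee",          # nobody has claimed it yet
        'label:"help wanted"',  # label names with spaces must be quoted
        "label:maintenance",
    ]
    query = " ".join(qualifiers)
    return "https://github.com/search?" + urlencode({"q": query, "type": "issues"})


if __name__ == "__main__":
    print(maintenance_search_url())
```

Opening the printed URL in a browser should list candidate issues; narrowing to a single repository would mean swapping the `org:` qualifier for `repo:owner/name`.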

Reviewing Maintenance Tasks

Once a PR for a maintenance task is ready for review, the maintainer for that part of the code needs to establish a plan for getting it either merged or explicitly declined.

  • Declare timeline. It’s great if the maintainer can promptly review the PR, but that’s not always practical given other demands on their time. They should at least attempt to give an estimate of when they’ll have time to review the PR, though.

  • Optional first-pass review. If the maintainer needs assistance with the review burden for maintenance tasks, they should identify some group (Core Committers, partner team, etc.) to perform an initial review that catches any problems which don’t require the maintainer’s specific expertise to spot or resolve.

  • Minimize strings attached to merging. Overloaded code maintainers in large organizations sometimes impose burdens (like deploying the changes or monitoring production after deployment) on code contributors within the same organization to avoid becoming even more overloaded. But given that external contributors don’t have access to such systems, the maintainer will need to directly merge the PR and perform these tasks if they want to leverage external development effort via “help wanted” issues.

Prior Art

OEP-25: Incremental Improvements (INCR) was the first attempt at defining such a process, and it was used as the basis for distributing the Python 2 → 3 upgrade work across the Open edX community and even casually interested developers who had not yet joined the community. It succeeded in processing a large volume of relatively straightforward code changes, and encouraged a handful of people to stay involved in Open edX for a while, but imposed a significant code review burden on a handful of edX employees and didn’t give the new contributors a good path for growing into more regular involvement. A success overall, but with clear room for improvement.

Django 3.2 Upgrade: How the Community Can Help (archived) documents the next big effort we made to distribute a large maintenance project across the community. The tasks assigned to the community were specifically chosen for minimal review need from the Open edX community, primarily involving review by upstream package maintainers for Open edX dependencies. This wasn’t as well suited for junior developers as the INCR tasks, and many of the assignments had a long tail of back and forth with the upstream maintainers, but it did free up Open edX repository maintainers to focus on the parts of the upgrade that they were most uniquely suited to undertake.