Redwood.1 Retrospective 🌳⏪
I. Context, data, and highlights 🏅
The Product Working Group was closely involved in the process and led the implementation of several changes to it (Redwood Release Testing Strategy), including:
Monthly meeting to track new features and initiatives to be included in Redwood (https://docs.google.com/spreadsheets/d/1tWgp9LXNg4sfWYd_0ghNl6qfIZZ9851MtGBXPeSgzFs)
The Product Core Working Group should take ownership of a “master” test spreadsheet of new features
Maintainers are responsible for fixing bugs or getting the authors of the feature in question to do so
The Redwood master branch was cut on 09/05 as planned.
The official testing period was cut in half to increase the chance of landing more new features in Redwood. For this reason, the testing process was launched on 16/05 (Join the Redwood Testing Team!).
Due to a last-minute security vulnerability (Security: Upcoming Security Release for edx-platform on 2024-06-17), Redwood was delayed by a week and a half and was successfully released on 19/06.
To target their respective audiences more specifically, the release notes were divided into feature release notes (https://docs.openedx.org/en/latest/community/release_notes/redwood/feature_release_notes.html) and the Open edX Redwood Release - Developer and Operator Release Notes.
II. Successes and achievements 🏅
[Peter]: Although it was tight, I was impressed by the community’s ability to complete the testing on schedule (perhaps we were helped by some features being pushed to the next release).
[Chelsea]: It seemed that the prioritization of discovered bugs was helpful for the triaging process.
[Sarina] A lot of people really stepped up this release. The two that were most visible to me were @Maria Grimaldi and @Chris Patti, as well as @Jorge Londoño and @Maksim Sokolskiy.
[Sarina] Largely, the new product processes worked and involved more people in the release than before.
[Maria] Thanks to the product working group's involvement, we had a framework to prioritize issues and classify release blockers. This helped us focus even more on which issues to triage first.
[Maria] Even though we found the security issue when the testing process was almost finished, we solved it in a timely manner with community help.
[Maria] Maintainers participated more in solving issues than in previous releases.
[Adolfo] Main features were developed with Redwood in mind, which reduced the incidence of bugs or omissions that would’ve otherwise needed to be caught and fixed during testing. Shout-out to Raccoon Gang and OpenCraft for doing so with the Studio MFE and Tagging, respectively.
III. We should keep ✅
[Jenna]: Propose that we keep Product ownership of the “master” test spreadsheet, and continue to flesh out regression tests
[Jenna]: Propose that we keep the monthly meeting to track features slated for Sumac (can start again in August)
[Chelsea]: Keep adding Cucumber-formatted tests to the test list? (Given I… When I… Then I…)
[Maria]: Keep the “Priority” test sheet column up to date and matching the product narrative so triagers can map it to test failure reports.
[Maria]: Propose that the product working group continue helping bug triagers prioritize issues on all BTR boards. I say “all” because not every issue on the boards maps to a test failure, which bug triagers can classify by looking at the testing sheet, so we also need help with the rest of them.
[Maria]: Keep the release blockers framework proposed by Jenna, but document it somewhere in the release testing process docs.
[Adolfo] Developing new features and testing them with Tutor nightly (including via PR sandboxes) before the PRs land.
[Adolfo] We should continue to enlist the help of more caring maintainers who are not only aware of but engaged with the release process. This should not only help reduce the incidence of regressions and other bugs, but also add to the pool of resources available to fix issues found during testing.
IV. What didn’t we do so well, and what could we have done better ❌
Product
[Jenna]: Redwood was feature-packed. How can we scale back and still have a meaningful set of capabilities/features in Sumac, without over-committing?
[Sarina] How can we “fast fail” in the monthly release meetings run by product (that is, “kick” something out of the release as soon as it’s slipping, and divert resources to getting the other things done)?
[Sarina] Consider having release meetings every month, always looking forward to the next release’s cut. Starting meetings in August for an October cut seems a bit late.
Testing
[Jenna]: How can we tighten the lead time between the code cutoff and having the testing sandboxes ready? For Redwood, there was about a week in between. How can product help in defining which features should be enabled, etc?
[Jenna]: Only having 3 weeks for testing ended up feeling tight. What would it look like to try a 6-week testing period between code cut and release, especially if we can shorten the lead time on the sandboxes?
[Chelsea]: We could potentially firm up exactly how the testing sandbox should be configured and make sure that’s communicated well with the team setting up the testing sandbox (Example: it was a miss on my part to make sure Aspects was enabled from the start).
[Peter]: Some of the regression tests are poorly defined and therefore difficult to test confidently. For the next release, we should try to define the legacy tests as well as the new tests were defined.
[Chelsea]: I had claimed a test, and only once I sat down to do it did I realize how technical the tester needed to be to test the functionality. It made me wonder if our test sheet should have a designated label indicating whether a test needs to be done by an engineer or can be done by anyone.
Bug triage and fixing
[Maria] Although 3 weeks were enough to complete a high percentage of tests, it wasn’t enough time for bug triagers to intervene and help maintainers solve high-priority issues. We currently have 9 high-priority issues on the board that haven’t been solved (see both Release Testing and Test failures).
[Maria] We need more maintainers' and community participation to solve issues reported on the board, even after the Redwood cut-off. How do we increase participation so the boards don’t overflow with unsolved issues from past releases? How can we more effectively draw maintainers' attention?
[Maria] We need a better way of organizing issues from other repositories that directly affect the release. This time, I created two boards: Release testing, which tracks issues created outside the BTR repo, and Testing failures, which tracks test failure reports. However, now I find having both views a bit confusing. Maybe this could be fixed by having more explicit names or having a single view for both and differentiating them using labels.
[Maria] As bug triagers, our main focus should be triaging and solving high-priority issues and blockers in a timely manner. This helps to avoid accumulating issues over time and avoids solving blockers in a rush if we don’t get maintainers' attention in time.
[Maria] We need to automate getting issues labeled as “release testing” from other repos into the BTR repo for better visibility.
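One possible starting point for the automation Maria describes is GitHub's issue search, which can list issues carrying a given label across a whole organization. The sketch below is only an assumption about how this might look: the `openedx` organization name, the exact label text, and the helper names are illustrative, and it stops at listing the issues rather than adding them to a BTR board.

```python
import json
import urllib.parse
import urllib.request

# Assumptions for illustration only: organization name and label text.
ORG = "openedx"
LABEL = "release testing"
SEARCH_URL = "https://api.github.com/search/issues"


def build_search_url(org: str, label: str) -> str:
    """Build a GitHub issue-search URL for open issues with a label in an org."""
    query = f'org:{org} label:"{label}" is:issue is:open'
    params = urllib.parse.urlencode({"q": query, "per_page": 100})
    return f"{SEARCH_URL}?{params}"


def summarize(items: list[dict]) -> list[tuple[str, str, str]]:
    """Reduce search results to (repository, title, link) tuples, sorted by repo."""
    rows = []
    for item in items:
        # repository_url looks like https://api.github.com/repos/<org>/<repo>
        repo = "/".join(item["repository_url"].split("/")[-2:])
        rows.append((repo, item["title"], item["html_url"]))
    return sorted(rows)


def fetch_labeled_issues(org: str = ORG, label: str = LABEL) -> list[tuple[str, str, str]]:
    """Query GitHub (unauthenticated, first page only) and summarize the results."""
    request = urllib.request.Request(
        build_search_url(org, label),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return summarize(payload["items"])
```

Running `fetch_labeled_issues()` on a schedule and comparing its output with what is already tracked would show which labeled issues are still missing from the board. Pagination and authentication are left out of this sketch.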
Security
[Peter]: There was a lot of uncertainty and confusion with the security issue, particularly around what details could be discussed and with whom. It would be good to have a documented and, ideally, more transparent process before this happens again.
[Maria] We need better documentation to classify security issues and report them effectively.
Release notes
[Maria] We need to automate getting new settings/feature flags/waffle flags for the release notes. I tried it but didn’t have enough capacity to finish the script.
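Maria's unfinished script is not available here, so the following is only a rough sketch of one way such a check could work: parse two versions of a Python settings module and compare their top-level uppercase names. The function names and the sample settings are hypothetical, and the sketch does not handle waffle flags or entries nested inside dictionaries.

```python
import ast


def setting_names(source: str) -> set[str]:
    """Collect top-level UPPERCASE names assigned in a settings module."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id.isupper():
                names.add(target.id)
    return names


def diff_settings(old_source: str, new_source: str) -> dict[str, list[str]]:
    """Report setting names added or removed between two module versions."""
    old, new = setting_names(old_source), setting_names(new_source)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


if __name__ == "__main__":
    # Hypothetical settings content standing in for two release branches.
    previous = "DEBUG = False\nLEGACY_FLAG = True\n"
    current = "DEBUG = False\nNEW_FLAG = True\n"
    print(diff_settings(previous, current))
```

Parsing with `ast` rather than importing the module avoids executing settings code, at the cost of missing names that are only defined dynamically.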
Backports
[Feanil]: Even if it’s not officially supported, should we allow the community to provide backport fixes to older releases at the release manager's discretion? I feel like people are making these on their local forks and we could just let them upstream them if they can be validated easily.
[Maria] Is the BTR issue for tracking backports enough? I don’t think it’s being used. Also, how are we testing them? Is it reported anywhere? I think Max did this in some capacity, I’m not sure though.
V. Action Items for Sumac release 📈
Product
[Jenna]: Creating a Gantt chart to track every major project/feature for the next release at a more granular level (epics) instead of at a high level (project), to better assess the risk of not delivering each new feature and to plan the testing process and required capacity further in advance.
[Jenna]: Make the Release planning meeting a monthly one and include it in the Community calendar to make sure we start planning every release far enough in advance.
[Jenna]: Product to update https://docs.google.com/spreadsheets/d/1tWgp9LXNg4sfWYd_0ghNl6qfIZZ9851MtGBXPeSgzFs/edit?gid=0#gid=0 with current plans for Sumac, ahead of the August monthly planning
[Jenna] To discuss: A new role for a product-BTR liaison/coordinator
To coordinate bug prioritization
To collaborate with the testing manager to write and manage regression and AC tests
Testing
[Adolfo]: Continue the conversation about community-supported and always-available sandbox environments.
[Sarina & Adolfo]: Asking the Maintenance WG about the branch cutting and release date.
[Peter]: Review all the tests.
Remove ones that are no longer relevant.
Write better test descriptions for legacy tests, in the style of the new Redwood tests.
Look for opportunities to automate tests.
[Peter]: To discuss: should we invest in the test course repo?
[Maria]: Review issues reported on the board with “no test id” or “no test case” to decide whether to include them in the test sheet or specify which test they are part of
Bug triage and fixing
[Maria]: Try to solve issues on the board alongside maintainers to avoid accumulating them for Sumac
[Jorge]: Document the communication process, calls for help, and working agreements between the BTR WG and the Maintainers WG to get them more involved and engaged in the Open edX release process.
Communicate release blockers
Categorized bugs board
[Adolfo]: Working on an official proposal for automating the sync between the testing spreadsheet and GH.
Security
[Peter]: Confirming and documenting the process for reporting security issues during the release process.
Backports
[Majo]: Improving how we handle backports and the information radiators we use.
[Jorge]: Creating and publishing a backports policy that covers PRs against old releases.
Release notes