[BD-04] Project Retrospectives
2021-03-19: Phase 1 Retro
What went well?
Code readability has improved
XModules were all converted!
It came in under budget!
Relatively small number of dependencies between conversions, so some parallelization was possible before getting blocked on reviews.
There were no bugs or issues during the conversions that got shipped to production.
Even with many tests not working! (Just to be clear, those tests had been running up until the final rebase before the merge.)
Dave: OMG this gave me a heart attack, but this was definitely a test infra issue on our end.
Progress toward completion was easier to understand for this project compared to other blended efforts, especially when we made the Conversion Tracker. It’s all green now!
Ideal sort of Blended project to be a reviewer on while working on other projects. Minimal overhead; easy to jump in and review for an hour or two in between other tasks.
Very low meeting overhead (for me, at least) -Kyle
What were the meetings/touchpoints? Sync & async stuff
The PRs themselves were really easy to review - they always included great context & testing instructions.
The MRO charts were also really helpful, FWIW
What could be improved?
Centralized Platform Knowledge & Code Complexity: This part of the system is understood by few people (they’re on this call!) which limits our ability to distribute work.
By simplifying this part of the platform we are helping this issue, but just wanted to flag this risk of an area of concentrated platform context.
Project Duration & Centralized Expertise: The project took longer to get done than initially estimated. This is mostly because of being bottlenecked on one person.
one person → is that Usman or Dave/Kyle? Usman.
Usman - was doing almost all of the dev work
Few reasons: this area of the codebase is really tricky; lots of implications across the codebase when making changes; hard to have a good sense of what can go wrong. This constrained how we could distribute the work.
Getting it done right >> getting it done 2 months earlier
The platform complexity of these changes made production issues a reasonably large risk. (which is why lack of bugs is such a big win - see other section)
Why delayed?
Having trouble finding time to work on BD-04 (get pulled into other things) - hard to do in smaller chunks; need dedicated heads-down times
Deep work requires deep work time.
Project Decision / Task Tracking Options: Lightweight PR + Slack conversation project tracking may mean action items / tasks / decisions are harder to track?
other projects with heavier coordination also have meeting notes / task lists
tricky balance here since speed is more important than complete docs perhaps?
minimum threshold: shared slack channel
regular meetings & meeting notes is more heavyweight, likely unneeded for this type of project
Were there any synchronous meetings? Possibly, we can’t quite remember
Not a strong need for requirements gathering
Usman had a strong sense of what needed to be done
Sometimes technical questions/decision points fell through the cracks and Usman had to gently poke Dave to get answers.
Were we using ADRs to make these decisions or some other method?
They were not really ADR-worthy; they were about individual XModules and low-level things.
Test Infrastructure: Anything to say about test infrastructure?
Anything that would ensure Dave DOESN’T have a heart attack (see note above) :)
Catching that common/lib/xmodule tests stopped running would have been really good. I think this is on Jeremy’s mind
Sarina: yes, I’ve discussed this with Jeremy
Going to need to rely on it more as we do Phase 2
Thinking we’ll see major breakages (expect more obvious failures than when converting an individual XModule)
Dave not as concerned about this
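One possible shape for a "tests stopped running" guard, as a minimal sketch only: it assumes the test run emits a JUnit-style XML report (pytest can write one with --junitxml) and that CI fails the build when fewer test cases ran than an agreed floor. The function names, the sample report, and the threshold are all hypothetical; this is not what the edX test infrastructure actually does.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written report standing in for real CI output.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="xmodule" tests="2">
    <testcase classname="test_a" name="test_one"/>
    <testcase classname="test_a" name="test_two"/>
  </testsuite>
</testsuites>"""


def count_testcases(junit_xml):
    """Count <testcase> elements anywhere in a JUnit-style XML report."""
    root = ET.fromstring(junit_xml)
    return sum(1 for _ in root.iter("testcase"))


def check_minimum(junit_xml, minimum):
    """Raise if fewer than `minimum` test cases ran; otherwise return the count."""
    found = count_testcases(junit_xml)
    if found < minimum:
        raise AssertionError(
            f"only {found} tests ran, expected at least {minimum}"
        )
    return found


# The floor (1 here) is an assumed value; a real one would be agreed per suite.
print(check_minimum(SAMPLE_REPORT, minimum=1))
```

Counting test cases rather than trusting the suite-level totals keeps the check independent of how the runner nests or summarizes suites. A suite that silently collects nothing then shows up as a hard failure instead of a green build.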
Shepherding edx-platform merges (b/c of continuous deployment) is always a minor point of friction – this is true for any OSPR and for the core committer program as a whole.
In practice though, I think this was smoother than most because nothing is critically blocking.
agreed
PR Review Cycle Times: Sometimes it took some time for PR reviews to be done. Perhaps more capacity would have helped?
+1. T&L has had a lot on its plate and blended review always lags noticeably on busy weeks
But also, very few people at edX feel comfortable reviewing this stuff.
Usman: Phase 2 will be done differently. It is much larger than Phase 1.
Get 3 other devs onboarded
Usman supports Dave/Kyle with reviews
Work is more chunkable - feels lower risk than Phase 1, easier to have other people do the dev work
What did we achieve?
Helped demonstrate the benefits of blended development and core committer programs toward continuous upgrade/improvement efforts.
Showed that we could do a major core platform refactoring through blended without major production issues.
Converted key content infrastructure areas toward modernizing the guts of Open edX’s content delivery core
edx-platform is (at least) a bit more comprehensible than it used to be, whether or not you’re working in the guts of courseware. No more explaining to new devs why course objects are called “CourseDescriptors”.
Decisions
For Phase 2: Valuable to be able to describe why we’re doing things; project tracker/milestones are important to communicate out - helps us celebrate progress, and share out w/ other people in the org.
For Phase 2: Are there ways to measure code complexity to show value?
Length of inheritance chain?
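A rough sketch of how the inheritance-chain idea could be measured, assuming plain Python introspection is acceptable as a metric. The toy classes below are hypothetical stand-ins, not real edx-platform classes, and the two helper functions are illustrative names only.

```python
import inspect


def inheritance_depth(cls):
    """Longest chain of base classes from cls up to object (object counts as 0)."""
    if cls is object:
        return 0
    return 1 + max(inheritance_depth(base) for base in cls.__bases__)


def mro_length(cls):
    """Number of classes Python consults when resolving attributes on cls."""
    return len(inspect.getmro(cls))


# Hypothetical hierarchy, for illustration only.
class Base:
    pass


class Intermediate(Base):
    pass


class Mixin:
    pass


class Leaf(Intermediate, Mixin):
    pass


for klass in (Base, Intermediate, Leaf):
    print(klass.__name__, inheritance_depth(klass), mro_length(klass))
```

Depth captures the longest single chain, while MRO length also grows with every mixin. Tracking both before and after a conversion would give two simple numbers to report.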
Action Items
Anything else you want to talk about? (parking lot)
What is the name of the project now?
Phase 2 will be a separate BD project (so no longer BD-04)
Opportunity to name the next phase of the project in a way that conveys more value/outcome
Key Take-Aways
Infra cleanup works really well as a Blended project
Consider future projects: Phase 2, Old Mongo deprecation
Async conversations via email, PRs, and shared Slack channels worked well for this project - very little synchronous conversation
Low meeting overhead left more time to get project work done
Working in an area that few understand (internally as well as externally) can lead to delays - bottlenecks are likely, and we’re highly constrained on how work can be distributed.
Failure of common/lib/xmodule tests could have spelled disaster for the project (fortunately everything was OK) - but test infrastructure needs to be robust for intensive projects like this