...
Review latest Thoughtworks Tech Radar - could do so as breakout groups.
Review decisions listed as “irreversible” - should they be considered irreversible?
Time-scale of “reversibility” - since everything could technically be reversible with greater effort.
1 squad-month?
Case studies:
comments service is written in Ruby
rewriting earlier would be less effort than rewriting later, once time lapses and more features exist
ORA v2 rewriting of v1 due to overall pain with v1
VEDA → VEM
since infrastructure choice
data migration is hard. Doing a migration and then switching back would have put us in a really difficult place, maintenance-wise
Notes
Reversibility of a decision is not fixed - changes over the lifecycle of a product/feature
The irreversibility of a thing is inversely proportional to how much you want to reverse it.
There is a cost-benefit analysis at question here.
How much are you committing the whole organization to pain, vs. just your own team?
Also long term impacts on organization (e.g. new tech stack that needs to be supported) rather than on other teams directly. “Can we move people to a project if it uses Rust when we’re a Python shop?” (edX is not just the engineering organization.)
For example, introduction of a new infrastructure, such as Neo4j and Node.js.
That holds up better if the teams are long lived and stay with their code bases.
Maybe it goes without saying, but judiciously adding layers of abstraction in the right places can also help reduce impact
What would be needed to allow those decisions to become reversible?
1. reduce radius impact
reducing the blast radius of impact (# of places to update, etc)
Example: Axios
So far over because the interface is very particular to how it works. Blows out every MFE that was using it vs. other clients
Have to update twelve frontends, tedious, annoying to coordinate
Benefits are modest
If we could make it easier to modify all the places that were using this, the blast radius would be reduced. Part of what frontend platform is made to do, but this is a huge, opinionated interface. (probably not worth it).
Note: the contagion factor for Axios is still high - as a choice for frontend-platform
Similar situation with the new oauth client
counter-point
Lockstep upgrade is not required across the organization
Although the effort is large, there can still be changes
2. increase efficiency of org-wide refactorings
3. improve versioning
of APIs
of services, including core
Which Tech Debt is worth addressing?
Rubric
Painfulness
Contagion factor (“How much is the difficulty to change the decision going to increase over time?”)
So with the Axios example: cost of starting changes is low, cost of finishing changeover is high, and high contagion.
edx-platform versioning and not releasing master?
if more of core can be versioned libraries, …
pip installing edx-platform → requires a lot more discipline and documented breaking versioning
Backlog of Questions/Discussions
...