Arch Hour: 2020-11-05

Topics

Please enter your proposed topics for discussion here.
In Lean Coffee style (link1, link2), we will vote on which topics the group wants to discuss and time-box each discussion to 10 or 15 min → 5 min (if re-voted) → 5 min (if re-voted).

  • Review latest Thoughtworks Tech Radar - could do so as breakout groups.

  • Review decisions listed as “irreversible” - should they be considered irreversible? What would be needed to allow those decisions to become reversible?

    • Time-scale of “reversibility” - since everything could technically be reversible with greater effort.

      • 1 squad-month?

    • Case studies:

      • comments service is written in Ruby

        • rewriting earlier would be less effort than rewriting later, once time has passed and more features exist

      • ORA v2 was a rewrite of v1, due to overall pain with v1

      • VEDA → VEM

        • since infrastructure choice

        • data migration is hard. Doing a migration and then switching back would have put us in a really difficult place, maintenance-wise

    • Notes

      • Reversibility of a decision is not fixed - changes over the lifecycle of a product/feature

      • The irreversibility of a thing is inversely proportional to how much you want to reverse it.

      • There is a cost-benefit analysis at play here.

      • How much are you committing the whole organization to pain, vs. just your own team?

        • Also long term impacts on organization (e.g. new tech stack that needs to be supported) rather than on other teams directly. “Can we move people to a project if it uses Rust when we’re a Python shop?” (edX is not just the engineering organization.)

        • For example, introducing new infrastructure, such as Neo4j or Node.js.

      • That holds up better if the teams are long lived and stay with their code bases.

      • Maybe it goes without saying, but judiciously adding layers of abstraction in the right places can also help reduce impact.

    • What would be needed to allow those decisions to become reversible?

      • 1. reduce impact radius

        • reducing the blast radius of impact (# of places to update, etc.)

          • Example: Axios

            • So far over because the interface is very particular to how it works. Blows out every MFE that was using it vs. other clients

              • Have to update twelve frontends, tedious, annoying to coordinate

            • Benefits are modest

            • If we could make it easier to modify all the places that were using this, the blast radius would be reduced. This is part of what frontend-platform is made to do, but this is a huge, opinionated interface. (Probably not worth it.)

            • Note: the contagion factor for Axios is still high - as a choice for frontend-platform

          • Similar situation with the new oauth client

        • counter-point

          • Lockstep upgrade is not required across the organization

          • Although the effort is large, there can still be changes

      • 2. increase efficiency of org-wide refactorings

      • 3. improve versioning

        • of APIs

        • of services, including core

    • Which Tech Debt is worth addressing?

      • Rubric

        • Painfulness

        • Contagion factor (“How much is the difficulty to change the decision going to increase over time?”)

      • So with the Axios example: cost of starting changes is low, cost of finishing changeover is high, and high contagion.

    • edx-platform versioning and not releasing master?

      • if more of core can be versioned libraries, …

      • pip installing edx-platform → requires a lot more discipline and documented versioning of breaking changes
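
Returning to the tech debt rubric above (painfulness and contagion factor): one way to make it concrete is to score each candidate on both axes and rank them. The sketch below is only an illustration. The 1-5 scales, the multiplication used to combine the axes, the example items and their scores, and the names `TechDebtItem`, `priority`, and `rank` are all assumptions that were not part of the discussion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TechDebtItem:
    """One candidate piece of tech debt, scored on the two rubric axes.

    The 1-5 scales are an assumption; the notes do not define a scale.
    """

    name: str
    painfulness: int  # 1 (barely noticed) .. 5 (hurts every day)
    contagion: int    # 1 (cost to change stays flat) .. 5 (grows quickly over time)


def priority(item: TechDebtItem) -> int:
    """Hypothetical combined score: the product of the two axes."""
    return item.painfulness * item.contagion


def rank(items: list[TechDebtItem]) -> list[TechDebtItem]:
    """Order items from most to least worth addressing under this score."""
    return sorted(items, key=priority, reverse=True)


# Made-up items and scores, for illustration only.
candidates = [
    TechDebtItem("one-off script nobody extends", painfulness=4, contagion=1),
    TechDebtItem("client library used by every frontend", painfulness=2, contagion=5),
]
ranked = rank(candidates)
```

Under this particular formula, a mildly painful but highly contagious choice outranks a painful but contained one. That matches the spirit of the contagion question, but a different weighting could reasonably be chosen.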

Backlog of Questions/Discussions

This section lists a backlog of previously proposed topics that haven’t yet been discussed.