Summary of Issue

...

Why didn't we catch this?
- Why was Koa.2 tagged without the fix?
  - Why was Koa.2 tagged without testing the tag?
    - Why did CI not catch this?
      - Why is CI currently not running the exact native installation procedure?
        Because it's running OpenCraft's customization, which didn't catch the database issue.
        Why are we running OpenCraft's customizations?
        Because OpenCraft offered to run CI using their pre-existing sandbox builds, which were what they already had.
        How might we enhance OpenCraft's CI implementation to run the exact native installation procedure?
      - CI is difficult to maintain; it fails for obscure reasons.
    - Why did we believe that CI would catch it?
      - Why didn't we have the deltas between the native installation and what the CI was doing?
        Because the CI was never intended to closely match the native installation.
        For cost reasons, we are using the same database.
- Because the platform is complex and can result in unanticipated issues.
- Why did we miss addressing Koa open issues before tagging?
    - Why didn't we act on the Discourse thread?
    - We did create a ticket for ourselves.
      - Why did we miss fixing BTR-61, which was tagged with Koa?
        How might we use GitHub milestones to avoid missing such tickets in the future?
        We could have used Jira tickets with the koa.2 label.
        Because the current playbook has the mechanics of tagging the repos, but doesn't cover the process for ensuring all needed fixes are in.
        How might we update our tagging playbook to catch this in the future?
        Include concrete checklists for verifying that tickets were completed.
        How might we include severity and priority on open issues?
    - Why did it take 2+ weeks to react?
      - Discourse → BTR ticket took a week.
        Why are we relying on a single individual to do this?
        How might we have more people helping others in the community in Discourse?
        Why are we using multiple reporting tools? (Discourse, Jira and now GitHub issues)
    - Why did it take more than a week to recognize that this was a critical issue for the native install?
      - Because several issues are often open at the same time.
- Why did we not include a fix from master?
  - Master moves at a fast pace. We are not looking at each commit. We are looking at security fixes right now.
  - Why are fixes that are pushed to master not included in the other branches (named releases)?
    - Why did the PR author miss it?
      - How might we make use of Conventional Commits and/or Pull Request templates to improve catching this?
    - Why did the BTR group miss this?
- Why did the Koa.2 installation break in the first place?
- Why does the BTR group not "Dogfood" the native installation?
  - There needs to be a process or reason to do this.
  - Why is there no process for this?
  - Why is there no reason to do so?

5 whys
infinite hows

How could we have prevented it?

...

Former user (Deleted): Update the runbook as follows
- Don't tag the release without manually installing it.
- Ensure all must-have tickets are completed.
Former user (Deleted): Write and distribute roles
- Write down required roles for BTR
  - e.g. Manual QA, Triage issues, Discourse assistance, Tagging repos, Announcing, Reviewing release notes.
- Assign individuals to each role (could be rotatable, per release).
Sofiane Bebert: Decide and document how BTR will track issues
- Prioritization marking
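
The two runbook rules above (manual install before tagging, all must-have tickets completed) could be expressed as a simple pre-tag gate. The following is only a rough sketch under stated assumptions: the Ticket structure, the release_blockers helper, and the ticket key BTR-00 are hypothetical, and the ticket data is hard-coded rather than pulled from Jira or GitHub. It is meant to illustrate the checklist idea, not to describe an existing BTR tool.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str      # ticket identifier, e.g. "BTR-61"
    labels: set   # release labels on the ticket, e.g. {"koa.2"}
    done: bool    # True once the fix is merged into the release branch


def release_blockers(tickets, release_label, manual_install_verified):
    """Return the reasons (as strings) why the release should not be tagged yet."""
    blockers = []
    # Runbook rule 1: do not tag without a manual native installation.
    if not manual_install_verified:
        blockers.append("native installation has not been manually verified")
    # Runbook rule 2: every ticket labelled for this release must be completed.
    for ticket in tickets:
        if release_label in ticket.labels and not ticket.done:
            blockers.append(f"{ticket.key} is still open")
    return blockers


# Illustrative data only; BTR-00 is a made-up ticket key.
tickets = [
    Ticket("BTR-61", {"koa.2"}, done=False),
    Ticket("BTR-00", {"koa.2"}, done=True),
]

for reason in release_blockers(tickets, "koa.2", manual_install_verified=False):
    print("BLOCKED:", reason)
```

An empty result would mean both rules are satisfied for that label. In practice the ticket list would come from whichever tracker BTR settles on.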
