Summary of Issue

...

  • Why didn't we catch this?
    • Why was Koa.2 tagged without the fix?
      • Why was Koa.2 tagged without testing the tag?
        • Why did CI not catch this?
          • Why is CI currently not running the exact native installation procedure?
            • Because it's running OpenCraft's customization, which didn't catch the database issue.
              • Why are we running OpenCraft's customizations?
                • Because OpenCraft offered to run CI using their pre-existing sandbox builds, which were what they already had.
                • How might we enhance OpenCraft's CI implementation to run the exact native installation procedure?
          • CI is difficult to maintain; it fails for obscure reasons.
        • Why did we believe that CI would catch it?
          • Why didn't we have the deltas between the native installation and what the CI was doing?
            • Because the CI was never intended to avoid large deltas from the native installation.
              • For cost reasons, we are using the same database.
    • Because the platform is complex and can result in unanticipated issues.
    • Why did we miss addressing Koa open issues before tagging?
      • Why didn't we act on the Discourse thread?
        • We did create a ticket for ourselves.
          • Why did we miss fixing BTR-61, which was tagged with Koa?
            • How might we use GitHub milestones to not miss such in the future?
              • Could have used Jira tickets with the koa.2 label.
            • Because the current playbook has the mechanics of tagging the repos, but doesn't cover the process for ensuring all needed fixes are in.
              • How might we update our tagging playbook to catch this in the future?
                • Include concrete checklists for verifying that tickets were completed.
            • How might we include severity and priority on open issues?
        • Why did it take 2+ weeks to react?
          • Discourse → BTR ticket took a week.
            • Why are we relying on a single individual to do this?
              • How might we have more people helping others in the community in Discourse?
            • Why are we using multiple reporting tools? (Discourse, Jira and now GitHub issues)
        • Why did we miss, for more than 1 week, that this was a critical issue for the native install?
          • Because several issues are often open at the same time.
    • Why did we not include a fix from master?
      • Master moves at a fast pace. We are not looking at each commit. We are looking at security fixes right now.
      • Fixes that are pushed to master are not included in the other branches (named releases).
        • Why did the PR author miss it?
          • How might we make use of Conventional Commits and/or Pull Request templates to improve catching this?
        • Why did the BTR group miss this?
    • Why did the Koa.2 installation break in the first place?
    • Why does the BTR group not "Dogfood" the native installation?
      • There needs to be a process or reason to do this.
      • Why is there no process for this?
      • Why is there no reason to do so? 
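
The checklist ideas raised above (verifying that tickets were completed, marking priority on open issues, and installing manually before tagging) could be sketched as a small pre-tag check. This is only an illustration under stated assumptions: the `Ticket` fields, the status values, the `must_have` flag, and the ticket data are hypothetical and are not taken from the BTR playbook or from Jira.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    key: str            # ticket identifier, e.g. "BTR-61"
    labels: frozenset   # release labels, e.g. frozenset({"koa.2"})
    status: str         # hypothetical values: "open" or "done"
    must_have: bool     # hypothetical priority marking


def blocking_tickets(tickets, release_label):
    """Return must-have tickets for the release that are not done yet."""
    return [
        t for t in tickets
        if release_label in t.labels and t.must_have and t.status != "done"
    ]


def ready_to_tag(tickets, release_label, manual_install_verified):
    """Allow tagging only if a manual install passed and no blockers remain."""
    blockers = blocking_tickets(tickets, release_label)
    return manual_install_verified and not blockers, blockers


# Illustrative data only; the statuses below are assumptions.
tickets = [
    Ticket("BTR-61", frozenset({"koa.2"}), "open", True),
    Ticket("EXAMPLE-1", frozenset({"koa.2"}), "done", True),
]
ok, blockers = ready_to_tag(tickets, "koa.2", manual_install_verified=True)
```

In this sketch the release is held back while any must-have ticket with the release label is still open, even if the manual installation succeeded.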


 

How could we have prevented it?

...

  • Former user (Deleted): Update the runbook as follows:
    • Don't tag the release without manually installing it.
    • Ensure all must-have tickets are completed.
  • Former user (Deleted): Write and distribute roles
    • Write down required roles for BTR
      • e.g. Manual QA, Triage issues, Discourse assistance, Tagging repos, Announcing, Reviewing release notes.
    • Assign individuals to each role (could be rotatable, per release).
  • Sofiane Bebert: Decide and document how BTR will track issues
    • Prioritization marking