RCA for Monday's hotfix

Date

Attendees

Overview:

On Friday afternoon, April 10, PLAT-608 was reported. On Monday morning, the cause was identified and a hotfix branch was cut. However, there was a good deal of confusion about the correct process for generating a hotfix. I'd like to go over the cause of the confusion, since it indicates a lack of clarity around the whole hotfix process.

Confusion:

  • On the timing of the hotfix. What do we do in the following scenarios?
    • We have already cut the release candidate (RC) and have to cut the hotfix branch afterward.
    • We have a hotfix, but the release is going to be cut in the next two hours.
    • The RC is cut, and then a hotfix branch is made. When you merge hotfix -> release -> master, the hotfix isn't on the RC. So how do we get the hotfix onto the RC?
    • I think the right thing to do in that scenario is hotfix -> release -> rc -> release -> master.
  • What do we do when there's a merge conflict?
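
To illustrate the RC scenario above, here is a minimal sketch in Python. It is a toy model, not our actual tooling: each branch is treated as a set of commit ids, a merge simply copies commits from source to target, and the branch contents and commit ids (r1, m1, m2, fix) are made-up assumptions. It only shows why the short merge path leaves the fix off the RC while the longer path suggested above does not.

```python
# Toy model: a branch is just the set of commit ids reachable from it.
# Branch contents and commit ids are illustrative assumptions, not the real repo.

def merge(branches, source, target):
    """Merge `source` into `target`: target gains every commit on source."""
    branches[target] = branches[target] | branches[source]


def fresh_state():
    # release is what is live; rc was cut from master before the hotfix existed.
    return {
        "release": {"r1"},
        "master": {"r1", "m1", "m2"},
        "rc": {"r1", "m1", "m2"},
        "hotfix": {"r1", "fix"},  # branched from release, plus the fix
    }


def run(path):
    """Apply merges along a path such as ["hotfix", "release", "master"]."""
    branches = fresh_state()
    for source, target in zip(path, path[1:]):
        merge(branches, source, target)
    return branches


short = run(["hotfix", "release", "master"])
full = run(["hotfix", "release", "rc", "release", "master"])

print("fix" in short["rc"])  # False: the RC never received the hotfix
print("fix" in full["rc"])   # True: release -> rc carried the fix over
```

Note that in this model the rc -> release step also moves every RC commit onto release, so in practice that step would presumably happen only when the RC itself is released.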

 

  • Why couldn't we have scrapped the RC altogether and cut a new one after the hotfix went out? Or just coordinated with the release master?
    • What's stopping us from looking for help?
    • More communication would have been good.
    • We need to be clear about how we're going to communicate.
      • For devops, for example, do we create a ticket?
      • We should establish protocols for high-priority issues.
      • Who should we CC when sending tickets to devops?
      • Who do we get a thumbs up from?
  • What is our process for the hotfix?
    • There's a lack of clarity about what a hotfix needs in order to get out.
      • A thumbs up? From whom? Etc.

Action items

  • Adam Palay: improve the documentation by providing a high-level overview and suggestions for edge cases
  • Wajeeha Khalid: provide ways of improving the documentation
  • Escalation should hold a meeting on how to do a hotfix
  • Come up with protocols for how to communicate with different teams in different scenarios