Divide all dollar amounts from GA by 2 since we give partners roughly 50% of payments
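A minimal sketch of that adjustment, assuming the partner share is exactly 50% (the note only says "roughly"). The names `PARTNER_SHARE` and `net_amount` and the sample figures are hypothetical and used only for illustration.

```python
from decimal import Decimal

# Assumption: partners receive roughly half of payments; 0.5 is an approximation.
PARTNER_SHARE = Decimal("0.5")


def net_amount(ga_amount):
    """Return the part of a GA-reported dollar amount left after the partner share."""
    return Decimal(str(ga_amount)) * (1 - PARTNER_SHARE)


# Made-up example figures, not real data.
ga_amounts = [Decimal("1200.00"), Decimal("350.50")]
net_amounts = [net_amount(amount) for amount in ga_amounts]
```

Using `Decimal` rather than floats avoids rounding surprises when the adjusted amounts are summed later.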
This seems like a good opportunity to “teach people to fish.” I’m thinking there could be value in the tiger team talking through the steps they took to identify the issues they brought up.
+1 - How might we upskill other engineers in this type of discovery as they become self-owners of their services?
Feanil: One of the recommendations is to do exactly this: set up repeatable performance-related training, similar to the a11y or security training.
For MTTR, how long do rollbacks take today? What do we anticipate this time to be after the investments suggested in the report?
Feanil: One of the recommendations in the list was around the fact that this information is not easy to collect right now and we should invest in gathering data on the full deploy timeline to make better investment decisions in the future.
Another thing noticed is that not everybody knows how to roll back, when to do so, or whether they are empowered to. A recommendation is to address that.
Alex: Another thing we didn’t even glance at: disaster recovery drills - have we done these in the past, should we do them in the future?
Should we spend time improving observability and our use of monitoring tools?
Recommendation: Adding some specific alerts that would have been helpful.
Recommendation: Reviewing infrastructure alerts and making sure they are signalling on things that the team cares about.
Would be good if teams have the knowledge and power to decide what should be monitored and alerted on.
Are there recommendations around having a “pane of glass” (i.e., single dashboard for system monitoring)?
Recommendation: In NewRelic, track when management commands are run, marketing pushes happen, courses are opened, and other events that could impact performance.
Would help us understand when “we did something that affected performance” or “someone else did something that affected performance”.
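One way the event tracking above might look, as a rough sketch rather than an agreed design. It assumes the New Relic Python agent is installed and initialized in the process; `record_custom_event` is the agent's call for custom events, while the event type `OperationalEvent`, the helper name, and the attribute names are hypothetical.

```python
import time

try:
    import newrelic.agent as nr_agent  # New Relic Python agent, if installed
except ImportError:
    nr_agent = None  # fall back to a no-op so the sketch still runs


def record_operational_event(kind, **details):
    """Record that something operationally significant happened.

    `kind` is a hypothetical label such as "management_command",
    "marketing_push", or "course_opened".
    """
    params = {"kind": kind, "timestamp": time.time()}
    params.update(details)
    if nr_agent is not None:
        # "OperationalEvent" is an assumed custom event type name.
        nr_agent.record_custom_event("OperationalEvent", params)
    return params


# Example: note a management command run so it can be lined up with performance graphs.
event = record_operational_event("management_command", name="reindex_courses")
```

Putting all of these under one event type would let a single query or dashboard overlay them on response-time charts, which helps separate "we did something" from "someone else did something".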