jm + five-whys   3

Outages, PostMortems, and Human Error 101
Good introductory presentation from John Allspaw, covering the basics of tier-one tech incident response: defining the 5 severity levels; root cause analysis techniques (to Five-Whys or not); and the importance of service metrics.
devops  monitoring  ops  five-whys  allspaw  slides  etsy  codeascraft  incident-response  incidents  severity  root-cause  postmortems  outages  reliability  techops  tier-one-support 
april 2015 by jm
The Infinite Hows, instead of the Five Whys
John Allspaw makes an interesting assertion that we need to ask "how", not "why", in five-whys postmortems:
“Why?” is the wrong question.

In order to learn (which should be the goal of any retrospective or post-hoc investigation) you want multiple and diverse perspectives. You get these by asking people for their own narratives. Effectively, you’re asking “how?”

Asking “why?” too easily gets you to an answer to the question “who?” (which in almost every case is irrelevant) or “takes you to the ‘mysterious’ incentives and motivations people bring into the workplace.”

Asking “how?” gets you to describe (at least some) of the conditions that allowed an event to take place, and provides rich operational data.
ops  five-whys  john-allspaw  questions  postmortems  analysis  root-causes 
november 2014 by jm
