jm + game-days   3

Game Day Exercises at Stripe: Learning from `kill -9`
We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node, and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others.


Excellent post. Game days are a great idea. Also: a massive Redis clustering fail.
game-days  redis  testing  stripe  outages  ops  kill-9  failover 
october 2014 by jm
Chef Testing at PagerDuty
Good article on how PagerDuty tests its Chef changes -- lint, unit tests using ChefSpec, integration tests, and their "Failure Friday" game days
testing  chef  ops  devops  chefspec  game-days  pagerduty 
december 2013 by jm
Failure Friday: How We Ensure PagerDuty is Always Reliable
Basically, they run the kind of exercise which Jesse Robbins invented at Amazon -- "Game Days". Scarily, they do these on a Friday -- living dangerously!
game-days  testing  failure  devops  chaos-monkey  ops  exercises 
november 2013 by jm
