
January 28th Incident Report · GitHub
"Our early response to the event was complicated by the fact that many of our ChatOps systems were on servers that had rebooted. We do have redundancy built into our ChatOps systems, but this failure still caused some amount of confusion and delay at the very beginning of our response."

"We had inadvertently added a hard dependency on our Redis cluster being available within the boot path of our application code."
february 2016 by peakscale
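
The second quote describes a boot path that could not complete while Redis was unavailable. As a rough illustration of the general alternative (not GitHub's actual code or fix, which the excerpt does not show), the sketch below defers the connection until first use, so an outage degrades individual requests rather than blocking startup. All names here (`LazyDependency`, `boot_app`, `handle_request`, `unavailable`) are hypothetical, and a plain dict stands in for a real cache client.

```python
class LazyDependency:
    """Defers connecting to a backing service until first use (hypothetical helper)."""

    def __init__(self, connect):
        self._connect = connect  # callable that returns a client or raises ConnectionError
        self._client = None

    def get(self):
        """Return a connected client, or None if the service is unreachable."""
        if self._client is None:
            try:
                self._client = self._connect()
            except ConnectionError:
                return None  # caller decides how to degrade
        return self._client


def boot_app(cache):
    """Boot without touching the cache, so a cache outage cannot block startup."""
    return {"status": "booted", "cache": cache}


def handle_request(app, key):
    """Serve from the cache when it is reachable, otherwise fall back."""
    client = app["cache"].get()
    if client is None:
        return "fallback"
    return client.get(key, "miss")


def unavailable():
    """Stand-in for a connection attempt during an outage (assumption for the demo)."""
    raise ConnectionError("cache cluster unreachable")


if __name__ == "__main__":
    app = boot_app(LazyDependency(unavailable))
    print(app["status"])
    print(handle_request(app, "session:1"))
```

The design choice being illustrated is that the connection attempt happens inside `handle_request`, not inside `boot_app`, which turns a hard boot-time dependency into a soft runtime one.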