jm + disaster-recovery   2

OVH suffer 24-hour outage (The Register)
Choice quotes:

‘At 6:48pm, Thursday, June 29, in Room 3 of the P19 datacenter, due to a crack on a soft plastic pipe in our water-cooling system, a coolant leak causes fluid to enter the system’;
‘This process had been tested in principle but not at a 50,000-website scale’
postmortems  ovh  outages  liquid-cooling  datacenters  dr  disaster-recovery  ops 
4 weeks ago by jm
Weathering the Unexpected - ACM Queue
Failures happen, and resilience drills help organizations prepare for them.


Good write-up on Google's DiRT (Disaster Recovery Test) procedures, clearly based on Amazon's Gameday exercises. ;) See also http://queue.acm.org/detail.cfm?id=2371297 for a moderated discussion including Jesse Robbins and John Allspaw.
game-day  tests  disaster-recovery  dirt  exercises  history  amazon  google  etsy  resilience  acm 
september 2012 by jm
