jm + azure + flighting   1

Update on Azure Storage Service Interruption
As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services. Prior to applying the performance update, it had been tested over several weeks in a subset of our customer-facing storage service for Azure Tables. We typically call this “flighting,” as we work to identify issues before we broadly deploy any updates. The flighting test demonstrated a notable performance improvement and we proceeded to deploy the update across the storage service. During the rollout we discovered an issue that resulted in storage blob front ends going into an infinite loop, which had gone undetected during flighting. The net result was an inability for the front ends to take on further traffic, which in turn caused other services built on top to experience issues.


I'm really surprised MS deployment procedures allow a change to be rolled out globally across multiple regions on a single day. I suspect they soon won't.
change-management  cm  microsoft  outages  postmortems  azure  deployment  multi-region  flighting  azure-storage 
november 2014 by jm

Copy this bookmark:



description:


tags: