peakscale + postmortem + paas (19)

Investigating Google App Engine issue starting on Sept 23, 2013 - Google Groups
"During planned maintenance in one of App Engine’s US HRD datacenters, a low-level naming infrastructure service unexpectedly began returning malformed responses. A bug in the name resolution library, which is compiled into App Engine’s servers, caused our processes to crash upon receiving the malformed responses."
postmortem  google  paas 
september 2013 by peakscale
Information on Google App Engine's recent US datacenter relocations - Google Groups
"We are chagrined to say that, since this event, we have been unable to further diagnose the origin of this issue, nor reproduce it in our testing infrastructure or labs."
postmortem  google  paas 
september 2013 by peakscale
Google App Engine deployments failing for some users on August 16, 2013 - Google Groups
(No information) "[An issue affected] Google App Engine deployments starting at 5 AM US/Pacific time and continuing until approximately 2:30 PM US/Pacific time, causing some applications to have deployment failures. The problem is now resolved."
postmortem  google  paas 
august 2013 by peakscale
An update on App Engine URL Fetch problems on June 11, 2013 - Google Groups
"A recent audit of Google network traffic revealed an error in Google’s network routing configuration. The error in the routing configuration was corrected immediately upon discovery. Unfortunately, the correction to the routing configuration did not properly take into account a special network testing configuration in one of Google’s datacenters, and impeded that datacenter’s ability to respond to requests originating from App Engine applications."
postmortem  networks  google  paas 
august 2013 by peakscale
Post-mortem for February 24th, 2010 outage - Google Groups
"On February 24th, 2010, all Googe App Engine applications were in varying degraded states of operation for a period of two hours and twenty minutes [...]. The underlying cause of the outage was a power failure in our primary datacenter. "
postmortem  google  paas 
august 2013 by peakscale
Windows Azure Service Disruption Update - Windows Azure - Site Home - MSDN Blogs
"The issue was quickly triaged and it was determined to be caused by a software bug. While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year. "
postmortem  paas 
january 2013 by peakscale
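
The excerpt doesn't show the faulty calculation, but the classic shape of a leap-year time bug is date arithmetic that bumps the year while keeping Feb 29, producing an invalid date. A minimal Python sketch of that failure class (hypothetical code, assuming a naive add-one-year routine; not Azure's actual implementation):

    from datetime import date

    def naive_one_year_later(d: date) -> date:
        # Naive "valid for one year": bump the year, keep month/day.
        # For Feb 29 of a leap year the result is not a real calendar
        # date, so this raises ValueError.
        return d.replace(year=d.year + 1)

    def safe_one_year_later(d: date) -> date:
        # Defensive variant: fall back to Feb 28 when the same
        # month/day does not exist in the target year.
        try:
            return d.replace(year=d.year + 1)
        except ValueError:
            return d.replace(year=d.year + 1, day=28)

    print(safe_one_year_later(date(2012, 2, 29)))  # 2013-02-28
    # naive_one_year_later(date(2012, 2, 29))      # raises ValueError

A calculation like the naive version only misbehaves on Feb 29, which is why this class of bug tends to surface in production rather than in routine testing.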
December 28th, 2012 Windows Azure Storage Disruption in US South - Windows Azure
"There were three issues that when combined led to the disruption of service."
postmortem  paas 
january 2013 by peakscale
About today's App Engine outage
" The global restart plus additional load unexpectedly reduces the count of healthy traffic routers below the minimum required for reliable operation. This causes overload in the remaining traffic routers, spreading to all App Engine datacenters. Applications begin consistently experiencing elevated error rates and latencies."
postmortem  google  paas  networks 
january 2013 by peakscale
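
The quoted timeline describes a cascading overload: once the healthy traffic-router count falls below what the offered load requires, each surviving router is pushed past capacity and the shortfall compounds. A toy model of that dynamic (illustrative numbers and logic only, not App Engine's actual routing layer):

    def cascade(total_load: float, capacity_per_router: float, healthy: int) -> int:
        # Toy model: any router pushed past its capacity is assumed to
        # fail, shifting its share of the load onto the survivors.
        while healthy > 0 and total_load / healthy > capacity_per_router:
            healthy -= 1          # one overloaded router drops out
        return healthy            # 0 means the whole fleet has collapsed

    # 100 units of load, 10 units of capacity per router: 12 healthy
    # routers hold steady, but dropping to 9 cascades all the way to zero.
    print(cascade(100, 10, 12))   # 12
    print(cascade(100, 10, 9))    # 0

The point the excerpt makes is the same: there is a sharp minimum healthy count, and falling just below it doesn't degrade gracefully, it spreads.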
Analysis of April 25 and 26, 2011 Downtime : CloudFoundry.com Support
"DEA nodes had stabilized but the Cloud Controllers (all 8) had lost all connectivity to portions of the storage subsystem. As we indicated on support.cloudfoundry.com, this event caused the Cloud Controller and Health Manager to enter into a read-only mode."
postmortem  paas 
january 2013 by peakscale
Summary of Windows Azure Service Disruption on Feb 29th, 2012 - Windows Azure - Site Home - MSDN Blogs
"While the trigger for this incident was a specific software bug, Windows Azure consists of many components and there were other interactions with normal operations that complicated this disruption. There were two phases to this incident. The first phase was focused on the detection, response and fix of the initial software bug. The second phase was focused on the handful of clusters that were impacted due to unanticipated interactions with our normal servicing operations that were underway. "
postmortem  paas 
january 2013 by peakscale
App Engine postmortem for August 18, 2011 outage
" a Google data center in the American Midwest, which was serving App Engine Master/Slave Datastore applications on that date, lost utility power as a result of an intense thunderstorm"

"The architecture of the Master/Slave Datastore for App Engine makes no substantial improvement in this situation possible."
postmortem  google  paas 
january 2013 by peakscale
App Engine: Java App Engine outage, July 14, 2011
"During development, testing, and qualification, this bug was essentially hidden from view because it only manifested itself under specific load patterns"
postmortem  google  paas 
january 2013 by peakscale
App Engine: Information regarding 2 July 2009 outage
"Typically, we would have switched to an alternate datacenter immediately. However, due to the specific nature of this problem, switching datacenters immediately meant that the most recent data written by applications would not have been available, leading to consistency problems for many applications. The team decided to try to stabilize GFS first, then switch datacenters. This was accomplished and we avoided any data consistency issues."
postmortem  google  paas 
january 2013 by peakscale
App Engine postmortem for February 24th, 2010 outage
"underlying cause of the outage was a power failure in our primary datacenter. While the Google App Engine infrastructure is designed to quickly recover from these sort of failures, this type of rare problem, combined with internal procedural issues extended the time required to restore the service."

"a bad call of returning to a partially working datacenter"
postmortem  google  paas 
january 2013 by peakscale
Heroku Status: Widespread Application Outage
"Starting last Thursday, Heroku suffered the worst outage in the nearly four years we've been operating. Large production apps using our dedicated database service may have experienced up to 16 hours of operational downtime. "
aws  postmortem  paas 
october 2012 by peakscale
AppEngine: Information regarding 2 July 2009 outage
"The App Engine outage was due to complete unavailability of the datacenter's persistence layer, GFS, for approximately three hours. The GFS failure was abrupt for reasons described below"
postmortem  paas  google 
october 2012 by peakscale
