Google Apologizes for App Engine Outage
Google App Engine suffered a multi-hour outage on Oct. 26, despite assurances it would not go down. Google is trying to make things right by honoring SLAs.
October 30, 2012
’Tis the season for cloud outages, or so it seems. Last week, Amazon Web Services (AWS) suffered a major outage in its US-East-1 data center in Northern Virginia, and now Google (NASDAQ: GOOG) has humbly begged forgiveness for an outage in a service that was never supposed to go down.
On Friday, Peter S. Magnusson, engineering director at Google App Engine, posted a note to the Google Enterprise Blog in an effort to explain and apologize for an outage on Google App Engine. Here are the details: On Friday, Oct. 26, at 4 a.m. Pacific, loads started to increase in one of the App Engine data centers. By 6:30 a.m., Google had to do a global restart of the traffic routers to address the problem load in the affected data center. It wasn’t until 11:45 a.m. that App Engine had returned to normal operation.
For those missing the math, that’s nearly eight hours of problems on Google App Engine, although the most serious problems occurred between 7:30 and 11:30 a.m., when approximately 50 percent of requests to App Engine failed.
Magnusson did the right thing by admitting the error and apologizing to all of the developers that use App Engine to develop and manage their apps.
“We know you rely on App Engine to create applications that are easy to develop and manage without having to worry about downtime. App Engine is not supposed to go down, and our engineers work diligently to ensure that it doesn’t,” he wrote on Friday.
Thankfully, no application data was lost. Application behavior was restored without any manual intervention by developers, and Magnusson noted that developers didn’t need to make any configuration changes to their applications.
During the outage, developers using App Engine must have been pretty frustrated as they experienced increased latencies and time-out errors (I’m no developer, but time-out errors for anything raise my ire). The outage is a bit of an oddity for App Engine, though, as there had not been a systemwide outage since Google launched the High Replication Datastore in January 2011.
Magnusson wrote that Google will proactively issue credits to all paid applications for 10 percent of their October usage to cover SLA violations. Customers don’t need to take any action. Google will hopefully take steps to ensure this doesn’t happen again, but kudos to the company for moving to appease its hundreds of thousands of developer customers.