VMware Cloud Foundry Experiences Its Own Cloud Outage
May 4, 2011
Welcome to the cloud outage club, VMware. The company’s fledgling Cloud Foundry PaaS offering, which had only begun to position itself as an alternative to the dominant Microsoft Windows Azure, is back online after a short outage. The trouble occurred on April 25 and 26, but VMware has only now begun to explain what happened. Why are some big-name public cloud platforms experiencing outages?
According to a blog post by the Cloud Foundry team, the platform started experiencing issues at 5:45 a.m. PDT on April 25, the same day that TalkinCloud reported that Amazon EC2 wasn’t back up to full capacity after its own outage. When the Cloud Controllers went down, “all developer facing control operations (login, logoff, create app, start app, stop app, etc.) were no longer possible.” By 3:30 p.m. PDT, the team had identified the faulty power supply responsible for the problems and restored everything to 100% service health.
The next day, VMware says, the Cloud Foundry engineers met to come up with an actionable plan to keep it from happening again. They say that the exercise was meant to be paper-only, with no one actually implementing anything or even touching the keyboard while they came up with their guidelines. What follows is my favorite excerpt from a cloud maintenance report ever (emphasis mine):
Unfortunately, at 10:15am PDT, one of the operations engineers developing the playbook touched the keyboard. This resulted in a full outage of the network infrastructure sitting in front of Cloud Foundry. This took out all load balancers, routers, and firewalls; caused a partial outage of portions of our internal DNS infrastructure; and resulted in a complete external loss of connectivity to Cloud Foundry.
The back-end infrastructure continued to run, according to VMware. But for all intents and purposes, Cloud Foundry was dead to the world. It was back up and running before midnight on the 26th, and VMware has issued profuse apologies.
The full blog post is well worth a read whether or not you’re a Cloud Foundry customer. And while it’s odd that it took about a week after the platform came back up for this blog entry to appear, it’s still more communication than Amazon Web Services has offered its cloud customers.