If you're a critic of cloud computing looking for ammunition, then Amazon gave you a nice gift this week with a power outage that took some Amazon Elastic Compute Cloud instances offline for several hours.
Cloud vendors often boast of redundancy in their systems that should prevent any type of downtime, but the Amazon incident and several others show that cloud services are vulnerable to outages just as regular enterprise data centers are.
The Amazon outage, caused by a power failure, occurred in its Northern Virginia data center early Wednesday morning. Apparent Networks, an IT performance management company, issued an advisory saying the outage began at 3:34 A.M. Eastern time and lasted 44 minutes. But Amazon's own Service Health Dashboard, which tracks the uptime status of its cloud services, reported that "a very small number of instances" were still not responding at 9:41 A.M. Eastern time, about six hours after the outage began.
Amazon provided an update on the power failure on Thursday, saying, "We would like to provide further information on the issue experienced on December 9 in one of our east coast availability zones. A single component of the redundant power distribution system failed in this zone. Prior to completing the repair of this unit, a second component, used to assure redundant power paths, failed as well, resulting in a portion of the servers in that availability zone losing power. Impacted customers experienced a loss of connectivity to their instances. As soon as the defective power distribution units were bypassed, servers restarted and instances began to come online shortly thereafter."
Amazon's "availability zones" are designed to be isolated from each other, so a failure in one zone should not affect the others. Customers can choose to run applications across multiple zones; doing so presumably would prevent downtime during events such as Wednesday's power failure.
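To make the multi-zone idea concrete, here is a minimal sketch in Python. It does not call any Amazon API; the zone names and helper functions are hypothetical and exist only to illustrate why spreading instances across zones limits the damage when a single zone loses power.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin; returns one zone name per instance."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_instances(placement, failed_zone):
    """Count the instances that are not located in the failed zone."""
    return sum(1 for zone in placement if zone != failed_zone)


# Hypothetical zone names, for illustration only.
zones = ["zone-a", "zone-b", "zone-c"]
placement = spread_across_zones(6, zones)

print(Counter(placement))                        # instances per zone
print(surviving_instances(placement, "zone-a"))  # instances left if zone-a fails
```

With six instances spread over three zones, losing one zone leaves four instances running. If all six sat in the failed zone, none would survive.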
"By launching instances in separate Availability Zones, you can protect your applications from failure of a single location," Amazon notes in an FAQ on its Elastic Compute Cloud service.
Amazon previously suffered downtime in June and July, reporter Rich Miller notes at Data Center Knowledge.
Rackspace has also suffered outages affecting cloud customers in June, July and November.
Given the large number of customers accessing cloud services, these outages may be relatively minor in the grand scheme of things. However, both Amazon and Rackspace will need to avoid downtime if they are to convince enterprise customers that cloud computing platforms are ready to host mission-critical applications.
Jon Brodkin writes about Microsoft, Google, browsers, operating systems, PCs, mobile devices, cloud computing, virtualization, open source and a bunch of other tech stuff for Network World. He also cares just a little bit too much about Boston sports teams. Follow Jon on Twitter @jbrodkin.