Amazon EC2 outage calls 'availability zones' into question

When multiple zones fail, Amazon's cloud redundancy isn't redundant enough

By , Network World
April 21, 2011 02:44 PM ET

Network World - For cloud customers willing to pony up a little extra cash, Amazon has an enticing proposition: Spread your application across multiple availability zones for a near-guarantee that it won't suffer from downtime.

"By launching instances in separate Availability Zones, you can protect your applications from failure of a single location," Amazon says in pitching its Elastic Compute Cloud service.

Customers who build applications in just one availability zone are more likely to suffer outages. But what happens when multiple availability zones go dark at the same time? We found out today when an outage forced websites such as Foursquare, Reddit, Quora and Hootsuite offline.

"We can confirm connectivity errors impacting EC2 instances and increased latencies impacting EBS (Elastic Block Store) volumes in multiple availability zones in the US-EAST-1 region," Amazon said Thursday on its service health dashboard.

The US-EAST-1 region, based in northern Virginia, is one of several Amazon regions around the world. There's another one in northern California. Amazon started reporting troubles at 4:41 a.m. Eastern time. By 1:26 p.m., Amazon said it was "now seeing significantly reduced failures and latencies," but that problems were still ongoing. Amazon blamed a "networking event" that "triggered a large amount of re-mirroring" of storage volumes, creating a capacity shortage.

Each region contains multiple availability zones -- but little information about each one is known, according to Gartner analyst Drue Reeves. There are four availability zones within the Virginia region, Reeves says. But are they in different data centers? How far apart are they? How is data replicated across zones? Reeves says Amazon hasn't been transparent about these questions. Not knowing the answers makes it difficult for customers to know which methods of building high availability into applications will be most effective.

"Amazon has said for years that they run multiple availability zones within a region to prevent the outage of an entire region," Reeves said. "But yet here we are, and we have an outage inside EC2 for an entire region."

An Amazon spokesperson hasn't yet responded to a request for comment.

Perhaps tellingly, Amazon's service-level commitment provides 99.95% availability for each region -- but not for each availability zone. This is good enough for many customers but well below the "five nines" standard of high availability.
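
To put the two figures side by side, here is a rough sketch of the downtime each availability level allows. It assumes availability is measured over a 365-day year, which is an assumption for illustration and not a detail taken from Amazon's service-level agreement; the function name is likewise made up for this example.

```python
# Allowed downtime per year implied by an availability percentage.
# Assumption: a 365-day measurement period (leap years ignored).
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours per year a service may be down at this availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    levels = [
        ("99.95% (per-region commitment cited above)", 99.95),
        ("99.999% (five nines)", 99.999),
    ]
    for label, pct in levels:
        hours = downtime_hours_per_year(pct)
        print(f"{label}: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Under that assumption, 99.95% works out to roughly 4.4 hours of permitted downtime a year, while five nines allows only about 5 minutes.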
