Amazon Web Services and Google Cloud Platform recorded impressive statistics for how reliable their public IaaS clouds were in 2014, with both providers approaching what some consider the Holy Grail of availability: five nines.
Flash back to 2012, and pundits were bemoaning a cloud plagued with outages, from one that brought down Reddit and many other sites to the Christmas Eve fiasco that hit Netflix. It was a different story last year.
Website tracking firm CloudHarmony monitors how often more than four dozen cloud providers experience downtime. The company has a web server running in each of these vendors’ clouds and tracks when the service is unavailable, logging both the number and length of outages. The science is not perfect but it gives a good idea of how providers are doing. And overall, vendors are doing well and getting better.
Amazon and Google shone in particular. Amazon’s Elastic Compute Cloud (EC2) recorded 2.41 hours of downtime across 20 outages in 2014, meaning it was up and running 99.9974% of the time. Given AWS’s scale - Gartner estimated last year that Amazon’s distributed system is five times larger than its competitors’ - those are impressive figures.
Perhaps even more eye-catching is the uptime of Google Cloud Platform’s storage service, which experienced 14 minutes of downtime in all of 2014, according to CloudHarmony. That’s good for 99.9996% uptime.
“The more established players are fine-tuning their systems and becoming quite stable,” says Jason Read, CEO of CloudHarmony. “AWS has been providing cloud services longer than anyone in the market and Google uses its existing infrastructure for its cloud, so it too has a long track record of managing a reliable distributed system.”
Cloud vendors had their issues in 2014, too. About 10% of AWS EC2 instances had to be rebooted after a Xen vulnerability was identified this past fall. Rackspace had a big fall reboot as well, and Microsoft had a storage service disruption in November. Verizon has kicked off 2015 by telling customers its cloud will be down for up to 48 hours this month for scheduled maintenance.
That Microsoft outage contributed to a difficult year for Azure in terms of availability. In the compute sphere, Azure experienced 92 outages totaling 39.77 hours. Its storage platform had 141 outages totaling 10.97 hours. By comparison, AWS’s storage platform had 23 outages and 2.69 hours of downtime. Microsoft did not offer a comment for this story.
Most providers seem to be improving their platforms, but could the cloud ever reach carrier-grade, five-nines availability?
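For a sense of what five-nines demands, the short Python sketch below converts an availability target into a yearly downtime budget. It assumes a 365-day year and a single continuously monitored service; the function names are illustrative. CloudHarmony's published percentages may be aggregated across several monitored regions, so the sketch does not attempt to reproduce them.

```python
# Assumption: a non-leap year and one continuously monitored service.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Maximum yearly downtime allowed at an availability of N nines."""
    unavailability = 10.0 ** (-nines)  # five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


def availability_percent(downtime_minutes: float) -> float:
    """Availability, as a percentage, for a given yearly downtime."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

By this arithmetic, five-nines leaves a little over five minutes of downtime per year, which is why the figure is treated as a carrier-grade benchmark.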
Donnie Berkholz, a senior analyst at consultancy Redmonk, says a deeper look at the CloudHarmony data shows that some providers are already achieving five-nines. Google is, for example, with its storage platform. Some of AWS’s regions (CloudHarmony monitors multiple ones in AWS’s cloud) had only a few minutes or zero downtime last year, and that’s also good for five-nines. With each passing year, and with increased scale, cloud providers get better at offering their services, it seems. But another interesting trend, Berkholz notes, is that users could be smartening up to outages.
Some cloud providers - Microsoft Azure and relative newcomer Digital Ocean, for example - are popular despite not having fantastic uptime percentages in 2014 (Azure had nearly 40 hours of downtime, while CenturyLink had 26 hours and Digital Ocean had 16). “Outage frequency, within a certain range, isn't a blocker on adoption of an otherwise compelling cloud provider,” Berkholz wrote in an email. “The question [then] isn't which provider is best, but what is the limit of what customers find acceptable?”
There are plenty of ways users can prepare for a cloud outage: don’t host workloads in a single place, use tools to route traffic away from failed servers, test systems for fault tolerance frequently, and so on. Perhaps users are heeding these best practices. Or maybe they’re not putting sensitive materials in the cloud that would be hurt by downtime. Cloud vendors and users both seem to be getting better at providing and using these services.
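The idea of steering traffic away from a failed server can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's tooling: the endpoint URLs are hypothetical placeholders, and the health check is passed in as a function so a real probe (an HTTP request, say) could be swapped in.

```python
from typing import Callable, Iterable, Optional


def pick_healthy(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises is treated the same as an unhealthy endpoint.
            continue
    return None


# Hypothetical endpoints hosted in two different regions.
ENDPOINTS = ["https://us-east.example.com", "https://eu-west.example.com"]


def fake_check(url: str) -> bool:
    """Stand-in probe that pretends only the eu-west endpoint is up."""
    return "eu-west" in url


print(pick_healthy(ENDPOINTS, fake_check))
```

Listing endpoints from more than one region is what makes the fallback meaningful; with a single location there is nowhere for traffic to go.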