How long will big-name customers like Netflix put up with Amazon cloud outages?

Amazon's cloud went down again, this time on Christmas Eve, for 12 hours, affecting nearly 7% of the load balancers AWS runs for customers. Will continued cloud outages erode confidence in the public cloud?

On Christmas Eve, as Netflix customers cuddled up to watch their favorite holiday movies and TV shows with friends and family, many ran into a problem: Netflix was down.

More precisely, Amazon Web Services’ public cloud, which Netflix relies on to stream content to customers, experienced an outage in its U.S.-East region, the same region that has been hit by some of the biggest failures in the history of the company’s public cloud services.

Netflix is a poster-child customer for using AWS services at large scale. At Amazon’s first-ever user conference in November, Netflix CEO Reed Hastings participated in a keynote Q&A with AWS CTO Werner Vogels, reiterating the value AWS provides to the company. Netflix cloud guru Adrian Cockcroft spoke to standing-room-only conference rooms at the show, advising customers on how to architect their AWS deployments for fault tolerance and high availability.

While Cockcroft and Vogels have repeated many times that outages are inevitable, the timing of this most recent Christmas Eve crisis at Netflix and Amazon has some asking the question: How long will Netflix and other big-name customers put up with Amazon cloud outages?

In a post-mortem report, Amazon says an employee accidentally deleted data that controls Elastic Load Balancers (ELBs) in its cloud around 12:30 p.m. PT on Dec. 24. The maintenance process was thought to be running against a test environment, but it was in fact running against production. ELBs let customers spread workloads across multiple virtual machines, and without the deleted data new ELB configurations could not be created. A first attempt to fix the problem by restoring the deleted data failed, and the data was not successfully restored until 5:40 a.m. Christmas morning. By 10:30 a.m. almost all issues had been resolved, but not before an estimated 6.8% of the company’s running ELBs were affected, according to AWS.

Netflix says the timing of the event was actually a good thing. In a blog post describing the incident, Cockcroft says Christmas Eve is traditionally a slow time for Netflix compared to Christmas Day. Some customers who stream Netflix to their televisions from gaming consoles and mobile devices had unavailable or spotty service for seven hours. Netflix is designed, Cockcroft says, to survive the failure of a single Availability Zone within AWS’s cloud, but not the failure of a service that spans multiple Availability Zones and an entire region, which is what happened with the ELBs. Netflix engineers are working on regional resiliency to prepare for the next outage, he says.
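The difference between zone-level and region-level resiliency can be made concrete with a small sketch. The code below is illustrative only and is not Netflix's or Amazon's implementation: the `Endpoint` type, the `pick_endpoint` function, the `region_services_up` map and the region and zone names are all assumptions made for the example. It shows why spreading across zones alone does not help when a service shared by the whole region fails.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical deployment target: one Availability Zone in one region."""
    region: str
    zone: str
    healthy: bool  # zone-level health only


def pick_endpoint(
    endpoints: List[Endpoint],
    region_services_up: Dict[str, bool],
    preferred_regions: List[str],
) -> Optional[Endpoint]:
    """Return the first usable endpoint, trying regions in preference order.

    A region whose shared services (for example, load balancing) are down
    is skipped entirely, no matter how many of its zones look healthy.
    """
    for region in preferred_regions:
        if not region_services_up.get(region, False):
            continue  # region-wide failure: zone redundancy cannot help
        for endpoint in endpoints:
            if endpoint.region == region and endpoint.healthy:
                return endpoint
    return None  # nothing usable anywhere


# Illustrative topology; the names are assumptions, not taken from the article.
MULTI_ZONE_ONLY = [
    Endpoint("us-east-1", "us-east-1a", healthy=True),
    Endpoint("us-east-1", "us-east-1b", healthy=True),
]
MULTI_REGION = MULTI_ZONE_ONLY + [
    Endpoint("us-west-2", "us-west-2a", healthy=True),
]
```

With only `MULTI_ZONE_ONLY` deployed, marking the region's shared services as down leaves no endpoint to route to. Adding a second region, as in `MULTI_REGION`, gives the caller somewhere to go.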

Jillian Mirandi, an analyst at Technology Business Research, Inc., says continued outages from Amazon could cause some customers to begin looking elsewhere for their cloud needs. The problem is that many of the other providers in the market can’t offer comparable pricing and breadth of services. “This is definitely a challenge for them,” Mirandi says of AWS. “I don’t think a ton of customers are just getting up and leaving, likely because Amazon is still one of the best options out there.” But if outages continue or become more frequent, the problem could grow, she says. As competitors such as Rackspace, Google, Microsoft, Terremark and others in the industry continue to improve their services, AWS may not always hold its advantage in price and breadth of services, she says.

Amazon customers are certainly already taking notice. Jeremy Jongsma is director of product development at Barchart, a provider of stock market trading information, and an Amazon cloud customer. Barchart wasn’t hit by the most recent outage, but the company did experience some disruptions from an outage earlier this year. That led him to spread the company’s application across two Availability Zones instead of just one, as AWS recommends customers do. The Christmas Eve outage is one of the more concerning ones, Jongsma says, because it sounds like it could have been wholly preventable. “The root cause of human error is the most worrying,” he says.

Krishnan Subramanian, an analyst at boutique firm Rishidot Research, says he doesn’t believe continued outages will erode confidence in AWS and the public cloud. Instead, he sees the outage as an educational moment for customers to explore better cloud resiliency methods, such as spreading workloads across clouds from multiple providers. “Clearly, AWS has a problem to fix but I would also expect the cloud users like Netflix to learn from the past experience and not keep all eggs in the same basket,” with a single provider, he says.

Others believe that outages are just a fact of life in IT, and especially in the cloud. David Linthicum, CTO of cloud consultancy Blue Mountain Labs, points out that public cloud providers like AWS likely have much more resilient systems than in-house IT shops. The key is preparing for the outages. “I’m sure there will be many more outages this year, and next year. You just need to build those types of events into your cloud service usage and operations planning. I just don’t see the point of ‘handwringing’ over each of these outages.”

Network World staff writer Brandon Butler covers cloud computing and social collaboration. He can be reached at BButler@nww.com and found on Twitter at @BButlerNWW.
