Netflix staggers

On Christmas Eve, Netflix went down. Here's how it happened and what Netflix needs to do to try to stop it from happening again.

'Twas the night before Christmas and all through the house, people were screaming and it wasn't at a mouse. The in-laws were staring at the TV with care, in hopes that Netflix soon would be returned there. Alas, it wasn't to be. Netflix wouldn't be playing again until December 25th.

So what happened? It turns out it was yet another Amazon Web Services (AWS) outage. This one was centered on AWS's Elastic Load Balancers (ELBs) in the US-East-1 region in Northern Virginia. Other cloud services went down as well, but let's face it, on Christmas Eve no one much noticed the others. Curiously, though, Amazon Instant Video kept going.

To be exact, trouble developed with the Elastic Load Balancers' application programming interfaces (APIs). New load balancers would go up, but they wouldn't properly report their status to the overall ELB service. This, as Netflix users quickly found out, led to "significant levels of traffic loss."

It was only around Christmas noon Eastern time that Netflix was able to report that "We're back to normal streaming levels."

We still don't know why the ELBs went out of whack, but I have an educated guess. Netflix is easily the most popular video service, and I suspect the ELB management service wasn't able to handle the increased demand from families for Christmas movies on the 24th. From there, one bad thing quickly led to another.

What should have happened was for the ELBs to automatically reroute traffic to another availability zone (AZ). From here, I can't tell whether the Netflix ELBs had been configured properly to do that. A misconfiguration seems unlikely to me, given how much of a load Netflix puts on its ELBs even on a slow day. Instead, it seems more likely that the management service above the ELBs wasn't getting an accurate view of what was going on with the newly launched ELBs.
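To make that failover idea concrete, here is a minimal Python sketch of zone-aware routing. It is only an illustration: the Backend class, the pick_backend function, and the zone names are hypothetical, and none of this is AWS's API or how the ELB service is actually built.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """A hypothetical server sitting behind a load balancer."""
    name: str
    zone: str
    healthy: bool = True  # the status as *reported* to the router


def pick_backend(backends: List[Backend], preferred_zone: str) -> Optional[Backend]:
    """Pick a healthy backend, preferring one zone and failing over to others.

    Returns None when no backend in any zone is reported healthy.
    """
    healthy = [b for b in backends if b.healthy]
    in_zone = [b for b in healthy if b.zone == preferred_zone]
    # Fall back to any healthy backend if the preferred zone has none.
    candidates = in_zone or healthy
    return candidates[0] if candidates else None


if __name__ == "__main__":
    fleet = [
        Backend("a1", "zone-a", healthy=False),
        Backend("b1", "zone-b"),
    ]
    chosen = pick_backend(fleet, preferred_zone="zone-a")
    print(chosen.name if chosen else "no healthy backend")
```

Note that the router can only act on the health it is told about. If the reported status is stale or missing, as seems to have been the case with the newly launched ELBs, even correct failover logic will send traffic to the wrong place.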

This isn't the first time that an AWS crash has nailed Netflix. It's not just Netflix, of course. AWS has been crashing a lot recently. In October, an Amazon Elastic Block Store (EBS) crash brought down numerous sites across the Internet, such as Reddit and Imgur. I like AWS as much as the next guy looking for inexpensive cloud services, but this is happening way too often.

To meet our almost endless demand for movies, Netflix has set up its own content delivery network (CDN). What Netflix hasn't done, though, is set up its own cloud service. After this Christmas Eve fiasco, the company might want to consider it.

It wouldn't be cheap, but considering how ticked off some Netflix customers were that night, it might be worth it. 

Copyright © 2012 IDG Communications, Inc.