Startup claims it saw early signs of Amazon's cloud outage

Noticed abnormal activity among customers two hours before Amazon announced outage

By Network World
October 26, 2012 11:19 AM ET

Network World - Almost two hours before Amazon Web Services publicly acknowledged an outage that brought down websites such as Reddit, Imgur and Heroku, application monitoring startup Boundary claims it started noticing the problem.

The company is developing an early-warning system for cloud outages, and it says AWS's most recent incident proved for the first time that its service works at scale.

Boundary makes an application performance management (APM) tool. It installs an agent that monitors the second-by-second performance of virtual machines running in a public cloud or data center. The information from the VMs is sent to Boundary's cloud, where it is analyzed. Boundary relays information to customers about the health of their systems and aggregates data from its hundreds of customers to monitor trends in the cloud.

On Monday, almost two hours before AWS officially announced an issue with its Elastic Block Store (EBS), a volume storage service used in conjunction with its Elastic Compute Cloud (EC2), Boundary started noticing abnormal activity in AWS's cloud, the company says.

During the next two hours, nearly a third of the agents among Boundary's more than 300 customers stopped reporting back to Boundary's cloud at one time or another. Data transfer from AWS's cloud to Boundary's data analysis servers dropped 27%, from 38Mbps to 28Mbps.

Round-trip latencies between the agents and Boundary's cloud increased by three times their normal levels, the company says. The latency in VM reporting continued until 2 a.m. PT on Tuesday, when AWS reported that the issue had mostly been resolved. Stamos detailed what the company found in a blog post.
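The three signals described above (silent agents, falling throughput and rising latency) can be illustrated with a minimal Python sketch. It is not Boundary's implementation: the `Snapshot` structure, the function names, the thresholds, the agent counts and the latency values are all assumptions made for illustration. Only the 38Mbps and 28Mbps throughput figures come from the company's account.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Aggregate measurements for one time window (hypothetical structure)."""
    agents_total: int
    agents_silent: int       # agents that missed a report during the window
    throughput_mbps: float   # data arriving from the monitored cloud
    latency_ms: float        # round-trip latency between agents and collector


def percent_drop(baseline: float, current: float) -> float:
    """Relative decrease from baseline, expressed as a percentage."""
    return (baseline - current) / baseline * 100.0


def warning_signals(baseline: Snapshot, current: Snapshot) -> list[str]:
    """Return a description of each signal that crosses its threshold.

    The thresholds (25% silent agents, 20% throughput drop, 2x latency)
    are illustrative assumptions, not values reported by Boundary.
    """
    signals = []

    silent_fraction = current.agents_silent / current.agents_total
    if silent_fraction >= 0.25:
        signals.append(f"{silent_fraction:.0%} of agents stopped reporting")

    drop = percent_drop(baseline.throughput_mbps, current.throughput_mbps)
    if drop >= 20.0:
        signals.append(f"throughput down {drop:.1f}%")

    latency_ratio = current.latency_ms / baseline.latency_ms
    if latency_ratio >= 2.0:
        signals.append(f"latency at {latency_ratio:.1f}x normal")

    return signals


# Agent counts and latencies below are made up; throughput is from the article.
normal = Snapshot(agents_total=900, agents_silent=0,
                  throughput_mbps=38.0, latency_ms=40.0)
incident = Snapshot(agents_total=900, agents_silent=290,
                    throughput_mbps=28.0, latency_ms=120.0)

for signal in warning_signals(normal, incident):
    print(signal)
```

With the rounded figures, a fall from 38Mbps to 28Mbps works out to about 26.3%, so the 27% Boundary cites was presumably computed from unrounded measurements.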

Boundary only tracks the performance of the VMs, so Stamos says there's no way to know what caused the issue on Monday. The decreased network traffic could mean customers were experiencing performance problems on their own instances, which were then being reflected in Boundary's tools, or that there was a problem with the VMs' ability to send tracking data to Boundary from AWS. Either way, it was enough of an abnormal spike, Stamos says, that they knew something was up. "There's no way to go inside Amazon's infrastructure, what we're trying to do is be a leading indicator, alerting customers that there is a problem developing," she says.

Boundary hasn't fully developed that functionality yet, though. In the most recent incident, Boundary didn't actually inform customers that AWS was experiencing abnormal activity; it just reported the results on its website. The company hopes to use this data in the future to create that early-warning system for users.

Having that knowledge could be critical, Stamos argues. If customers were alerted to performance issues within an Availability Zone, they could switch workloads out of it and into another, unaffected Availability Zone, into another cloud provider, or into their own data center.
