Why Amazon's recent outage was a networking issue, and how to prevent it

BGP route leaks again to blame.


The brief outage AWS suffered late last month exposed the dependence of countless enterprise applications on a few cloud components. Nick Kephart published a very good analysis that explains how a route leak from data center provider Axcelx was responsible for this incident. This is not a new problem. Route leaks caused by incorrect BGP advertisements have resulted in much larger outages.

For example, YouTube was taken offline in 2008 when a relatively small carrier, Pakistan Telecommunication Company (PTCL), started sourcing the YouTube prefix. In response, direct peers started preferring the PTCL-originated prefix to the original YouTube advertisement. This problem also caused the Amazon outage, when an outsider started advertising address space belonging to Amazon, black holing traffic as a consequence.
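One common way a leaked route wins is by being more specific than the legitimate one, because routers forward on the longest matching prefix. The sketch below illustrates that selection rule only. The prefixes and AS numbers are hypothetical values from the documentation ranges, not the ones involved in either incident.

```python
import ipaddress

# Hypothetical routing table: (prefix, who originated it).
# The owner advertises a /24; a third party leaks a more-specific /25.
ROUTES = [
    (ipaddress.ip_network("198.51.100.0/24"), "AS64500 (legitimate owner)"),
    (ipaddress.ip_network("198.51.100.128/25"), "AS64511 (leaked more-specific)"),
]


def best_route(destination, table=ROUTES):
    """Return the (prefix, origin) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, origin) for net, origin in table if addr in net]
    if not candidates:
        return None
    # Longest prefix wins, regardless of who is entitled to announce it.
    return max(candidates, key=lambda route: route[0].prefixlen)


# Traffic to the upper half of the block follows the leaked route.
print(best_route("198.51.100.200"))
# Traffic to the lower half still follows the owner's route.
print(best_route("198.51.100.10"))
```

Nothing in the selection logic asks whether the announcing network is entitled to the address space, which is why the leaked route can attract traffic.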

This problem is tied to BGP – the control protocol that powers the Internet and enables the exchange of information between routers belonging to various service providers. BGP does not currently verify the originator of a route, or whether the originator is an authorized one. BGP also doesn’t verify that the advertised AS_PATH represents a valid path to the originator. This shortcoming is addressed by proposed extensions such as Secure Origin BGP (soBGP), but until such a mechanism is widely deployed, these types of incidents will continue to plague the Internet.
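To make the missing check concrete, here is a minimal sketch of what origin verification could look like if a router had a trusted list of authorized originators. The `AUTHORIZED_ORIGINS` table, the prefixes, and the AS numbers are assumptions made for illustration; this is not how soBGP or any particular implementation works.

```python
import ipaddress

# Hypothetical registry: address block -> AS numbers allowed to originate it.
AUTHORIZED_ORIGINS = {
    ipaddress.ip_network("198.51.100.0/24"): {64500},
}


def origin_is_authorized(prefix, as_path, registry=AUTHORIZED_ORIGINS):
    """Return True or False for covered prefixes, None if nothing is known."""
    net = ipaddress.ip_network(prefix)
    origin = as_path[-1]  # the originating AS is the last entry in AS_PATH
    covering = [allowed for block, allowed in registry.items()
                if net.subnet_of(block)]
    if not covering:
        return None  # no data: plain BGP would simply accept the route
    return any(origin in allowed for allowed in covering)


print(origin_is_authorized("198.51.100.0/24", [64496, 64500]))    # owner
print(origin_is_authorized("198.51.100.128/25", [64496, 64511]))  # outsider
```

This only covers the first gap, the origin. It says nothing about whether the rest of the AS_PATH is a valid path to that origin.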

In the meantime, there are some measures that carriers and cloud providers can take to prevent these incidents. To put these in context, it helps to understand what type of information is being advertised by BGP. Three different types of IP address prefixes are typically used in carrier networks (depending on how they are designed, of course):

  • Prefix type-1, Infrastructure: This IP-space is used to provide addressing to the internal infrastructure of the service provider: routers, links, servers and other internal elements. These resources typically only need to be accessed by the local service provider, so the assigned space is typically not advertised to outside parties. It may even be addressed using private (RFC 1918) space.
  • Prefix type-2, Customer connectivity: Links between the carrier network and customers must also have addresses. This space may be advertised outside of the carrier network.
  • Prefix type-3, Customer and Service Space: Any customer-assigned IP-space or addressing belonging to public services provided by the carrier. This space must always be advertised for external Internet connectivity to function.
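The three prefix types above can be read as a simple export rule. The following sketch assumes a hypothetical carrier address plan (`ADDRESS_PLAN`) with made-up blocks, and treats the advertisement of Type-2 space as a per-carrier choice.

```python
import ipaddress
from enum import Enum


class PrefixType(Enum):
    INFRASTRUCTURE = 1          # Type-1: internal routers, links, servers
    CUSTOMER_CONNECTIVITY = 2   # Type-2: carrier-to-customer links
    CUSTOMER_AND_SERVICE = 3    # Type-3: customer and public service space


# Hypothetical address plan for one carrier.
ADDRESS_PLAN = {
    ipaddress.ip_network("10.0.0.0/16"): PrefixType.INFRASTRUCTURE,
    ipaddress.ip_network("192.0.2.0/24"): PrefixType.CUSTOMER_CONNECTIVITY,
    ipaddress.ip_network("198.51.100.0/24"): PrefixType.CUSTOMER_AND_SERVICE,
}


def should_advertise(prefix, advertise_type2=False):
    """Decide whether a prefix from the plan is exported to other networks."""
    net = ipaddress.ip_network(prefix)
    for block, kind in ADDRESS_PLAN.items():
        if net.subnet_of(block):
            if kind is PrefixType.INFRASTRUCTURE:
                return False            # normally kept internal
            if kind is PrefixType.CUSTOMER_CONNECTIVITY:
                return advertise_type2  # may or may not be advertised
            return True                 # Type-3 must always be advertised
    return False  # not in the plan, so not ours to advertise


print(should_advertise("10.0.1.0/24"))
print(should_advertise("198.51.100.0/25"))
```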

So why don’t carriers simply block advertisements of unexpected IP space from lower-tier providers, essentially preventing them from providing transit?

One of the reasons carriers don’t establish BGP policies that block unexpected routes from other carriers is multi-homing. Essentially, when a provider assigns space to a customer out of their own CIDR-block, they advertise that route into BGP sourced from their own Autonomous System (AS). When one of their customers decides to multi-home, the customer-specific portion of the provider block also gets advertised through a different carrier. A policy designed to block such advertisements is unsustainable and would effectively disable multi-homing.
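The multi-homing problem can be shown with a deliberately naive filter. In this sketch, a provider (hypothetical AS 64500) owns a block and assigns part of it to a customer, who then also connects through a second carrier (hypothetical AS 64501). All prefixes and AS numbers are illustrative assumptions.

```python
import ipaddress

PROVIDER_BLOCK = ipaddress.ip_network("198.51.100.0/24")  # provider's CIDR block
PROVIDER_AS = 64500                                       # provider's AS number


def naive_filter_accepts(prefix, as_path):
    """Accept routes inside the provider block only if they pass through the provider."""
    net = ipaddress.ip_network(prefix)
    if net.subnet_of(PROVIDER_BLOCK):
        return PROVIDER_AS in as_path
    return True  # routes outside the block are not affected by this rule


customer_prefix = "198.51.100.0/26"  # customer-specific portion of the block

# Learned via the original provider: accepted.
print(naive_filter_accepts(customer_prefix, [64500]))
# Learned via the customer's second carrier: rejected, although it is legitimate.
print(naive_filter_accepts(customer_prefix, [64501]))
```

The second result is the reason a blanket rule is unsustainable: it cannot tell a legitimately multi-homed customer from a leak.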

Nevertheless, a few simple policy changes could help avoid AWS-style outages in the future.

  1. Tier-1 carriers should start blocking each other’s Type-2 blocks from being exchanged through smaller transit providers. For example, it’s common for Carrier1 and Carrier2 to peer with each other at multiple locations in addition to meet-me exchanges such as Telx or Equinix. This creates significant built-in redundancy between the two carriers. Given this fact, there’s really no need for them to accept each other’s advertisements from a smaller tier-2 carrier.
  2. The same action needs to be taken by cloud providers. Typically, cloud providers are not carriers, or at least the cloud operation is independent from the carrier operation from an IP-space point of view. Therefore, there is no need for different carriers to source portions of a cloud provider’s block, since the cloud provider handles its own multi-homing. In the U.S., a few major providers handle more than 95% of all Internet traffic. Meanwhile, most cloud traffic is generated by no more than five to seven larger cloud providers. Hence, if every Tier-1 carrier and every major cloud provider blocked address blocks that are not Type-3 (customer address space) when received from small and regional providers, the issue that hit Amazon could easily be avoided. Since most large carriers use the Routing Policy Specification Language (RPSL) to implement policies, these rules can typically be converted into router configurations.
  3. Cloud providers and infrastructure providers should collaborate with enterprise customers to ensure that address space used for cloud services is only advertised into the carrier networks used by that enterprise. This would contain the risk of traffic being black-holed by inadvertent or illicit advertisements from third parties. It would also help prevent DDoS attacks against the specific cloud resources being used by the enterprise.
  4. To complement the aforementioned BGP policies, the same could be done for the data plane. Unicast Reverse Path Forwarding (uRPF) could be applied towards all peers for any known directly connected carrier space. This way, DDoS attacks from spoofed source addresses belonging to the address space of any large carrier could be avoided.
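As a rough illustration of the control-plane and data-plane rules in this list, the sketch below models a carrier that knows which blocks belong to directly connected large carriers and cloud providers. The `PROTECTED_BLOCKS` list and the `"direct"` and `"transit"` session labels are hypothetical, and the source check is a simplification in the spirit of uRPF, not a router's actual implementation.

```python
import ipaddress

# Hypothetical blocks belonging to directly connected carriers and cloud providers.
PROTECTED_BLOCKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def accept_route(prefix, session_kind):
    """Control plane: drop protected space when it arrives via a smaller provider.

    session_kind is "direct" for a session with the block's owner and
    "transit" for a session with a small or regional provider.
    """
    net = ipaddress.ip_network(prefix)
    touches_protected = any(net.overlaps(block) for block in PROTECTED_BLOCKS)
    return not (session_kind == "transit" and touches_protected)


def source_check_permits(source_ip, session_kind):
    """Data plane: packets claiming a protected source must arrive directly."""
    addr = ipaddress.ip_address(source_ip)
    claims_protected = any(addr in block for block in PROTECTED_BLOCKS)
    return not (claims_protected and session_kind != "direct")


print(accept_route("198.51.100.0/25", "transit"))       # leaked more-specific
print(accept_route("198.51.100.0/25", "direct"))        # same prefix from its owner
print(source_check_permits("203.0.113.9", "transit"))   # likely spoofed source
```

In practice such rules would be expressed as routing policy and interface configuration rather than application code; the functions only show the decision being made.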

Despite the fact that BGP is susceptible to route leaks that can lead to large-scale internet outages, there are steps we can take to contain these incidents until the problem is fixed with soBGP or some other mechanism.

This article is published as part of the IDG Contributor Network.
