Enterprise WAN Service History (part 2)

Insufficient reliability of Internet, broadband allow continued MPLS dominance despite huge price gap.

An earlier column described a brief history of Enterprise WAN over the last roughly 25 to 30 years. Here, I want to go into a bit more detail on that history, as a setup for columns the rest of the summer on the factors that affect application performance over the WAN, and the consequences of WAN service history on the evolution of WAN technologies.

For those who didn't read that earlier history, or just want a synopsis, it goes something like this: Point-to-point private enterprise WANs gave way to cloud WANs: Frame Relay in the 1990s, and its successor, Multiprotocol Label Switching (MPLS), over the last several years. The public Internet also entered the picture in the 1990s, but despite IPSec VPNs solving the data security issues, and despite the tremendous price/bit advantage that Internet connections offer, especially broadband and, in the last few years, colo-based connections, expensive MPLS has remained the private WAN service of choice for almost all large enterprises.


Why does MPLS cost so much more than Internet connectivity?

To be sure, enterprises do make multiple uses of the Internet today. Almost all use it for access to the corporate network by users at home or on the road, typically via remote access SSL VPNs. Many use it, via IPSec VPNs, as a backup for their primary (and generally highly reliable, at least domestically) MPLS WAN. Most also use it to interact with their customers via corporate websites, although almost all of these services are hosted off the enterprise premises at a colocation facility somewhere. And, of course, all but perhaps the most hyper-security-conscious government organizations use it for general Internet web surfing, for both business use and personal use by their employees at work. Some use Software-as-a-Service (SaaS) applications like Salesforce for a portion of their critical business applications, even as few are comfortable using it for their most mission-critical apps, or for those like voice and videoconferencing that are particularly sensitive to network quality.

The primary reason that enterprises have not switched to the Internet as the primary transport for their site-to-site intranet WAN is that, unaided, it is simply not reliable enough. The popular perception that the Internet "works pretty well most of the time" is reasonably accurate. The problem is that "works pretty well" is not good enough for most enterprise WAN managers, and "most of the time" is not good enough for almost any.

Internet connections are about "two nines" (99%) reliable. When I refer to reliability here, I mean not simply availability (i.e., is the connection up or down) but also whether packets sent are delivered successfully without being lost or excessively delayed. Two nines reliability means that connectivity will be poor, or nonexistent, for about 3.65 days per year. By contrast, the three and a half nines (99.95%) to four nines (99.99%) reliability generally promised and delivered by MPLS means poor or no connectivity for somewhere between about 53 minutes (at four nines) and about 4.4 hours (at three and a half nines) a year on average. And domestic MPLS connections almost always avoid congestion-based packet loss due to interference by flows other than one's own, and usually have fairly low jitter. They also offer (usually at extra cost) QoS mechanisms to help ensure that real-time applications like VoIP and videoconferencing are usually not impacted by jitter and loss events.
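As a rough check on the arithmetic behind those "nines," here is a minimal Python sketch. The function name `annual_downtime_hours` is purely illustrative, and the calculation assumes a 365-day year and treats any period of poor delivery as downtime.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(reliability_pct: float) -> float:
    """Hours per year a link is down or unusable at the given reliability (%)."""
    return (1 - reliability_pct / 100) * HOURS_PER_YEAR


levels = [
    ("two nines", 99.0),
    ("three and a half nines", 99.95),
    ("four nines", 99.99),
]

for label, pct in levels:
    hours = annual_downtime_hours(pct)
    # Show the same figure in hours, minutes and days for easy comparison.
    print(f"{label} ({pct}%): {hours:.2f} h = {hours * 60:.0f} min = {hours / 24:.2f} days")
```

Under these assumptions, 99% works out to roughly 87.6 hours (about 3.65 days) a year, 99.95% to about 4.4 hours, and 99.99% to just under 53 minutes.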

No provider guarantees end-to-end connectivity over the public Internet, and until recently enterprises had no real way to achieve it on their own. Enterprise WAN managers were therefore quite rational to remain with private WAN services like MPLS from service providers such as AT&T and Verizon, who have the necessary footprint to provide nationwide coverage and (sometimes via partners) global connectivity to all of an enterprise's locations, despite the fact that the price/bit gap is now an enormous 30 to 100 times.

Beyond the fact that most Internet connections don't have the same SLAs, especially in terms of MTTR (Mean Time To Repair) – many MPLS vendors will promise four-hour response times to failed links, where 48 hours is more typical for Internet connections, especially broadband connections – why is Internet connectivity less reliable than a private MPLS service?

The Internet is, of course, a network of networks. Per above, no one guarantees end-to-end connectivity across the Internet. There is also no QoS available, for a number of reasons, not the least of which involves the difficulty of accounting and billing across multiple carriers.

That end-to-end problem noted, the problems with packet delivery across the Internet are almost never at the Internet core. The core, in fact, is four nines reliable, and given the increasing importance of Internet commerce, as well as the significant number of Internet providers and competitors, there is little reason to think that this will change. Internet quality problems, then, usually arise either on the last-mile link or at the peering points, which are the places where Internet Service Providers (ISPs) connect with each other.

The last-mile link issues are fairly straightforward. If you purchase T1/T3, metro Ethernet or fiber connections for your Internet connectivity, it is likely that you will have the same last-mile connectivity and SLA you will get for an MPLS link. Broadband links – DSL, cable, wireless – on the other hand, usually offer lesser SLAs, inferior MTBF (Mean Time Between Failures) and higher MTTR, and often higher oversubscription of backbone connections than TDM or fiber links. For DSL and wireless connections, there is the additional problem of often insufficient upstream bandwidth on the link (cable connections, by contrast, often exceed the upstream bandwidth available on a T1/E1).

These, on their own, are excellent reasons enterprise WAN managers are reluctant to depend on a single broadband link for their site-to-site connectivity. Because most remote locations do not have IT personnel on site, or just because of the loss of business productivity from the absence of WAN connectivity, it's worth "overpaying" for MPLS connections to ensure reliable connectivity.

Even when the last mile link isn't an issue, those peering points that connect different ISPs' networks together are a significant source of packet loss and delay. Service providers can and do engineer their internal networks to ensure virtually no packet loss under all but the most unusual conditions. This is called "on-net" performance. "Off-net", by contrast, involves handing off packets between two or more ISPs before the packets reach their ultimate destination.

No carrier can guarantee off-net performance, and the underlying problem of peering-point congestion is not technical, but rather economic. Since no carrier gets paid more for four nines performance at the peering point, and since no carrier could guarantee that all the other peering point connections along the way would deliver it as well, they don't "overengineer" their peering point connections to ensure such performance. The cost of providing four nines versus two nines at all peering points is an order of magnitude higher, given the hundredfold reduction in packet drops required to go from two nines to four nines. Conversely, the Internet really does work "pretty well" (i.e., about two nines, and these days usually a bit better than that) because if ISPs delivered worse performance than two nines, their support costs would go up, and they'd risk losing business to competitors.
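To illustrate why one carrier's investment cannot fix the path on its own, here is a minimal Python sketch of end-to-end reliability across a chain of segments. The segment values are hypothetical, chosen only for illustration, and the model assumes that segments fail independently, which real networks only approximate.

```python
from math import prod


def end_to_end_reliability(segments):
    """Fraction of time the whole path is good, assuming independent segments."""
    return prod(segments)


# Hypothetical path: last mile, ISP A on-net, two peering points, ISP B on-net, last mile.
baseline = [0.9999, 0.9999, 0.99, 0.99, 0.9999, 0.9999]

# Same path after one carrier upgrades its own peering point to four nines.
one_upgraded = [0.9999, 0.9999, 0.9999, 0.99, 0.9999, 0.9999]

for name, path in [("baseline", baseline), ("one peering point upgraded", one_upgraded)]:
    print(f"{name}: {end_to_end_reliability(path):.4%}")
```

In this toy model the path stays below two nines even after the upgrade, because the remaining two-nines peering point caps the product. That is the economic point made above: the carrier that pays for the upgrade cannot deliver, or bill for, four nines end to end.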

It is this last reason, rooted in the fundamental nature of a network of networks, that explains why it will be a very long time, if ever, before generic public Internet reliability and packet delivery performance are as high as those of private WANs.

Because the relatively small difference in the reliability of packet delivery has such a large impact on the enterprise WAN and the enterprise WAN manager, the gap between the price of Internet bandwidth and MPLS bandwidth has ballooned to 30x to 100x in price/bit. This has prompted two fundamentally different (and complementary) technological responses, each with its own set of advantages, from the WAN Optimization and WAN Virtualization vendors. In the weeks ahead, we'll look more closely at the factors that affect application performance on the WAN, and the consequences of this WAN service history.

A twenty-five year data networking veteran, Andy founded Talari Networks, a pioneer in WAN Virtualization technology, and served as its first CEO. Andy is the author of an upcoming book on Next-generation Enterprise WANs.


Copyright © 2012 IDG Communications, Inc.
