MTBF, MTTR and SLAs, oh my

While broadband's MTBF and MTTR numbers are worse than those of MPLS, having multiple diverse links delivers the higher uptime that actually matters; MPLS's "better" SLA doesn't deliver what enterprises care about.

Continuing our look into the myths about why enterprises "need" MPLS, today we'll consider two more: that MPLS is needed for its better MTBF and MTTR, and that it is needed for its SLAs.

Remember, given how much more MPLS costs relative to Internet bandwidth, the telecom service providers need people to believe many things about the superior quality of MPLS in order to maintain their market share and, more importantly, their extremely high margins.

Per Wikipedia, Mean time between failures (MTBF) is the predicted (average) time between failures of a system during operation. Mean time to repair (MTTR) represents the average time required to repair a failed component.

As with many of the myths about MPLS, this one starts with a grain of truth. A few grains, actually. In the late 1990s, when DSL and cable modem services were first introduced, there was little doubt that they had far worse MTBF numbers than T1/E1-based MPLS, especially as consumer services. Fifteen years later, the gap has narrowed dramatically, especially when those services are offered to businesses. MPLS may well still have an advantage here, but even if so, it would be very hard to prove.

More importantly, most MPLS providers will still commit in their SLAs to repairing a problem connection within 2 to 4 hours for practically all locations, while MTTR commitments for business-class broadband typically range between 4 and 48 hours. So MPLS would appear to have an edge here. We'll talk in just a bit about how firm that commitment is.

But even if it were an absolute guarantee, the simplest of high-availability techniques, namely having multiple links from diverse vendors at each location, means the MTTR advantage matters little. Even without multi-link aggregation or bonding techniques like WAN Virtualization, which enable sub-second failover between connections, simply using one link as the backup to another pretty much eliminates the downside of a 48-hour MTTR.

Just as importantly, depending on exactly how you configure and deploy your network, a link failure will cost you somewhere between a fraction of a second and at most 5 minutes of connectivity, with a more likely upper limit of about 90 seconds. This is far better than 2 hours of downtime, of course. Broadband connections are so inexpensive that any MPLS customer who cares about high network availability typically already deploys a secondary Internet connection as backup. Because of the ubiquity and interconnectedness of the Internet, wireless connections such as 4G/LTE, which become more available and better performing each year, are also an option, and virtually guarantee that low-cost diversity will be available. The bottom line here: using diverse Internet connections renders any MTBF and MTTR advantages of MPLS effectively moot.
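
To make the arithmetic behind that bottom line concrete, here is a rough sketch in Python. It uses the standard steady-state formula, availability = MTBF / (MTBF + MTTR), and treats a site as up whenever at least one of its links is up. Every MTBF and MTTR figure in it is an assumed placeholder chosen for illustration, not a measured or vendor-quoted number, and the calculation assumes the links fail independently of one another, which is exactly what diverse vendors and diverse physical paths are meant to approximate.

```python
# Illustrative only: every number below is an assumed placeholder,
# not a measurement or a vendor commitment.
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single link: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(*link_availabilities: float) -> float:
    """Availability of a site that is up if at least one link is up.

    Assumes the links fail independently of each other.
    """
    all_down = 1.0
    for a in link_availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours per year during which the site has no working link."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical links: one MPLS circuit repaired in 4 hours versus
# broadband links that fail twice as often and take 48 hours to repair.
mpls = availability(mtbf_hours=4000, mttr_hours=4)
broadband = availability(mtbf_hours=2000, mttr_hours=48)
dual_broadband = parallel_availability(broadband, broadband)

for name, a in [
    ("single MPLS", mpls),
    ("single broadband", broadband),
    ("dual broadband", dual_broadband),
]:
    print(f"{name:18s} availability={a:.5f} "
          f"downtime={downtime_hours_per_year(a):.1f} h/yr")
```

Under these assumed inputs, the pair of slow-to-repair broadband links comes out ahead of the single MPLS circuit, because both links have to be down at the same time for the site to lose connectivity. The sketch does not model failover time; at the 90 seconds or so mentioned above, each failure of the primary link would add only a minute or two of interruption.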

Now, let's move on to our second "myth." In the context of data networking, a Service Level Agreement (SLA) is a contractual commitment by a provider that formally defines "what you get" in terms of things like uptime/availability and packet delivery. MTBF and MTTR figure prominently here, especially the latter, as do packet loss rates and sometimes jitter.

While a formal SLA with better "guarantees" for MTBF, MTTR, packet loss and jitter sounds like a great thing, there are a couple of caveats. From a technical perspective, the loss rate and jitter guarantees almost always cover only the service provider's core network, not the end-to-end connection including your last mile. The SPs will say this is because they can't control how much traffic you send onto your own last-mile link: you might overfill the link and cause packet loss or jitter yourself. While there is some truth to this, it also means that you are not protected from problems that are not of your own doing, and those do occur on the last mile.

The bigger problem with these SLAs is that they don't have any real teeth. If the carrier violates the terms of the SLA, its biggest penalty is that it will owe you a portion of your monthly bill back. The more "generous" SLAs will say that if the outage lasts for too long a period of time, they'll refund your entire month's bill. The problem, of course, is that you don't want a free month's service – you want to avoid the very high cost of downtime to your enterprise. But no carrier will give you an SLA where they commit to compensate you for what that lost connectivity time is worth to you and your firm.
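
A quick back-of-the-envelope comparison shows how lopsided this is. The Python sketch below models a hypothetical SLA that credits a fixed fraction of the monthly bill per hour of outage, capped at one full month's bill, and sets that credit against an assumed hourly cost of downtime. The credit schedule, the monthly bill and the cost-of-downtime figure are all invented for illustration; real SLA terms vary by carrier and contract.

```python
def sla_credit(monthly_bill: float, outage_hours: float,
               credit_fraction_per_hour: float = 0.05) -> float:
    """Credit owed under a hypothetical SLA.

    Assumed terms: a fraction of the monthly bill per outage hour,
    capped at one full month's bill.
    """
    uncapped = monthly_bill * credit_fraction_per_hour * outage_hours
    return min(monthly_bill, uncapped)


def business_loss(outage_hours: float, cost_per_hour: float) -> float:
    """What the outage costs the enterprise, at a flat assumed hourly rate."""
    return outage_hours * cost_per_hour


monthly_bill = 1_500.0     # assumed monthly MPLS charge for one site
cost_per_hour = 10_000.0   # assumed cost of downtime to the business

for hours in (2, 8, 48):
    credit = sla_credit(monthly_bill, hours)
    loss = business_loss(hours, cost_per_hour)
    print(f"{hours:2d} h outage: SLA credit ${credit:,.0f} "
          f"vs. business loss ${loss:,.0f}")
```

However the assumed numbers are tuned, the shape of the result stays the same: the credit can never exceed the monthly bill, while the loss keeps growing with every hour the site is down.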

While it is good to insist on an SLA, enterprises know that the real solution to high availability is not to depend on it but to have redundant, diverse links. Just as no serious IT shop would run its most critical applications without redundant storage systems or redundant computing options, none should treat an SLA as a substitute for redundant network connections. And if you are going to have redundant connections, why pay a 30x to 100x premium in cost per Mbps per month just to have MPLS as one of those connections?

And if diverse broadband connections aren't available at some of your locations? As we covered last time, the answer there is to leverage always-available T1-based Internet connectivity for those locations. The savings for those locations compared to MPLS are relatively small, but then so typically are the differences in SLAs. The benefit of the ubiquity of the Internet is that you get to leverage broadband economics at 80% - 95% of your locations, while having a solution that covers 100% of your locations.

"But what about the reliability and performance predictability of broadband?" you continue to ask. Again, that's an altogether different subject from MTBF, MTTR or anything guaranteed in an SLA! As indicated last time, we'll cover it very soon in this series on knocking down MPLS myths. As you would expect, this critical issue is handled very well in the Next Generation Enterprise WAN (NEW) architecture. For now, know that MTBF, MTTR and SLAs are no longer a reason why enterprises need to use MPLS for reliable enterprise WANs.

A twenty-five-year data networking veteran, Andy founded Talari Networks, a pioneer in WAN Virtualization technology, and served as its first CEO. He is now vice president of product management at Aryaka Networks and is the author of an upcoming book on Next-generation Enterprise WANs.

Copyright © 2013 IDG Communications, Inc.