Delivering global SD-WAN is very different from delivering local networks. Local networks offer complete control over the end-to-end design, enabling low-latency and predictable connections. There might still be blackouts and brownouts, but you're in control and can troubleshoot accordingly with appropriate visibility.
With global SD-WANs, though, managing middle-mile (backbone) performance and managing the last-mile are, well, shall we say, more challenging. Most SD-WAN vendors don't have control over these two segments, which affects application performance and service agility.
In particular, an issue that SD-WAN appliance vendors often overlook is the management of the last-mile. With multiprotocol label switching (MPLS), the provider assumes the responsibility, but this is no longer the case with SD-WAN. Getting the last-mile right is challenging for many global SD-WANs.
It’s your responsibility to monitor all of the last-mile links around the globe, identify problems, and then engage with the local Internet service providers (ISPs) for that region. New solutions are emerging to address this problem.
Managing the middle-mile
As you know, Internet connections consist of three components: the first mile, from the customer's premises to the local ISP's premises; the middle-mile, known as the Internet core; and the last-mile, from the destination ISP's premises to the destination customer's premises.
The middle-mile consists of autonomous, interconnected networks, each with its own business objectives. Traffic exchange between these networks is tied to financial arrangements, so ISPs route packets based on economics. Unfortunately, other metrics that might better serve application performance are not considered.
The result is unreliable end-to-end connectivity. One day your Internet traffic might take the fewest possible hops; the next day it could bounce around the world. And there is not much you can do about that. A more viable solution is to privatize the middle-mile, which makes it possible to enforce strict latency, loss, and jitter requirements.
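To make "latency, loss, and jitter requirements" concrete, here is a minimal Python sketch that derives the three metrics from a series of probe round-trip times (RTTs) and checks them against SLA targets. The names `PathSLA`, `path_metrics`, and `meets_sla` are hypothetical, the thresholds are placeholders rather than any provider's actual SLA, and jitter is simplified to the mean absolute difference between consecutive RTTs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PathSLA:
    """Hypothetical SLA targets; real values depend on the provider and the application."""
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float


def path_metrics(rtts_ms: Sequence[Optional[float]]) -> Tuple[float, float, float]:
    """Return (mean RTT in ms, loss in %, jitter in ms); None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    if not received:
        # Every probe was lost: latency and jitter are undefined, so treat them as infinite.
        return float("inf"), loss_pct, float("inf")
    latency_ms = mean(received)
    # Simplified jitter: mean absolute difference between consecutive received RTTs.
    # RFC 3550 defines a smoothed interarrival-jitter estimator instead.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    return latency_ms, loss_pct, jitter_ms


def meets_sla(rtts_ms: Sequence[Optional[float]], sla: PathSLA) -> bool:
    """True only if latency, loss, and jitter are all within the (assumed) targets."""
    latency_ms, loss_pct, jitter_ms = path_metrics(rtts_ms)
    return (
        latency_ms <= sla.max_latency_ms
        and loss_pct <= sla.max_loss_pct
        and jitter_ms <= sla.max_jitter_ms
    )


sla = PathSLA(max_latency_ms=100.0, max_loss_pct=1.0, max_jitter_ms=5.0)
print(path_metrics([10.0, 12.0, 11.0, None]))  # (11.0, 25.0, 1.5)
print(meets_sla([10.0, 12.0, 11.0, None], sla))  # False: 25% loss exceeds 1%
```

Checking all three metrics together matters: a path can have excellent average latency and still fail an SLA on loss alone, as the example shows.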
Global independent backbones help address these issues. Global cloud-based providers, such as Aryaka and Cato Networks, provide a global backbone. Mode also offers a global backbone that works with any third-party SD-WAN device.
Enterprises access these backbones through points of presence (PoPs) that, in many cases, are 20ms or less from the customer premises. Now you can enjoy the performance benefits of a predictable global backbone.
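Because the backbone is reached over a short access leg at each end, the end-to-end delay splits into the three segments described earlier. The hypothetical helper below is a sketch under assumed numbers: it subtracts both access legs from an application's round-trip target to show how much room is left for the middle-mile.

```python
def middle_mile_budget_ms(
    target_rtt_ms: float, site_a_access_ms: float, site_b_access_ms: float
) -> float:
    """Return the round-trip time left for the backbone after both access legs.

    All inputs are round-trip times in milliseconds. The target is an assumed
    application requirement, not a figure from any provider.
    """
    for value in (target_rtt_ms, site_a_access_ms, site_b_access_ms):
        if value < 0:
            raise ValueError("latencies must be non-negative")
    remaining = target_rtt_ms - site_a_access_ms - site_b_access_ms
    if remaining < 0:
        raise ValueError("access legs alone exceed the target")
    return remaining


# Assumed example: a 150ms target with about 20ms from each site to its nearest PoP.
print(middle_mile_budget_ms(150.0, 20.0, 20.0))  # 110.0
```

The shorter the access legs, the more of the budget goes to the part of the path the backbone provider actually controls.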
Managing the last-mile
In a previous consultancy role, I had to roll out a global network within a short period of time. There were link issues, and I had to raise an emergency ticket with an ISP based in Hong Kong.
The chief executive officer (CEO) had tight deadlines and wanted the ticket to be closed the same night, not caring about the low-level details. I was based in London and Hong Kong is 8 hours ahead. Let’s say it was a long night.
Despite my frustration, at least I knew the ISP was responsible for keeping the last-mile connected. I could assist to some extent, but essentially the troubleshooting was out of my hands.
More sites – more ISPs
Every site requires an ISP, and managing the last-mile creates headaches, even for the most patient. Each ISP brings a new relationship, language, culture, and process.
Depending on the size and location of the ISP, I might be assigned a single person to guide me through the process and act as the contact point for future problems. An efficient network engineer needs to build out-of-process shortcuts for each ISP. But what happens when that person leaves or the processes change?
It creates more work, and that work is multiplied by the number of ISPs, not to mention the out-of-process tricks you have created.
Problems I see with the last-mile
I have not yet seen a solution to this problem. The carriers and providers that would monitor and manage your last-mile are often limited by the capabilities of the edge device.
Most of the time, this is a standard Layer 3 appliance using simple measurements such as Internet Control Message Protocol (ICMP) request/response. ICMP sits low in the stack, so it misses much of the performance-related information that would help ensure applications run at their best.
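One step up from ICMP is to time a TCP handshake to the port an application actually uses, which at least confirms that something is answering on that port. The Python sketch below is an illustration, not what any carrier or edge device runs, and `tcp_connect_ms` is a hypothetical name. It still reveals nothing about throughput or application response time, which is exactly the limitation of simple probes.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int, timeout_s: float = 2.0) -> Optional[float]:
    """Time a TCP three-way handshake to host:port, in milliseconds.

    Returns None if the connection is refused, times out, or cannot resolve.
    If host is a name rather than an IP address, DNS lookup time is included.
    """
    start = time.perf_counter()
    try:
        # create_connection returns once the handshake completes; no application
        # data is sent before the socket is closed.
        with socket.create_connection((host, port), timeout=timeout_s):
            # Stop the clock before the socket is closed.
            elapsed_s = time.perf_counter() - start
    except OSError:
        return None
    return elapsed_s * 1000.0
```

For example, `tcp_connect_ms("example.com", 443)` returns a handshake time for an HTTPS endpoint, or `None` if it cannot connect.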
When something starts to go wrong, these providers have little insight into link characteristics, such as a slowdown caused by interface congestion. Many last-mile management providers focus only on detecting line failures, and even that detection can be inconsistent.
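The gap between detecting a line failure and detecting a slowdown can be sketched as a simple classifier. This is an illustrative Python example, not any provider's actual logic: the baseline RTT, the 2x slowdown factor, and the 5% loss threshold are assumed values. A failure-only monitor reports just the `DOWN` case; a congested link that still answers probes, only slowly, falls into `DEGRADED`.

```python
from enum import Enum
from statistics import median
from typing import Optional, Sequence


class LinkState(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


def classify_link(
    rtts_ms: Sequence[Optional[float]],
    baseline_ms: float,
    slowdown_factor: float = 2.0,
    max_loss_pct: float = 5.0,
) -> LinkState:
    """Classify a link from recent probe RTTs (None marks a lost probe).

    baseline_ms is the link's normal RTT; slowdown_factor and max_loss_pct
    are assumed thresholds, not values from any provider.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    received = [r for r in rtts_ms if r is not None]
    if not received:
        # Nothing came back: the only case a failure-only monitor reports.
        return LinkState.DOWN
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # The median is less sensitive to a single outlier probe than the mean.
    if loss_pct > max_loss_pct or median(received) > slowdown_factor * baseline_ms:
        # The link is up but slow or lossy, e.g. because of interface congestion.
        return LinkState.DEGRADED
    return LinkState.UP


print(classify_link([None, None, None], baseline_ms=10.0))  # LinkState.DOWN
print(classify_link([31.0, 35.0, 40.0], baseline_ms=10.0))  # LinkState.DEGRADED
print(classify_link([10.0, 11.0, 12.0], baseline_ms=10.0))  # LinkState.UP
```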
Carriers and traditional last-mile management providers, like Experio, manage the line from the customer's router to the ISP. They do not detect problems in the ISP's upstream connectivity and have no visibility into it.
Last-mile management options
There are a couple of ways to solve this.
Aryaka was probably the first of the independent global backbone services to introduce last-mile management. Its service monitors all traffic traveling across the Aryaka tunnel from the customer location to the Aryaka backbone.
Internet traffic, however, is split out locally. That means Aryaka is only managing site-to-site traffic. Should the ISP suffer an outage or a slowdown due to a routing or QoS issue that affects only Internet traffic, Aryaka won't detect it. In an era where cloud applications are critical to companies, that strikes me as a big problem, to say the least.
Cato recently announced a new service to close this gap. The Cato Intelligent Last-Mile Management (ILMM) service is intended to address the problems described above. Cato manages the complete last-mile, from the customer premises to Cato's PoP. Both site-to-site and Internet traffic are tunneled back to Cato's PoP and split out from there.
This contrasts with other SD-WAN vendors that split traffic at the site. In those designs, if Internet traffic degrades but WAN traffic does not, the problem won't be detected.
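The difference between the two designs reduces to a toy model of which traffic a provider can observe. This Python sketch assumes, as the comparison above implies, that the provider monitors only traffic that traverses its tunnel; `SiteDesign`, `monitored_classes`, and `undetected` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class SiteDesign:
    name: str
    # True: WAN and Internet traffic are both tunneled to the PoP and split there.
    # False: Internet traffic breaks out locally at the site.
    internet_via_pop: bool


def monitored_classes(design: SiteDesign) -> FrozenSet[str]:
    """Traffic classes the provider sees, assuming it observes only tunneled traffic."""
    if design.internet_via_pop:
        return frozenset({"wan", "internet"})
    return frozenset({"wan"})


def undetected(design: SiteDesign, degraded: Set[str]) -> Set[str]:
    """Degraded traffic classes that the provider's monitoring cannot see."""
    return set(degraded) - monitored_classes(design)


site_split = SiteDesign("split at site", internet_via_pop=False)
pop_split = SiteDesign("split at PoP", internet_via_pop=True)
print(undetected(site_split, {"internet"}))  # {'internet'}
print(undetected(pop_split, {"internet"}))   # set()
```

Moving the split point from the site to the PoP is what brings Internet-only degradation into the monitored set.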
The ability to simplify last-mile management is an enormous step forward for global SD-WAN deployments. With every network design, I try to simplify as much as possible while still keeping adequate control and visibility.