Why does packet loss destroy application performance over the WAN?

TCP's primary mechanism for signaling congestion causes application performance degradation in many ways, and was never optimized for high-bandwidth WANs or interactive applications over the WAN.

Before moving on to which of the various technologies, both those that are part of the Next-generation Enterprise WAN (NEW) architecture and others, address packet loss and how, I think it is worth going a bit deeper into why packet loss has such a huge impact on application performance over the WAN in the first place.

While this will not be a deeply technical explanation that would satisfy scientists, engineers and network developers, it will get into a bit more technical detail than the typical column in the series.

We spent two columns on the factors that most impact application performance over the WAN, noting that packet loss was one of the scourges of application performance over the WAN. In fact, packet loss has the greatest impact of any of those factors on the performance of most applications over the WAN, and that is a direct consequence of how TCP was deliberately designed.

Why is packet loss such a killer? There are many reasons, most having to do with the nature of how TCP (Transmission Control Protocol) works, and especially how TCP does congestion control/congestion avoidance. The key issue revolves around dealing with contention for limited bandwidth.

TCP is designed to use all available bandwidth, and to share it "fairly" across flows on average. But no end station or TCP flow knows how much bandwidth is available: not even when its flow happens to be the only one using the end-to-end path at the moment, and still less in the more typical case of multiple flows, where the amount available changes from moment to moment. So the sender of the TCP data needs a way to know when "enough is enough." Packet loss is the basic signal for this.

While it may be counterintuitive, TCP and routers together are designed to cause loss, and to react to it in specific ways, to avoid overutilizing the network and risking congestion collapse. If senders did not reduce the data they offered into the network as it became highly utilized, and then overutilized, useful work (a.k.a. "goodput") would cease. Think of the traffic jams you see on highways, most frequently at on-ramps and off-ramps, but occasionally across the whole highway. The goals of TCP's design are to minimize the amount of time the highway grinds to a halt (congestion avoidance), and to react appropriately to reduce traffic when it does (congestion control).

The receiving station acknowledges the TCP packets it receives back to the sending station. TCP is a window-based protocol, meaning that it allows only a certain amount of unacknowledged traffic to be "in flight" between the sending and receiving stations at once. When packet loss is observed, the sender is designed to back off substantially, cutting the amount of data it offers by half. Further, until the lost packet has been received and acknowledged by the receiver, only a limited amount of additional data will be sent. For applications that use multiple TCP flows, a similar principle applies: only so many new flows are opened, or packets sent, until a lost packet is received at the other end and its receipt acknowledged.
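A toy model of this window behavior may help. The sketch below is a simplified additive-increase/multiplicative-decrease (AIMD) loop, assuming the window is counted in whole packets, grows by one packet per loss-free round trip, and is halved when a loss is detected. It ignores slow start, timeouts and every other detail of real TCP stacks, and the function name and loss pattern are hypothetical.

```python
def aimd_windows(rounds, loss_rounds, initial_window=10):
    """Simulate a congestion window (in packets) over a number of round trips.

    Simplified AIMD model (an assumption, not a real TCP stack):
    - no loss in a round trip: window grows by 1 packet (additive increase)
    - loss detected in a round trip: window is halved (multiplicative decrease)
    """
    window = initial_window
    history = []
    for rtt in range(rounds):
        history.append(window)  # packets allowed in flight during this round trip
        if rtt in loss_rounds:
            # Back off: halve the amount of data in flight, never below 1 packet.
            window = max(1, window // 2)
        else:
            # Probe for more bandwidth: one extra packet per round trip.
            window += 1
    return history


if __name__ == "__main__":
    # A single loss in round trip 5 of a 12-round-trip transfer.
    print(aimd_windows(12, loss_rounds={5}, initial_window=20))
    # → [20, 21, 22, 23, 24, 25, 12, 13, 14, 15, 16, 17]
```

After one loss the window drops from 25 to 12 packets, and in this model it takes 13 more loss-free round trips to climb back to 25.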

Packet loss is detected in one of two ways. In a longer transfer where just a packet or two is lost, the sender notices and reacts to the loss when the receiver keeps acknowledging only the data up to the gap, sending duplicate acknowledgements as later packets arrive. Alternatively, and more typically for new or short TCP flows, packet loss is detected by a "timeout": no acknowledgement for the packet arrives within a set time. How long the sender waits before declaring a timeout typically varies between a couple hundred milliseconds and three seconds.
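The two detection paths can be sketched as follows. This is a simplified model, not a real TCP implementation. It assumes the fast-retransmit trigger of three duplicate ACKs from RFC 5681, and the retransmission timeout (RTO) estimator from RFC 6298: smoothed RTT plus four times the RTT variation, a 1-second initial value, and by default a 1-second floor. The class and method names are hypothetical. Real stacks differ. Linux, for example, uses a 200 ms minimum RTO, and stacks following the older RFC 2988 started at 3 seconds, which matches the range quoted above.

```python
DUPACK_THRESHOLD = 3  # fast-retransmit trigger from RFC 5681


class LossDetector:
    """Sketch of TCP's two loss signals for one connection (simplified).

    - duplicate ACKs: the receiver keeps acknowledging the same point in the
      byte stream because later packets arrived but an earlier one did not
    - retransmission timeout (RTO): no acknowledgement arrived in time;
      the RTO is estimated from measured round-trip times as in RFC 6298
    """

    def __init__(self, min_rto=1.0, max_rto=60.0, clock_granularity=0.001):
        self.srtt = None      # smoothed round-trip time (seconds)
        self.rttvar = None    # round-trip time variation (seconds)
        self.rto = 1.0        # initial RTO before any measurement (RFC 6298)
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.g = clock_granularity
        self.last_ack = None
        self.dup_acks = 0

    def on_rtt_sample(self, r):
        """Update the RTO from a new round-trip-time measurement r (seconds)."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            alpha, beta = 1 / 8, 1 / 4
            # RTTVAR must use the old SRTT, so update it first.
            self.rttvar = (1 - beta) * self.rttvar + beta * abs(self.srtt - r)
            self.srtt = (1 - alpha) * self.srtt + alpha * r
        rto = self.srtt + max(self.g, 4 * self.rttvar)
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_ack(self, ack_number):
        """Return True when duplicate ACKs indicate a lost packet."""
        if ack_number == self.last_ack:
            self.dup_acks += 1
            return self.dup_acks == DUPACK_THRESHOLD
        self.last_ack = ack_number
        self.dup_acks = 0
        return False

    def timed_out(self, seconds_since_send):
        """Return True when an unacknowledged packet has waited past the RTO."""
        return seconds_since_send > self.rto
```

For instance, with a 200 ms floor and a single 100 ms RTT sample, `LossDetector(min_rto=0.2).on_rtt_sample(0.1)` gives an RTO of about 0.3 s, while the default 1-second floor would hold it at 1.0 s. Feeding ACK numbers 1000, 2000, 2000, 2000, 2000 to `on_ack` signals a loss on the third duplicate.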

TCP is an elegant protocol designed nearly 40 years ago, when the CPU and memory needed to keep state were extremely expensive, and when the design goal for the router middleboxes was for them to be stateless. This worked, and continues to work, fantastically well on high-bandwidth, low-latency LANs and on low-bandwidth, high-latency WANs. But TCP wasn't designed to work optimally in the medium-to-high-bandwidth, high-latency environment that characterizes most WAN use today. Nor was it designed for running interactive applications (web browsing, remote desktop) across very long-distance WANs.

TCP was designed in particular so that each end station makes its decisions completely independently of every other end station. This conservative approach contributes to network stability and minimizes congestion.

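When the sending station detects even a single lost packet, it halves the amount of data it offers into the network, and then increases it only slowly as successfully received packets are acknowledged: roughly one additional packet per round trip. Because that recovery is paced by the round-trip time, it takes far longer on a high-latency WAN than on a LAN, so WAN packet loss can have a huge impact on the performance of large transfers.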
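One way to put numbers on this is the well-known steady-state approximation from Mathis, Semke, Mahdavi and Ott (1997), throughput ≈ (MSS / RTT) × C / √p with C = √(3/2). It is a swapped-in illustration, not the author's analysis, and it models only the halve-and-regrow cycle, not timeouts. The sketch below assumes a 1460-byte segment, 1% random loss, and a 1 ms LAN versus a 100 ms WAN round trip. All of these figures and the function names are hypothetical.

```python
import math

MSS_BYTES = 1460  # assumed typical Ethernet-sized TCP segment payload


def mathis_throughput_bps(rtt_s, loss_rate, mss_bytes=MSS_BYTES):
    """Approximate steady-state TCP throughput (bits/s) under random loss.

    Mathis et al. approximation: rate ~ (MSS / RTT) * C / sqrt(p),
    with C = sqrt(3/2). Models congestion avoidance (AIMD) only, not
    timeouts, so it is an optimistic estimate.
    """
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


def recovery_time_s(link_bps, rtt_s, mss_bytes=MSS_BYTES):
    """Time to regrow a halved window back to a full pipe at +1 packet per RTT."""
    full_window_pkts = link_bps * rtt_s / (mss_bytes * 8)  # bandwidth-delay product
    return (full_window_pkts / 2) * rtt_s


if __name__ == "__main__":
    for name, rtt in [("LAN, 1 ms", 0.001), ("WAN, 100 ms", 0.100)]:
        mbps = mathis_throughput_bps(rtt, loss_rate=0.01) / 1e6
        print(f"{name}: ~{mbps:.1f} Mbit/s at 1% loss")
    print(f"Recovery on a 100 Mbit/s, 100 ms path: ~{recovery_time_s(100e6, 0.100):.0f} s")
```

With the same 1% loss, the estimate is roughly 143 Mbit/s on the LAN but only about 1.4 Mbit/s on the WAN, a factor of 100 that comes purely from the longer round trip. Regrowing a halved window on a 100 Mbit/s, 100 ms path takes on the order of 43 seconds of loss-free transfer.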

For short flows, where one of the first few packets is lost, a single lost packet can cause significant application delay. There are too few packets behind it to trigger the faster, acknowledgement-based detection, so the sender must wait for the timeout to occur.
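A rough estimate shows why this hurts so much. The sketch assumes the flow is in slow start with an initial window of 10 packets that doubles each round trip. It ignores connection setup and assumes a lost early packet is recovered only by a timeout, counting at least one idle RTO. The function names and the 20-packet, 100 ms, 1-second figures are hypothetical.

```python
def slow_start_rtts(total_packets, initial_window=10):
    """Round trips to send a short flow in slow start (window doubles each RTT)."""
    sent, window, rtts = 0, initial_window, 0
    while sent < total_packets:
        sent += window   # one window's worth of packets per round trip
        window *= 2      # slow start: double the window each round trip
        rtts += 1
    return rtts


def short_flow_time_s(total_packets, rtt_s, rto_s=None, initial_window=10):
    """Estimated completion time; if rto_s is given, add one timeout for a lost packet."""
    t = slow_start_rtts(total_packets, initial_window) * rtt_s
    if rto_s is not None:
        t += rto_s  # at least one RTO of idle time before the retransmission
    return t


if __name__ == "__main__":
    print(short_flow_time_s(20, rtt_s=0.1))             # no loss
    print(short_flow_time_s(20, rtt_s=0.1, rto_s=1.0))  # one early loss
```

In this model a 20-packet response over a 100 ms path finishes in about 0.2 s without loss, but a single timeout of 1 second pushes it to at least 1.2 s, about six times longer, even though only one packet was lost.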

Now, if the network is so congested that many packets are being lost, this is probably the right behavior to ensure that network conditions don't get worse still. But it frequently is too harsh a penalty for the "unlucky" WAN flow subject to that single loss.

The conservative approach taken by TCP is still pretty much the best way to address network bandwidth contention on a LAN. All these years later, however, intelligent WAN devices and technology in the middle, which have many orders of magnitude more CPU processing power and memory available than they did when TCP was first introduced, and much more CPU and memory capability relative to WAN bandwidth than even 10 or 15 years ago, can make better decisions to deliver more network stability, more efficiency, better network utilization and better application performance.

Next time we'll start to look at the ways that various WAN technologies and techniques address the impact of packet loss on application performance.

A twenty-five-year data networking veteran, Andy founded Talari Networks, a pioneer in WAN Virtualization technology, served as its first CEO, and now leads product management at Aryaka Networks. Andy is the author of an upcoming book on Next-generation Enterprise WANs.

Copyright © 2012 IDG Communications, Inc.
