
Why does packet loss destroy application performance over the WAN?

TCP's primary mechanism for signaling congestion degrades application performance in many ways, and was never optimized for high-bandwidth WANs or for interactive applications running over the WAN.

By Andy Gottlieb on Tue, 11/13/12 - 1:06pm.

Before continuing on to cover which of the various technologies – those that are part of the Next-generation Enterprise WAN (NEW) architecture as well as others – address packet loss, and how, I think it will be worthwhile to go a bit deeper into why packet loss has such a huge impact on application performance over the WAN in the first place.

While this will not be a deeply technical explanation that would satisfy scientists, engineers and network developers, it will get into a bit more technical detail than the typical column in the series.

We spent two columns on the factors that most impact application performance over the WAN, noting that packet loss was one of its great scourges. In fact, packet loss has the greatest impact on the performance of most applications over the WAN, by design.

Why is packet loss such a killer? There are many reasons, most having to do with the nature of how TCP (Transmission Control Protocol) works, and especially how TCP does congestion control/congestion avoidance. The key issue revolves around dealing with contention for limited bandwidth.
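
As a rough back-of-the-envelope sketch of just how much of a killer it is, the widely cited Mathis approximation puts an upper bound of about MSS / (RTT x sqrt(loss rate)) on a single TCP flow's steady-state throughput. The numbers below (1460-byte segments, an 80 ms WAN round trip, 1% loss) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope use of the Mathis approximation: steady-state TCP
# throughput is bounded by roughly MSS / (RTT * sqrt(loss_rate)).
# The segment size, RTT and loss rate below are assumed example values.

from math import sqrt

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on a single TCP flow's throughput, in bits/sec."""
    return (mss_bytes * 8) / (rtt_s * sqrt(loss_rate))

# 1460-byte segments, 80 ms round trip across the WAN, 1% packet loss:
print(f"{mathis_throughput_bps(1460, 0.080, 0.01) / 1e6:.1f} Mbps")  # ~1.5 Mbps
```

In other words, even a modest 1% loss rate can cap a flow at a small fraction of the link speed once WAN latency is in the picture.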

TCP is designed to use all available bandwidth, and to use it "fairly" across flows on average. Doing so is hard, because no end station or TCP flow knows how much bandwidth is available – not when a single flow happens to be the only one using the end-to-end path at the moment, and certainly not in the more typical case of multiple flows, where the amount available changes from moment to moment. So the sender of the TCP data needs a way to know when "enough is enough." Packet loss is the basic signal of this.
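
A minimal sketch of that probing behavior, assuming a made-up link capacity and ignoring details such as slow start and receiver windows (the numbers are illustrative, not taken from any real TCP stack):

```python
# Toy illustration of how a TCP sender probes for bandwidth it cannot
# directly measure: keep increasing the congestion window each round trip
# until loss signals "enough is enough," then cut the window in half.
# The link capacity and starting window are made-up numbers, and real TCP
# also has a slow-start phase that this sketch ignores.

def probe_bandwidth(link_capacity_pkts: int, rounds: int = 12) -> None:
    cwnd = 1.0  # congestion window: packets allowed "in flight" per round trip
    for rtt in range(1, rounds + 1):
        loss = cwnd > link_capacity_pkts   # loss appears once capacity is exceeded
        print(f"RTT {rtt:2d}: cwnd = {cwnd:4.1f} packets, loss = {loss}")
        if loss:
            cwnd = max(1.0, cwnd / 2)      # multiplicative decrease on loss
        else:
            cwnd += 1.0                    # additive increase while all is well

probe_bandwidth(link_capacity_pkts=8)
```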

While it may be counterintuitive, TCP and routers together are designed to cause loss, and to react to it in specific ways, to avoid overutilization of the network and the potential of congestion collapse. If the data offered into the network weren't reduced when the network was highly utilized, to the point of being overutilized, then useful work, a.k.a. "goodput," would cease. Think of the traffic jams you see on highways, most frequently at on-ramps and off-ramps, but occasionally across the highway itself. The goals of TCP's design are to minimize the amount of time that the highway grinds to a halt (congestion avoidance), and to react appropriately to reduce traffic at those times when it does (congestion control).

TCP packets received by the receiving station are acknowledged back to the sending station. TCP is a window-based protocol, meaning that it can have only a certain amount of traffic "in flight" between sending station and receiving station at any one time. It is designed to back off and substantially reduce the amount of bandwidth offered (by half) when packet loss is observed. Further, until the lost packet is received and acknowledged by the receiver, only a limited number of additional packets will be offered. Even for those applications that use multiple TCP flows, a similar principle applies: only so many new flows can be opened, and only so many packets sent, until the lost packet is received at the other end and its receipt acknowledged.
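
Here is a minimal sketch of that window logic, again with made-up numbers: data can be sent only while the amount in flight is below the congestion window, acknowledgements free up window space, and detected loss halves the window.

```python
# Sketch of the window logic described above: a sender may only have
# cwnd packets unacknowledged ("in flight") at once, and it halves cwnd
# when loss is detected. Greatly simplified; real TCP stacks add slow
# start, fast recovery, receiver windows, and more.

class TcpSenderSketch:
    def __init__(self, cwnd_pkts: int = 10):
        self.cwnd = cwnd_pkts       # congestion window, in packets
        self.in_flight = 0          # packets sent but not yet acknowledged

    def can_send(self) -> bool:
        return self.in_flight < self.cwnd

    def on_send(self) -> None:
        assert self.can_send()
        self.in_flight += 1

    def on_ack(self) -> None:
        self.in_flight -= 1         # acknowledged data frees window space

    def on_loss_detected(self) -> None:
        self.cwnd = max(1, self.cwnd // 2)  # multiplicative decrease

sender = TcpSenderSketch()
for _ in range(10):
    sender.on_send()
print(sender.can_send())            # False: window is full, must wait for ACKs
sender.on_loss_detected()
print(sender.cwnd)                  # 5: halved after loss
```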

Packet loss is detected in one of two ways. For a longer transfer where just a packet or two is lost, the sender notices and reacts to the loss when subsequent packets are acknowledged by the receiver but the missing one is not. Alternatively – and more typically for new or short TCP flows – packet loss is detected by the occurrence of a "timeout": the absence of receipt of an acknowledgement of the packet. The amount of time until a "timeout" is deemed to have occurred typically varies between a couple hundred milliseconds and three seconds.
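
A rough sketch of those two detection paths – duplicate acknowledgements for losses in the middle of a longer transfer, and a retransmission timeout (RTO) otherwise – is below. The bookkeeping is simplified, and the threshold and timing constants mirror common defaults rather than any particular implementation:

```python
# Sketch of the two loss-detection mechanisms described above, under
# simplified assumptions: three duplicate ACKs for the same sequence
# number trigger "fast" detection, otherwise a retransmission timeout
# (RTO) fires. The RTO bounds below mirror common defaults, not a spec.

import time

DUP_ACK_THRESHOLD = 3      # the classic triple-duplicate-ACK rule
MIN_RTO = 0.2              # a couple hundred milliseconds
MAX_RTO = 3.0              # up to a few seconds for new or idle connections

def detect_loss(acks_seen: list[int], last_send_time: float,
                rto: float, now: float) -> str | None:
    """Return which mechanism (if any) would declare a loss."""
    rto = min(max(rto, MIN_RTO), MAX_RTO)
    # Duplicate ACKs: the receiver keeps re-acknowledging the data just
    # before the hole, so repeated identical ACKs imply a missing packet.
    for seq in set(acks_seen):
        if acks_seen.count(seq) > DUP_ACK_THRESHOLD:
            return "duplicate ACKs (fast retransmit)"
    # Timeout: no acknowledgement at all within the RTO window.
    if now - last_send_time > rto:
        return "timeout (RTO expired)"
    return None

now = time.monotonic()
print(detect_loss([100, 100, 100, 100], now - 0.05, rto=1.0, now=now))
print(detect_loss([], now - 1.5, rto=1.0, now=now))
```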