Last time, we began our discussion of what can be done to address the impact of packet loss on application performance over the WAN. We listed six different possibilities, and went through how one of them can significantly improve application performance in the face of packet loss. Today we'll cover two more techniques: mitigating and hiding the effect of loss from the end station, and reacting differently to observed packet loss.
We saw that drastically reducing the number of packets that traverse the WAN, using one or more of replicated file service, local web (HTTP) object caching, WAN Optimization's data deduplication, and CIFS-specific application proxy technologies, will greatly improve application performance in the face of WAN packet loss. But this is by no means the only method, and in fact those methods are either very application-specific or work only when the data in question has already traversed the WAN once. The techniques we'll cover today and in our next column will work for all TCP (Transmission Control Protocol) applications, and some will work for real-time applications as well.
The first technique to mitigate the effects of packet loss is to use Forward Error Correction (FEC). FEC sends additional, redundant data along with the packet stream so that the receiver can correct errors without requiring retransmissions. Silver Peak is a WAN Optimization vendor that promotes its use of FEC.
FEC works well when there is consistent, uniformly distributed low-to-moderate packet loss. This can happen when bit errors occur on a faulty last-mile DSL line, for example, although such faults are far less frequent than they used to be. But packet loss in the WAN is almost always caused by congestion-based dropping of packets by routers (or some other forwarding device) along the path between locations. And congestion-based packet loss is decidedly not uniformly distributed; rather, it is bursty. In particular, the duration of the loss is unpredictable: many loss events are very brief, a few are very long, and there is no way to tell in advance how long a given congestion event will last. No reasonable FEC overhead will successfully reconstruct the data when two consecutive large packets are lost, for example. Because of this, FEC, even "adaptive" FEC, which attempts to use more forward redundancy when loss rates seem higher, is almost always ineffective in practice. It uses additional bandwidth for the error correction, and yet will almost never be able to handle the runs of high packet loss that have the greatest impact on application performance.
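To make concrete why burst loss defeats FEC, here is a minimal sketch (not any vendor's actual scheme) of the simplest form of packet-level FEC: a single XOR parity packet protecting a group of data packets. A lone loss within the group is recoverable; two losses in the same group, the bursty pattern that congestion produces, are not.

```python
# Minimal sketch of packet-level FEC: one XOR parity packet per group of N
# data packets. Illustration only; real FEC codes are more sophisticated,
# but share the limitation demonstrated at the bottom.
from __future__ import annotations
from functools import reduce

GROUP_SIZE = 4  # N data packets protected by 1 parity packet (25% overhead)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(group: list[bytes]) -> bytes:
    """The parity packet is the XOR of all data packets in the group."""
    return reduce(xor_bytes, group)

def recover(received: list[bytes | None], parity: bytes) -> list[bytes] | None:
    """Reconstruct the group if at most one packet was lost (None)."""
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if not missing:
        return received                      # nothing lost
    if len(missing) > 1:
        return None                          # burst loss: parity cannot help
    present = [pkt for pkt in received if pkt is not None]
    rebuilt = reduce(xor_bytes, present + [parity])
    result = list(received)
    result[missing[0]] = rebuilt
    return result

group = [bytes([i] * 8) for i in range(GROUP_SIZE)]
parity = make_parity(group)

# A single random loss is recoverable...
assert recover([group[0], None, group[2], group[3]], parity) == group

# ...but two consecutive losses in the same group, the typical congestion-
# driven WAN pattern, defeat the parity packet entirely.
assert recover([group[0], None, None, group[3]], parity) is None
```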
Another technique to mitigate the effect of packet loss, used by most WAN Optimization solutions, is to do TCP termination at each WAN Optimization appliance, and to combine this with a transport other than standard TCP for communicating between the two appliances. In this way, the packet loss is hidden from the end stations, so they don't cut back their TCP window size in response. (The WAN Optimization appliances will buffer traffic as needed.) While the TCP termination is primarily done in order to most effectively use techniques like compression and data deduplication, under certain circumstances it can also improve application performance in the face of packet loss.
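Conceptually, the LAN-side termination looks something like the sketch below. It is assumption-laden: the appliance address and ports are placeholders, and where this sketch simply opens another TCP connection to the peer, a real appliance would substitute its own high-speed or proprietary WAN-side transport.

```python
# Sketch of LAN-side TCP termination on a WAN Optimization appliance.
# The end station's connection terminates locally; its data is buffered and
# relayed to the peer appliance over a separate WAN-side transport (shown
# here as plain TCP purely for simplicity). Addresses are placeholders.
import asyncio

LAN_LISTEN_PORT = 9000                               # where end stations connect
PEER_APPLIANCE = ("peer-appliance.example", 9001)    # hypothetical remote box

async def relay(src: asyncio.StreamReader, dst: asyncio.StreamWriter) -> None:
    """Copy bytes one direction, letting local buffers absorb WAN stalls."""
    while data := await src.read(64 * 1024):
        dst.write(data)
        await dst.drain()          # back-pressure stays local to this hop
    dst.close()

async def handle_lan_client(reader: asyncio.StreamReader,
                            writer: asyncio.StreamWriter) -> None:
    # The LAN-side handshake and ACKs happen here, at LAN latency, so WAN
    # loss is never visible to the end station's TCP stack.
    wan_reader, wan_writer = await asyncio.open_connection(*PEER_APPLIANCE)
    await asyncio.gather(relay(reader, wan_writer),
                         relay(wan_reader, writer))

async def main() -> None:
    server = await asyncio.start_server(handle_lan_client, port=LAN_LISTEN_PORT)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```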
Most commonly, each WAN Optimization device will run either a proprietary version of "high-speed TCP" or perhaps an RFC 3649-compliant implementation. When attempting to fill a high-bandwidth WAN connection with fairly high latency, even an occasional single packet loss under ordinary TCP can drastically reduce the amount of bandwidth utilized, because TCP is designed to cut back the window size by half in the face of a single lost packet, and to grow the window size relatively slowly as acknowledgements are received. High-speed TCP implementations fix this problem and work well under low loss (i.e. packet loss rates well below 1% over any useful timeframe).
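For a rough sense of the numbers, the back-of-the-envelope sketch below compares how long it takes to refill the pipe after a single loss with standard TCP's halving versus a gentler decrease in the spirit of RFC 3649. The link speed, RTT, and decrease factors are illustrative assumptions only, and the sketch ignores that high-speed TCP also grows the window by more than one segment per RTT, which shortens recovery further.

```python
# Back-of-the-envelope sketch of why a single loss hurts so much when
# filling a big pipe with ordinary TCP, and how a gentler multiplicative
# decrease (in the spirit of RFC 3649 HighSpeed TCP) helps. Link speed,
# RTT and decrease factors are illustrative assumptions, not measurements.

LINK_MBPS = 100          # WAN bandwidth to fill
RTT_MS = 80              # round-trip time
SEGMENT_BYTES = 1460     # typical MSS

# Window (in segments) needed to keep the link full: bandwidth-delay product.
bdp_segments = (LINK_MBPS * 1e6 / 8) * (RTT_MS / 1e3) / SEGMENT_BYTES

def recovery_rtts(decrease_factor: float) -> float:
    """RTTs to climb back to a full window after one loss, assuming the
    classic congestion-avoidance growth of one segment per RTT."""
    window_after_loss = bdp_segments * (1 - decrease_factor)
    return bdp_segments - window_after_loss   # +1 segment per RTT

# RFC 3649 tabulates decrease factors between 0.5 (small windows) and 0.1
# (very large windows); 0.1 is used here purely for illustration.
for name, beta in [("standard TCP (halve on loss)", 0.5),
                   ("HighSpeed-TCP-style decrease", 0.1)]:
    rtts = recovery_rtts(beta)
    print(f"{name}: ~{rtts:.0f} RTTs (~{rtts * RTT_MS / 1e3:.0f} s) "
          f"to refill a {LINK_MBPS} Mbps x {RTT_MS} ms WAN pipe after one loss")
```

With these assumed numbers, halving the window means hundreds of round trips (tens of seconds) to get back to full utilization after a single drop, while the gentler decrease recovers several times faster.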
Proprietary high-speed TCP implementations between WAN Optimization devices can also improve application performance in the face of moderate WAN packet loss (i.e. in the range of 0.5% - 4%) under certain specific circumstances. If you are using a dedicated point-to-point connection, say, or a private MPLS connection where you have paid for the bandwidth and can be reasonably sure that there isn't another private location also trying to send meaningful amounts of data into the receiving side's WAN link, then using a proprietary TCP (or even non-TCP) communication method between the WAN Optimization appliances can safely result in better application performance and better utilization of expensive WAN links.
If, however, a shared WAN is being used – i.e. the public Internet, or even an MPLS connection where a location is receiving data streams from two different data centers simultaneously – then sending packets more aggressively than TCP's congestion control algorithm specifies can actually make the performance problem worse. And doing so on the public Internet violates one of the most "sacred" network-level rules of Netiquette there is: avoiding congestion collapse by everyone "playing fair" and adhering to the spirit of the TCP congestion control rules is critical to the continuing decent performance of the Internet. Consequently, the documentation for these high-speed TCP implementations usually advises not using them over the public Internet. (How many of them recommend caution when using multi-connection MPLS networks that are not simple hub-and-spoke designs connecting to a single data center, as they should, I cannot say.)
A third technique to mitigate the effect of WAN packet loss and hide it from the end stations is used by some WAN Virtualization implementations. It is similar to TCP termination (whether or not TCP termination is actually done) in buffering TCP packets at the sending appliance, and retransmitting them in the face of packet loss, again shielding the loss from the end stations. By using multiple network connections, retransmitting quickly – often on a different network path – when a packet loss is detected, and moving away altogether from a network path where high packet loss is detected, WAN Virtualization can deliver excellent application performance, even using public Internet links, in the face of meaningful packet loss. And it does this without risking increasing congestion on the Internet, since it specifically uses a network path less when that path is seriously congested. For real-time traffic, or even for lower-bandwidth interactive TCP flows like those for VDI, some WAN Virtualization implementations will replicate the flows across different network paths, suppressing duplicates at the receiving side, and thereby delivering lossless connectivity between the communicating hosts even when high packet loss is exhibited on one of the network paths.
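A minimal sketch of that receive-side duplicate suppression, assuming a hypothetical per-flow sequence number stamped by the sending appliance (the field names are illustrative, not any vendor's wire format):

```python
# Sketch of receive-side duplicate suppression for a flow replicated across
# two network paths: the first copy of each sequence number to arrive, over
# either path, is delivered; later copies are silently dropped.
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class TunnelPacket:
    flow_id: int      # identifies the replicated flow
    seq: int          # per-flow sequence number stamped by the sender
    payload: bytes

class DuplicateSuppressor:
    def __init__(self) -> None:
        # flow_id -> sequence numbers already delivered (a real implementation
        # would age these out with a sliding window rather than keep them all)
        self._delivered: dict[int, set[int]] = {}

    def accept(self, pkt: TunnelPacket) -> bytes | None:
        """Return the payload the first time a sequence number is seen on
        any path; return None for the redundant copy."""
        seen = self._delivered.setdefault(pkt.flow_id, set())
        if pkt.seq in seen:
            return None
        seen.add(pkt.seq)
        return pkt.payload

# Copies of packet seq=7 arrive over both paths; only the first is delivered,
# so the end station sees no loss even if one path drops its copy entirely.
rx = DuplicateSuppressor()
first = TunnelPacket(flow_id=1, seq=7, payload=b"keystroke")
second = TunnelPacket(flow_id=1, seq=7, payload=b"keystroke")   # via other path
assert rx.accept(first) == b"keystroke"
assert rx.accept(second) is None
```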
Next time we'll continue our look into other ways that various WAN technologies and techniques – those that are part of the Next-generation Enterprise WAN (NEW) architecture as well as others – address the performance problems caused by WAN packet loss.
A twenty-five-year data networking veteran, Andy founded Talari Networks, a pioneer in WAN Virtualization technology, and served as its first CEO; he now leads product management at Aryaka Networks. Andy is the author of an upcoming book on Next-generation Enterprise WANs.