Chapter 1: Introduction to Cisco Wide Area Application Services (WAAS)

Cisco Press


Packet loss generally cannot be reported proactively to a transmitter; that is, a router that drops a packet because of a congested queue cannot notify the transmitting node that the specific packet was dropped. Instead, the transmitting node handles packet loss reactively, based on the acknowledgments it receives from the recipient, or the lack thereof. For instance, in the case of a connection-oriented transport protocol, if 5 KB of data is sent in five unique 1-KB segments, an acknowledgment of only four of the five segments would cause the transmitter to retransmit the missing segment. This behavior varies among transport protocols and also depends on the transport protocol extensions in use, but the general behavior remains consistent: an unacknowledged segment was most likely contained in a packet that was lost, was not received correctly (due to signal degradation or errors), or was discarded because the recipient buffer was oversubscribed. Duplicate acknowledgments (double and triple acknowledgments of the same data) may also be used to indicate the window position of a segment that was not successfully received, which tells the transmitter what to resend.
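The reactive behavior described above can be illustrated with a small Python sketch. It is a simplified model, not a real TCP stack: segments are numbered 0 through 4 rather than by byte sequence, the function names `receiver_acks` and `sender_reaction` are hypothetical, and the threshold of three duplicate acknowledgments is an assumption borrowed from the common TCP fast-retransmit convention.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent after each arriving segment.

    Each ACK carries the index of the next segment the receiver still expects.
    """
    received = set()
    next_expected = 0
    acks = []
    for segment in arrivals:
        received.add(segment)
        # Advance past every segment that has now arrived in order.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def sender_reaction(acks, dup_threshold=3):
    """Return the segment to retransmit once enough duplicate ACKs are seen.

    Returns None if the threshold is never reached (a real sender would then
    fall back to a retransmission timer, which is not modeled here).
    """
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == dup_threshold:
                return ack
        else:
            last_ack = ack
            duplicates = 0
    return None


# 5 KB sent as five 1-KB segments (0..4); segment 1 is lost in transit.
acks = receiver_acks([0, 2, 3, 4])
to_resend = sender_reaction(acks)
```

In this run every acknowledgment repeats the same "next expected" value, so the sender identifies segment 1 as the one to resend without any notification from the network itself.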

In the case of TCP, the lack of an acknowledgment causes the transmitter not only to resend, but also to re-evaluate the rate at which it was sending data. The loss of a segment causes TCP to lower its window capacity to cover scenarios where too much data was being sent: either too much data for the network to deliver (due to oversubscription of the network) or too much data for the recipient to receive (due to congested receive buffers). The net effect is that, upon encountering packet loss and subsequently having to retransmit data, the transmitter decreases the overall throughput of the connection to try to find a rate that will not oversubscribe the network or the recipient. This behavior is called congestion avoidance, as TCP adjusts its rate to match the available capacity in the network and at the recipient.

The most common TCP implementation found today, TCP Reno, reduces the congestion window by 50 percent upon encountering packet loss. Although reducing the congestion window by 50 percent does not necessarily correlate to a 50 percent decrease in connection throughput, this reduction can certainly constrain a connection's ability to saturate the link. During the congestion avoidance phase with TCP Reno, successful transmissions (signaled by receipt of acknowledgments) cause the congestion window to grow by roughly one segment size per round trip. The purpose of the congestion window is twofold: first, to allow TCP to react to packet loss, which ensures throughput is adjusted to available capacity, and second, to keep probing for additional available capacity by continually increasing the congestion window as data is successfully delivered.
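The halve-on-loss, grow-on-success pattern can be sketched as follows. This is a textbook-style simplification rather than a faithful TCP Reno implementation: it advances one round trip at a time, grows the window by one segment per loss-free round trip, and ignores slow start, timeouts, and fast recovery. The function name and parameters are hypothetical.

```python
def reno_congestion_avoidance(rounds, loss_rounds, initial_cwnd=10.0):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            # Multiplicative decrease: halve the window when loss is detected.
            cwnd = max(cwnd / 2.0, 1.0)
        else:
            # Additive increase: about one segment per loss-free round trip.
            cwnd += 1.0
    return history


# Eight round trips with a single loss detected during round 3.
trace = reno_congestion_avoidance(rounds=8, loss_rounds={3})
# trace == [10.0, 11.0, 12.0, 13.0, 6.5, 7.5, 8.5, 9.5]
```

The trace shows why a single loss is costly on a long, fast link: the window drops from 13 segments to 6.5 in one step and then needs many round trips to climb back.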

Figure 1-6 shows an example of how packet loss impacts the TCP congestion window, which impacts overall application throughput.

Figure 1-6 Impact of Packet Loss on Throughput

This "backoff" behavior not only helps TCP normalize around the available network capacity and available capacity in the recipient buffer, but also helps to ensure fairness among nodes that are competing for the available WAN bandwidth.

Introduction to Cisco WAAS

The previous sections examined the most common causes of application performance challenges found in WAN environments. Although they did not cover every possible performance barrier, they summarized and briefly examined the largest of these problems. With this fundamental understanding of what contributes to application performance challenges, one might ask, "How are they solved?" Each application performance challenge has an appropriate solution, and these solutions must be implemented hierarchically, with each solution placed at the appropriate point within the network, as shown in Table 1-1.

Table 1-1 Solutions to Application Performance Barriers Found in the WAN

Performance Barrier: Technology Solution

  • Application layer latency: Application layer optimization, including parallelization of serial tasks, prefetching, message prediction, local response handling, and object prepositioning

  • Application layer bandwidth consumption: Application layer object caching with local delivery at the edge of the network near the requesting user

  • Network bandwidth consumption and congestion: Compression, data suppression, QoS, application layer object caching

  • Packet loss sensitivity: Optimized transport protocol implementation with advanced congestion avoidance algorithms, TCP proxy architectures, rate-based transmission protocols, or forward error correction (FEC)

  • Network throughput: Optimized transport protocol implementation with advanced congestion avoidance algorithms, large transmit and receive buffers, window scaling

  • Prioritization and resource allocation: End-to-end QoS, including basic classification, deep packet inspection, prequeuing operations, hierarchical queuing and scheduling, post-queuing optimization

Cisco WAAS provides a solution to the performance barriers presented by the WAN by employing a series of application-agnostic optimizations, also known as WAN optimization, in conjunction with a series of application-specific optimizations, also known as application acceleration. WAN optimization refers to employing techniques at the network or transport layer that apply to any application protocol using that network or transport protocol. Application acceleration refers to employing optimizations directly against an application or an application protocol that it uses. WAN optimization has broad applicability, whereas application acceleration has focused applicability.

Cisco WAAS is a solution that is transparent in three domains:

  • Client nodes: No changes are needed on a client node to benefit from the optimization provided by Cisco WAAS.

  • Servers: No changes are needed on a server node to benefit from Cisco WAAS.

  • Network: Cisco WAAS provides the strongest levels of interoperability with technologies deployed in the network, including QoS, NetFlow, IP service-level agreements (IP SLA), access control lists (ACL), firewall policies, and more. Transparency in the network is unique to Cisco WAAS.

This unique combination of three domains of transparency allows Cisco WAAS the least disruptive introduction into the enterprise IT infrastructure of any WAN optimization or application acceleration solution.

The following sections examine the WAN optimization and application acceleration components of Cisco WAAS in detail.

WAN Optimization

Cisco WAAS implements a number of WAN optimization capabilities to help overcome challenges encountered in the WAN. These optimizations include a foundational set of three key elements:

  • Data Redundancy Elimination (DRE): DRE is an advanced compression mechanism that uses disk and memory. DRE minimizes the amount of redundant data found on the WAN by utilizing a loosely synchronized compression history on Wide Area Application Engine (WAE) peers. When redundant data is identified, the WAE will send a signature referencing that data to the peer as opposed to sending the original data, thereby providing potentially very high levels of compression. Data that is non-redundant is added to the compression history on both peers and is sent across the WAN with newly generated signatures.

  • Persistent LZ Compression (PLZ): PLZ is a variant of the Lempel-Ziv (LZ) compression algorithm. The WAE uses a persistent session history to extend the compression capabilities of basic LZ, which helps minimize bandwidth consumption for data traversing the WAN. PLZ is helpful for data that is identified as non-redundant by DRE, and can also compress signatures that are sent by DRE on behalf of redundant chunks of data.

  • Transport Flow Optimization (TFO): TFO is a series of TCP optimizations that helps mitigate performance barriers associated with TCP. TFO includes large initial windows, selective acknowledgment and extensions, window scaling, and an advanced congestion avoidance algorithm that helps "fill the pipe" while preserving fairness among optimized and unoptimized connections.
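The benefit of a persistent compression history, as described for PLZ, can be approximated with Python's standard `zlib` module. This is only an analogy: `zlib` implements DEFLATE, not Cisco's PLZ, and the sample messages and function names are invented for illustration. The sketch compares compressing each message in isolation against compressing all messages through one long-lived compressor that keeps its history between messages.

```python
import zlib


def compress_independently(messages):
    """Compress every message with a fresh compressor (no shared history)."""
    return [zlib.compress(m) for m in messages]


def compress_with_persistent_history(messages):
    """Compress messages through one long-lived compressor.

    Z_SYNC_FLUSH emits all output for the message so far while keeping the
    history window, so later messages can reference earlier ones.
    """
    compressor = zlib.compressobj()
    return [
        compressor.compress(m) + compressor.flush(zlib.Z_SYNC_FLUSH)
        for m in messages
    ]


def decompress_with_persistent_history(chunks):
    """Decode the stream with a matching long-lived decompressor."""
    decompressor = zlib.decompressobj()
    return [decompressor.decompress(c) for c in chunks]


# Hypothetical, highly similar messages sent over one session.
messages = [
    f"user={i};path=/shared/finance/quarterly-report.xlsx;action=read\n".encode()
    for i in range(20)
]

independent_bytes = sum(len(c) for c in compress_independently(messages))
persistent = compress_with_persistent_history(messages)
persistent_bytes = sum(len(c) for c in persistent)
restored = decompress_with_persistent_history(persistent)
```

Because each short message has little internal redundancy, compressing them one at a time gains almost nothing, whereas the persistent session can encode most of every later message as a back-reference to earlier ones.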

Determining which optimization to apply is a function of the Application Traffic Policy (ATP), which can be managed discretely per WAAS device or within the Cisco WAAS Central Manager console, and is also dependent upon the optimization negotiation that occurs between WAAS devices during automatic discovery (discussed later in this chapter in "Other Features").

The data path for optimization within the Cisco WAAS device is the TCP proxy, which is used for each connection that is being optimized by Cisco WAAS. The TCP proxy allows Cisco WAAS to transparently insert itself as a TCP-compliant intermediary. In this way, Cisco WAAS devices can receive and temporarily buffer data sent from a host and locally acknowledge data segments when appropriate. By employing a TCP proxy, Cisco WAAS can also send larger blocks of data to the optimization software components, which permits higher levels of compression to be realized when compared to per-packet architectures in which the compression domain may be limited by the size of the packets being received.

Data in the TCP proxy is then passed through the associated optimization components based on the configured policy, and the optimized traffic is transmitted across the WAN using the optimized TCP implementation. By implementing a TCP proxy, Cisco WAAS can shield communicating nodes from unruly WAN conditions such as packet loss or congestion. Should the loss of a segment be encountered, Cisco WAAS devices can extract the segment from the TCP proxy retransmission queue and retransmit the optimized segment, thereby removing the need for the original transmitting node to retransmit the data that was lost in transit. Transmitting nodes enjoy the benefits of having LAN-like TCP performance, exhibiting the characteristics of minimal packet loss and rapid acknowledgment. By using a TCP proxy, Cisco WAAS allows data to be drained from the transmitting nodes more quickly and nearly eliminates the propagation of performance-limiting challenges encountered in the WAN.
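The local acknowledgment and retransmission behavior described above can be modeled with a toy send buffer. This is a conceptual sketch only and does not reflect how Cisco WAAS is implemented: the class `ProxySendBuffer` and its methods are hypothetical, optimization of the payload is omitted, and sequence numbers are plain byte offsets.

```python
class ProxySendBuffer:
    """Toy model of the WAN-facing side of a TCP proxy."""

    def __init__(self):
        self._unacked = {}            # seq -> payload awaiting a WAN acknowledgment
        self.wan_transmissions = []   # every (seq, payload) placed on the WAN, in order

    def accept_from_host(self, seq, payload):
        """Take a segment from the LAN host, forward it, and acknowledge it locally."""
        self._unacked[seq] = payload
        self.wan_transmissions.append((seq, payload))
        return seq + len(payload)     # local ACK: next byte expected from the host

    def on_wan_ack(self, ack):
        """Release every buffered segment fully covered by the peer's cumulative ACK."""
        for seq in list(self._unacked):
            if seq + len(self._unacked[seq]) <= ack:
                del self._unacked[seq]

    def on_wan_loss(self, seq):
        """Resend a lost segment from the proxy's own queue; the host is not involved."""
        payload = self._unacked[seq]
        self.wan_transmissions.append((seq, payload))
        return payload

    def pending(self):
        """Sequence numbers still held for possible retransmission."""
        return list(self._unacked)


proxy = ProxySendBuffer()
local_acks = [
    proxy.accept_from_host(0, b"a" * 1000),
    proxy.accept_from_host(1000, b"b" * 1000),
]
proxy.on_wan_ack(1000)     # the peer confirms only the first segment
proxy.on_wan_loss(1000)    # the second segment was lost on the WAN
```

The host receives both local acknowledgments immediately and never learns of the loss; the proxy alone holds and resends the second segment until the WAN peer confirms it.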

Figure 1-7 shows the Cisco WAAS TCP proxy architecture and how it provides a buffer that prevents WAN performance from impacting transmitting nodes.

Figure 1-7 Cisco WAAS TCP Proxy Architecture

The following sections examine each of these optimizations in more detail.

Data Redundancy Elimination

DRE is an advanced, lossless compression algorithm that leverages both memory (high throughput and high I/O rates) and disk (persistent and large compression history). DRE examines data in-flight for redundant patterns (patterns that have been previously identified). As redundant patterns are identified, they are replaced with a signature that references the redundant pattern within the peer WAAS device compression history. As these signatures are only 5 or 6 bytes in size (depending on the breakpoints identified within the data), and the redundant pattern identified could potentially be tens or hundreds of kilobytes, DRE can provide significant levels of compression for flows containing data that has been previously identified, which helps minimize bandwidth consumption on the WAN.

DRE is bidirectional, meaning patterns identified during one direction of traffic flow can be leveraged for traffic flowing in the opposite direction. DRE is also application agnostic in that patterns identified within a flow for one application can be leveraged to optimize flows for a different application. An example of the bidirectional and application-agnostic characteristics of DRE is as follows. Assume two users are located in the same remote office, which is connected to the corporate campus by way of a T1 WAN. Both the remote office and the corporate campus have Cisco WAAS devices installed. Should the first user download an e-mail containing an attachment, the compression history on each of the WAAS devices in the connection path would be updated with the relevant data patterns contained within the flow. Should the second user have a copy of that file, or a file containing similarities, and upload that file by way of another application such as FTP, the compression history that was previously built from the e-mail transfer could be leveraged to provide tremendous levels of compression for the FTP upload.
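The e-mail download followed by an FTP upload can be mimicked with a minimal signature-substitution sketch. Several simplifications are assumptions made for brevity and are not taken from Cisco WAAS: chunks are a fixed 64 bytes, signatures are truncated SHA-1 digests of 5 bytes, the history is an in-memory dictionary, and the names `DrePeer` and `wire_size` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64       # assumption: fixed-size chunks keep the sketch short
SIGNATURE_SIZE = 5    # the text cites signatures of 5 or 6 bytes


def signature(chunk):
    """Truncated digest standing in for a DRE signature (illustrative only)."""
    return hashlib.sha1(chunk).digest()[:SIGNATURE_SIZE]


class DrePeer:
    """One side of a peer pair; the same history serves both traffic directions."""

    def __init__(self):
        self.history = {}   # signature -> chunk

    def encode(self, data):
        """Replace known chunks with references; send new chunks with their signature."""
        tokens = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            sig = signature(chunk)
            if sig in self.history:
                tokens.append(("ref", sig))
            else:
                self.history[sig] = chunk
                tokens.append(("raw", sig, chunk))
        return tokens

    def decode(self, tokens):
        """Rebuild the data, learning any new chunks along the way."""
        out = []
        for token in tokens:
            if token[0] == "raw":
                self.history[token[1]] = token[2]
                out.append(token[2])
            else:
                out.append(self.history[token[1]])
        return b"".join(out)


def wire_size(tokens):
    """Bytes sent across the WAN: one signature per token plus any raw chunk."""
    return sum(SIGNATURE_SIZE + (len(t[2]) if t[0] == "raw" else 0) for t in tokens)


# A 4096-byte stand-in for the e-mail attachment (deterministic filler data).
attachment = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(128))

campus, branch = DrePeer(), DrePeer()
download = campus.encode(attachment)      # e-mail download: campus to branch
assert branch.decode(download) == attachment
upload = branch.encode(attachment)        # FTP upload: branch to campus
assert campus.decode(upload) == attachment
```

On the first transfer every chunk crosses the WAN in full; on the upload in the opposite direction each 64-byte chunk is replaced by a 5-byte reference, regardless of which application carried the data the first time.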

Hierarchical Chunking and Pattern Matching

As data from a connection configured for DRE optimization enters the TCP proxy, it is buffered for a short period of time. After data builds up in the buffer, the large block of buffered data is passed to DRE for a process known as encoding. Encoding is the process of taking in data sent by a transmitting node, eliminating redundancy, updating the compression history with any new data, and transmitting compressed messages.

DRE encoding calculates a message validity signature over the original block of data. The decoding process on the peer WAE uses this signature to ensure correctness when rebuilding the message from the signatures contained in the encoded message. A sliding window is then passed over the block of data to be compressed, using a CPU-efficient calculation to identify breakpoints based on the actual data being transferred; this technique is known as content-based chunking. Because content-based chunking relies on the data itself to identify breakpoints, it is less sensitive to slight changes (additions, removals, modifications) upon subsequent transfers of the same or similar data. If a small amount of data is inserted into a chunk during the next transmission, the chunk boundaries shift with the inserted data, which lets DRE isolate the new data more precisely and helps retain high levels of compression because the other chunks remain valid.
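Content-based chunking can be sketched with a generic rolling hash over a sliding window. The algorithm DRE actually uses is not described in this text, so everything specific here is an assumption chosen for illustration: the polynomial rolling hash, the 16-byte window, the breakpoint rule (low 6 bits of the hash equal to zero, giving chunks of roughly 64 bytes on average), and the function name `content_chunks`.

```python
import hashlib

WINDOW = 16                          # bytes covered by the sliding window
BASE = 257                           # polynomial base for the rolling hash
MOD = (1 << 31) - 1                  # modulus keeping the hash bounded
MASK = (1 << 6) - 1                  # breakpoint when the low 6 bits are zero
TOP = pow(BASE, WINDOW - 1, MOD)     # weight of the oldest byte in the window


def content_chunks(data):
    """Split data at breakpoints chosen from the content of the last WINDOW bytes."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: remove the contribution of the oldest byte.
            rolling = (rolling - data[i - WINDOW] * TOP) % MOD
        rolling = (rolling * BASE + byte) % MOD
        if i + 1 >= WINDOW and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing data after the last breakpoint
    return chunks


# Deterministic filler data, then the same data with 8 bytes inserted.
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
edited = original[:1000] + b"INSERTED" + original[1000:]

original_chunks = content_chunks(original)
edited_chunks = content_chunks(edited)
known = set(original_chunks)
new_chunks = [c for c in edited_chunks if c not in known]
```

Because each breakpoint depends only on the bytes inside the window, boundaries away from the insertion point fall on the same content as before, so only the chunks around the inserted bytes should appear in `new_chunks`. With fixed-offset chunking, by contrast, every chunk after the insertion would shift and look new.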

Chunks are identified at multiple layers, and aggregate chunks referencing smaller, lower-layer chunks can be identified. Due to this multi-layer approach to chunking, DRE is hierarchical in that one chunk may reference a number of smaller, lower-layer chunks. If higher-layer chunks are identified as redundant, a single signature can be used to reference a larger number of lower-layer chunks in aggregate form. In essence, DRE aggregation provides a multiresolution view of the same data using chunks of different sizes and levels.
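A two-level version of this idea can be sketched as follows. The structure is an illustrative assumption rather than the DRE design: leaf chunks are grouped four at a time, the aggregate signature is a truncated hash over the leaf signatures, and the class `HierarchicalHistory` and its token format are hypothetical.

```python
import hashlib

SIG_SIZE = 5    # illustrative signature length
FANOUT = 4      # assumption: leaf chunks per aggregate chunk


def sig(data):
    return hashlib.sha1(data).digest()[:SIG_SIZE]


class HierarchicalHistory:
    def __init__(self):
        self.leaf = {}        # leaf signature -> chunk bytes
        self.aggregate = {}   # aggregate signature -> tuple of leaf signatures

    def _learn_group(self, group):
        leaf_sigs = tuple(sig(c) for c in group)
        agg = sig(b"".join(leaf_sigs))
        known = agg in self.aggregate
        self.aggregate[agg] = leaf_sigs
        return agg, leaf_sigs, known

    def encode(self, chunks):
        """Emit one 'agg' token per known group, else per-chunk 'leaf'/'raw' tokens."""
        tokens = []
        for i in range(0, len(chunks), FANOUT):
            group = chunks[i:i + FANOUT]
            agg, leaf_sigs, known = self._learn_group(group)
            if known:
                tokens.append(("agg", agg))
                continue
            for s, c in zip(leaf_sigs, group):
                if s in self.leaf:
                    tokens.append(("leaf", s))
                else:
                    self.leaf[s] = c
                    tokens.append(("raw", c))
        return tokens

    def decode(self, tokens):
        """Rebuild the chunk list, regrouping lower-layer tokens as the encoder did."""
        out, group = [], []
        for kind, value in tokens:
            if kind == "agg":
                out.extend(self.leaf[s] for s in self.aggregate[value])
                continue
            chunk = value if kind == "raw" else self.leaf[value]
            self.leaf[sig(chunk)] = chunk
            out.append(chunk)
            group.append(chunk)
            if len(group) == FANOUT:
                self._learn_group(group)
                group = []
        if group:
            self._learn_group(group)
        return out


chunks = [f"chunk-{i}".encode() * 8 for i in range(8)]
changed = chunks[:5] + [b"changed"] + chunks[6:]

encoder, decoder = HierarchicalHistory(), HierarchicalHistory()
first = encoder.encode(chunks)
assert decoder.decode(first) == chunks
second = encoder.encode(changed)
assert decoder.decode(second) == changed
kinds = [kind for kind, _ in second]
```

On the second pass the untouched first group collapses into a single aggregate reference, while the group containing the modified chunk falls back to lower-layer references plus one raw chunk.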
