WAN acceleration offers huge payoff

Riverbed edges Cisco, Silver Peak and Blue Coat in test of devices that speed traffic, cut bandwidth and save money

After seven months of pounding on application-acceleration gear from Blue Coat, Cisco, Riverbed and Silver Peak, we can actually attest that these devices really work. With an average acceleration gain of 5 to 10 times over normal traffic flows for all devices across all tests, these products could be deployed to yield a significant drop in monthly WAN consumption.

Why is Windows so bad?

The problem statement for application acceleration is simple: Windows performance in the WAN is lousy. To begin with, Windows' two workhorse protocols -- TCP and NetBIOS -- were never intended for use in low-bandwidth or high-delay networks. Windows XP Service Pack 2 compounds these problems with some spectacularly suboptimal configuration defaults. (Windows Vista is better, but it isn't widely implemented yet.)

By default, XP's TCP stack advertises a receive window -- the maximum amount of data allowed in flight without acknowledgment -- of 64KB. That's fine as far as it goes, but XP isn't very responsive about resizing that window in response to loss or delay. A large, static receive window contributes to retransmissions, possible packet loss and poor response time.

To make matters worse, XP doesn't use a common TCP option called window scaling that can expand a 64KB receive window by a factor of four or more. Even when network conditions let XP go much faster, it won't. (There is a registry hack to enable window-scaling, but even then, it isn't used by the Windows file-handling protocol.)
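
As a rough illustration of the window-scaling arithmetic: the TCP header's window field is 16 bits wide, and the window-scale option tells the peer to shift the advertised value left by a negotiated count of at most 14. The function and constant names below are illustrative only; they are not part of any Windows API.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value a 16-bit TCP window field can carry
MAX_SHIFT = 14                # largest shift count the window-scale option permits


def scaled_window(advertised: int, shift: int) -> int:
    """Return the effective receive window in bytes after window scaling."""
    if not 0 <= advertised <= MAX_UNSCALED_WINDOW:
        raise ValueError("advertised window must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return advertised << shift


if __name__ == "__main__":
    # A shift of 2 is the "factor of four" case; larger shifts expand the window further.
    for shift in (0, 2, 14):
        print(f"shift={shift:2d}: {scaled_window(MAX_UNSCALED_WINDOW, shift):,} bytes")
```

Without the option (shift 0), the window can never exceed what the 16-bit field holds, which is the limit XP runs into by default.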

WAN performance is always limited by the so-called bandwidth-delay product, but the constraints with Windows clients are especially severe. For example, if a link between Boston and Los Angeles has a 100-msec round-trip delay and the Windows TCP receive window is 64KB, the highest transmission rate possible is only around 5.2Mbit/s, regardless of link speed. Ordering up a T-3 or OC-3 won't help, at least not for any given Windows TCP connection; 5.2Mbit/s is as good as it gets.
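
The ceiling comes from a simple relationship: a sender can have at most one receive window of data in flight per round trip. The sketch below works through the Boston-to-Los Angeles example. It assumes 64KB means 65,536 bytes and ignores protocol overhead, so its result may differ slightly from the rounded figure quoted above. The function name is illustrative.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on a single TCP connection's rate: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("round-trip time must be positive")
    return window_bytes * 8 / rtt_seconds


T3_BPS = 44.736e6  # nominal T-3 line rate in bits per second

if __name__ == "__main__":
    window = 64 * 1024  # assumed meaning of "64KB"
    rtt = 0.100         # 100-msec round trip
    ceiling = max_tcp_throughput_bps(window, rtt)
    print(f"Per-connection ceiling: {ceiling / 1e6:.2f} Mbit/s")
    print(f"Share of a T-3 one connection can fill: {ceiling / T3_BPS:.1%}")
```

The link speed never enters the calculation, which is why a faster circuit does nothing for a single connection; only a larger window or a shorter round trip raises the ceiling.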

WAN-acceleration devices compensate for these shortcomings with a variety of tricks, including block caching, compression, connection multiplexing and application-layer optimization. While not all devices implement every method, all sharply reduce response time and bandwidth for Windows applications across the WAN.

Faster file service

As part of our research for this test, we asked vendors and several corporate IT shops to name their top five candidates for application acceleration, and every respondent named Common Internet File System (CIFS) as its top pick. This is understandable, given that Microsoft's notoriously chatty file-handling protocol originally was intended for LAN-only operations. Given its popularity and performance issues, we made CIFS the highlight of our performance testing.

We tested application acceleration the way enterprises use it -- with multiple WAN links and round-trip times. Our test bed modeled a hub-and-spoke WAN linking a headquarters office with four remote sites, two apiece on T-1 and T-3 links. The remote sites represented every permutation of high and low bandwidth and delay.

At each of the remote sites, we configured XP clients to upload and download directories containing Word documents from a Windows Server 2003 machine at headquarters.

To measure the effects of block and/or file caching, we ran the CIFS tests three times. First was a "cold run" with all caches empty. Second was a "warm run" that repeated the same transfer as the cold run, this time with the files already in cache. Finally, we changed the contents of 10% of the files; this "10% run" forced devices to serve some but not all content from the origin server.
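
To see why the three runs stress a cache differently, consider a toy model of block caching. It is a sketch under stated assumptions, not any tested vendor's implementation. The 4KB block size, the `BlockCache` class and the file sizes are all hypothetical, and compression is left out entirely. Each block crosses the WAN in full the first time it is seen; after that, only a short hash reference is sent.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # hypothetical block size; real devices choose their own


class BlockCache:
    """Toy model of a pair of appliances that share a block dictionary."""

    def __init__(self) -> None:
        self.seen = set()

    def wan_bytes_for(self, payload: bytes) -> int:
        """Return how many bytes would cross the WAN to transfer payload."""
        sent = 0
        for offset in range(0, len(payload), BLOCK_SIZE):
            block = payload[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest in self.seen:
                sent += len(digest)  # known block: send only its 32-byte hash
            else:
                self.seen.add(digest)
                sent += len(block)   # new block: send it in full
        return sent


def make_file(rng: random.Random, size: int) -> bytes:
    """Generate reproducible pseudo-random file contents."""
    return bytes(rng.getrandbits(8) for _ in range(size))


rng = random.Random(0)
files = [make_file(rng, 10 * BLOCK_SIZE) for _ in range(10)]
cache = BlockCache()

cold = sum(cache.wan_bytes_for(f) for f in files)     # caches empty
warm = sum(cache.wan_bytes_for(f) for f in files)     # same files again
files[0] = make_file(rng, 10 * BLOCK_SIZE)            # change 10% of the files
ten_pct = sum(cache.wan_bytes_for(f) for f in files)  # mostly cached, some new

if __name__ == "__main__":
    for name, sent in (("cold", cold), ("warm", warm), ("10%", ten_pct)):
        print(f"{name:>4} run: {sent:,} bytes over the WAN")
```

In this model the cold run gets no help at all, the warm run sends almost nothing, and the 10% run falls in between. The numbers it prints describe the toy only and say nothing about the measured results.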

The two most important application-acceleration metrics are bandwidth reduction and response-time improvement. While we measured both in this test, our results show there's not necessarily a strong correlation between the two. A device with a powerful compression engine might do well at reducing bandwidth consumption, but the time spent putting the squeeze on data might increase response time or, at best, yield only modest improvements. Conversely, some devices might willingly trade off a bit more bandwidth consumption if the net result is faster overall data delivery.

Looking first at bandwidth-reduction results, all products substantially lightened the WAN load, but big differences exist across devices depending on cache contents (see graphic "CIFS WAN bandwidth").

For example, in the cold run (caches empty), Cisco's Wide Area Engine (WAE) appliances were by far the most effective at compression, using nearly 28 times less bandwidth than was used in our baseline, no-device test. In contrast, the bandwidth savings for other devices seeing data for the first time was usually less than a two-times reduction in bandwidth, according to measurements taken by a ClearSight Networks Network Analyzer.

Note that we're presenting all results in terms of relative improvement rather than absolute numbers. For example, in the CIFS cold run, Cisco's devices consumed 130MB of WAN bandwidth, compared with 3.6GB with no acceleration device inline, which translates into using 27.82 times less bandwidth. (The absolute numbers from all tests are available online as an Excel download.)

Given that enterprise data patterns are repetitive and subject to change, bandwidth reduction in the warm and 10% test cases can be more meaningful -- and this is where these devices really shine.

Riverbed's Steelhead appliances topped these tests, reducing bandwidth by a factor of 84 in the warm run and a factor of 32 in the 10% run. While the other devices reduced bandwidth by a lesser degree, the improvements were still dramatic. Any device that reduces bandwidth use by 20 or 30 times must be considered a boon to IT budgets.
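
Reduction factors can be hard to compare at a glance, so it may help to convert them into the share of baseline traffic that no longer crosses the WAN. The short sketch below does that for the factors quoted in this section; the function name is illustrative.

```python
def percent_saved(reduction_factor: float) -> float:
    """Convert an 'N times less bandwidth' factor into percent of traffic removed."""
    if reduction_factor <= 0:
        raise ValueError("reduction factor must be positive")
    return (1 - 1 / reduction_factor) * 100


if __name__ == "__main__":
    quoted = [
        ("Cisco, cold run", 27.82),
        ("Riverbed, warm run", 84),
        ("Riverbed, 10% run", 32),
    ]
    for label, factor in quoted:
        print(f"{label}: {factor}x less bandwidth = {percent_saved(factor):.1f}% saved")
```

The conversion shows diminishing returns: going from a factor of 32 to a factor of 84 removes only about two more percentage points of the original traffic.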

We also used the ClearSight analyzer to measure LAN bandwidth consumption (see graphic "CIFS LAN bandwidth reduction" and other online-only performance results). LAN differences among products were not as dramatic as WAN differences. The Blue Coat and Cisco devices reduced LAN bandwidth consumption by factors of 1.5 to 2 in our warm run and 10% run, because these vendors' headquarters devices served objects out of cache instead of from the origin servers. In contrast, the Riverbed and Silver Peak devices increased LAN use by 2% to 10%, probably because of appliance-control traffic. Changes in bandwidth use don't always correlate with changes in response time, however.

Measuring CIFS response time

We used a common enterprise task to gauge CIFS response time, measuring how long it took for a client to upload or download a set of Word files to or from a server. We measured transfer times at each of our four remote sites -- each representing a different permutation of high and low bandwidth and delay. We're presenting the results for each site because users' requirements differ depending on where they work. As our results suggest, some appliances do a better job at accelerating CIFS in low-bandwidth settings; others are better for high-delay settings.

Arguably, the most important results for enterprises are from the 10% runs, where we offered 10% new content and 90% existing content to each set of appliances. This represents an enterprise where many users might see the same documents repeatedly but where there also would be some new documents added to the mix.

In the download tests, low-bandwidth sites tended to see the biggest improvements in response time, regardless of the amount of delay present (see graphic "Downloading CIFS").

Riverbed's Steelhead appliances sped up file transfers 45 times to a low-bandwidth, low-delay site and 34 times to a low-bandwidth, high-delay site. The Steelhead appliances were also tops for the high-bandwidth sites, but to a lesser degree, with speed increases of four to seven times.

The Silver Peak NX appliances were next most efficient overall, with speedups of three to 16 times (again, with the most improvement shown for low-bandwidth sites), followed by the Cisco and Blue Coat appliances.

File uploads generally don't benefit from application acceleration as much as downloads do. When handling client downloads, acceleration devices either serve content from a client-side cache, pipeline data using read-ahead operations or employ some combination of the two approaches. That's not possible with write operations, because an acceleration device can't predict in advance what data the client will send.

Even so, big improvements in upload performance are still possible (see graphic "Uploading CIFS").

Riverbed's Steelhead appliance again led the pack, with speedups of three to 34 times compared with no acceleration. Accelerations from the Silver Peak, Cisco and Blue Coat devices were less dramatic but still significant, moving traffic 1.3 to 16 times faster than our baseline test. Most devices sped up data the most from low-bandwidth sites. Blue Coat's SG was an exception; it delivered the greatest upload benefit to the high-bandwidth, high-delay site.

Note that response-time improvements do not track linearly with bandwidth-reduction results. For example, Cisco's devices were more efficient, relative to their competitors, at reducing WAN bandwidth consumption than at speeding CIFS transfer times.

In reviewing the CIFS results, Riverbed commented that it achieved even greater improvement over no-acceleration baselines by using many small files. Our tests used a mix of random file sizes of 25KB to 1MB. Both approaches have their merits: Riverbed's short-file methodology is more stressful on devices' CIFS processing engines (stress is a good thing in device benchmarking), while a mix of larger files may offer a more meaningful prediction of device performance in production settings.

Mail call

After CIFS, the next most popular candidate for acceleration is Messaging API (MAPI) traffic. MAPI is the e-mail protocol used by the Microsoft Exchange server and Outlook clients. All devices tested can speed up MAPI traffic, but in our tests the improvements were far less significant than in the CIFS tests.

In our MAPI tests, all clients sent messages -- some with Word attachments, some without -- to all other clients through an Exchange 2003 server. As with the CIFS tests, the number of messages was proportional to each site's link speed -- fewer messages for clients at T-1 sites, more for those at T-3 sites.

There was significantly less differentiation among products when accelerating MAPI traffic, compared to CIFS traffic (see graphic "MAPI acceleration").

All products sped mail delivery, but only by factors of 1.24 to 2.39 compared with a no-device baseline. Averaging results across all sites, the Blue Coat devices provided the biggest boost for mail traffic, but by a relatively small margin over the Riverbed, Silver Peak and Cisco devices.

Doubling e-mail performance is nothing to sneeze at, but we also wanted to understand why MAPI performance didn't match CIFS performance. A few minutes with the ClearSight analyzer gave us the answer: The Outlook 2007 clients we used in this test encrypt e-mail traffic by default.

To the acceleration appliances, most of the MAPI data structures weren't visible to be optimized. Some acceleration was still possible, through TCP optimizations or because some MAPI traffic was visible. After reviewing the results, Riverbed said it encourages Outlook 2007 users to disable encryption for highest performance. That said, network managers using the new version of Outlook should consider whether the security/performance tradeoff is worthwhile.

A faster Web

We measured acceleration of HTTP traffic in two tests, one with 248 and one with 2,480 concurrent users. The results were a bit surprising: While the products delivered Web traffic as much as seven times faster than a baseline test without acceleration, performance didn't necessarily improve as we added more users.

To avoid overloading the sites on slower links, we put proportionately fewer users at the T-1 sites than at the T-3 sites. For example, our 2,480-user test involved 1,200 clients at each of two sites on a T-3, and 40 clients at each of two sites on a T-1. We used Spirent Communications' Avalanche/Reflector tool to emulate Web clients and servers. Because previous studies of Web objects place the average size at 8KB to 13KB, we configured the clients to request an 11KB object from the servers.

As in the CIFS and MAPI tests, the Riverbed Steelhead appliances delivered Web traffic the fastest (see graphic "Web acceleration"). In all three ways we measured -- transactions per second, traffic rates and response time -- the Steelhead appliances delivered Web traffic seven times faster than tests with no device inline. We observed the same seven-times improvement with 248 and 2,480 users; because LAN and WAN bandwidth use was almost identical in each test, it's likely that WAN bandwidth was the bottleneck.

Blue Coat's SG appliances were second fastest, but that result must be stated with a caveat: The Blue Coat boxes worked better with fewer Web users, not more. Compared with no acceleration, the Blue Coat appliances boosted Web performance by around seven times for 248 users, but by around six times for 2,480 users (and that's just for transactions per second and data rate; the response time improved by only a factor of three).
