Speed traps

Scalability tests push Web front-end box limits.

This Clear Choice Test of scalability aims to find the right size of Web front-end product for a network's needs.

While our services tests assessed how well Web front-end devices handled application traffic, our scalability tests can help properly size one of these products for a particular network's needs.

The scalability tests demonstrated the limits of system performance in terms of maximum concurrent-connection capacity, TCP multiplexing ratios and maximum forwarding rates. In all these areas, the test results show big differences among devices.

In the maximum concurrent-connections test, our goal was to determine how many client connections a device could handle. There were major differences among vendors in this test.

To determine connection count, we configured Spirent's Avalanche to emulate as many as 4 million clients running Internet Explorer. Each client opened a TCP connection and requested a 1KB Web object from the Web front-end device's virtual IP address. (Just as a single IP address for, say, www.amazon.com hides dozens or hundreds of servers, all these devices used one virtual IP address as a proxy for the back-end servers on the test bed.)

Maximum concurrent TCP connections
Citrix’s NetScaler Application Delivery System set up 4 million concurrent TCP connections, the most of any product tested and the limit of our test bed. F5’s BIG-IP and Foundry’s ServerIron were next, but Foundry says its result is partly a function of the high rate we used in testing, and that its device sets up more than 7 million connections at a lower rate.
Vendor                    Concurrent TCP connections
No device (baseline)      4,000,000
Array                     700,000
Citrix                    4,000,000
Crescendo                 900,000
F5                        3,499,751
Foundry*                  2,699,844
Juniper, 1 system         500,000
Juniper, 4 systems        1,999,969
*Reached rate limit before connection limit

After receiving the object, clients sat idle for 60 seconds before requesting another object over the same connection. This long "think time," how long a client waits before requesting the next page, allowed us to build up connection count. For all vendors, we kept adding new connections until the device failed to complete some transactions, or until we reached 4 million connections, the limit of our test bed. Even though our goal was a Layer-4 measurement - the number of established TCP connections - we used Layer-7 switching in this and all other tests.
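Even idle connections generate steady request traffic. As a rough illustration (our simplification, not part of the test methodology), Little's law says the request rate equals the number of open connections divided by the time each client takes to complete one request cycle. The sketch below assumes response time is negligible next to a 60-second think time; the function name is hypothetical.

```python
def implied_request_rate(connections: int, think_time_s: float,
                         response_time_s: float = 0.0) -> float:
    """Steady-state requests per second from `connections` keep-alive clients,
    each sending one request, waiting for the reply, then thinking."""
    cycle_s = think_time_s + response_time_s  # seconds per request, per client
    if cycle_s <= 0:
        raise ValueError("cycle time must be positive")
    return connections / cycle_s


if __name__ == "__main__":
    # Connection counts similar to those reached in the test, 60-second think time.
    for n in (500_000, 2_000_000, 4_000_000):
        print(f"{n:,} connections -> about {implied_request_rate(n, 60.0):,.0f} requests/s")
```

At 4 million connections and a 60-second think time, this works out to roughly 66,700 requests per second on top of the connection setups themselves, which hints at why connection capacity and transaction rate are hard to separate in a single test.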

Citrix's NetScaler Application Delivery System set up 4 million concurrent TCP connections, the limit of our test bed (see graphic at right). F5's BIG-IP was next, setting up about 3.5 million connections, followed by Foundry's ServerIron 450 with about 2.7 million connections.

Juniper's DX 3600 topped out at 2 million connections with four systems working together and 500,000 concurrent connections on a single box. The vendor says its appliance has a hard-coded limit of 500,000 connections per system, something reflected in our test results.

The Crescendo and Array systems each sustained fewer than 1 million connections. Crescendo says its device has a hard-coded limit of 1 million connections, a few of which are reserved for internal use.

Foundry objected to our test methodology, noting that we were stressing a limit of the ServerIron's connection-establishment rate rather than its concurrent-connection capacity. Foundry says the ServerIron can set up more than 7 million concurrent connections when it handles connection-establishment requests at a lower rate, or with a longer think time between client requests.

We agree that rate and capacity are different variables, and ideally should be measured one at a time. Unfortunately, time constraints prevented us from rerunning these tests on all devices at a lower rate. The numbers we found are still valid, in that we tested all devices the same way, but we take Foundry's point that its number (and perhaps that of other devices) reflects connection establishment rate, not just capacity.
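Foundry's objection can be illustrated with a toy model. Everything in the sketch below is hypothetical: the device class, its two limits and the ramp function are invented for illustration and do not describe any tested product or how Avalanche works. The modeled device falls behind when either its connection table fills or the combined load of new setups plus requests from idle clients exceeds what it can process, so the same box reports different "capacities" depending on the offered setup rate.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalDevice:
    """Toy model of a Web front-end box; both limits are invented."""
    max_connections: int   # size of the connection table
    max_ops_per_s: float   # connection setups + HTTP requests it can process per second

    def keeps_up(self, open_conns: int, setups_per_s: float, think_time_s: float) -> bool:
        # Idle clients each send one request per think-time interval
        # (Little's law, response time ignored).
        requests_per_s = open_conns / think_time_s
        within_table = open_conns <= self.max_connections
        within_processing = setups_per_s + requests_per_s <= self.max_ops_per_s
        return within_table and within_processing


def ramp_until_failure(device: HypotheticalDevice, setups_per_s: int,
                       think_time_s: float, testbed_limit: int = 4_000_000) -> int:
    """Add setups_per_s connections each second until the device falls behind
    or the test-bed limit is reached; return the last count it sustained."""
    if setups_per_s <= 0:
        raise ValueError("setups_per_s must be positive")
    sustained = 0
    while sustained < testbed_limit:
        candidate = min(sustained + setups_per_s, testbed_limit)
        if not device.keeps_up(candidate, setups_per_s, think_time_s):
            break  # first failed transaction ends the ramp
        sustained = candidate
    return sustained


if __name__ == "__main__":
    box = HypotheticalDevice(max_connections=7_500_000, max_ops_per_s=100_000)
    for rate in (50_000, 10_000):
        print(rate, ramp_until_failure(box, rate, think_time_s=60.0))
```

With these invented limits, ramping at 50,000 setups per second stops at 3 million connections, while ramping at 10,000 per second reaches the 4 million test-bed limit, even though the connection table is the same size in both runs. A device bound only by a hard-coded table size (a very high `max_ops_per_s`) stops at its table limit whatever the rate.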

TCP multiplexing

Although every device in this test supports TCP multiplexing, the 300-fold differences in the devices' results clearly show that not all TCP multiplexing engines are built the same.

TCP multiplexing can offer a powerful performance boost by offloading computationally intense TCP processing from servers. It's such an important feature that we used it as one of two criteria, along with Layer-7 switching, that all devices had to support.

To measure TCP multiplexing, we configured the Avalanche test tool to set up and maintain 100,000 TCP connections from emulated clients. As in the tests of response time with and without HTTP compression, clients requested home pages from Amazon.com, the BBC, UCLA, the White House and Yahoo. We compared the number of client-side connections (always 100,000) with the number of server-side connections over 60 seconds to determine the TCP multiplexing ratio.
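The multiplexing ratio itself is simple division. A minimal sketch follows; the function names are hypothetical, and because the test compared counts over 60 seconds, the helper that averages several server-side samples is an assumption about how such counts might be combined, not a description of the test tool.

```python
from statistics import mean
from typing import Sequence


def multiplexing_ratio(client_connections: int, server_connections: float) -> float:
    """Client-side connections carried per server-side connection."""
    if server_connections <= 0:
        raise ValueError("server_connections must be positive")
    return client_connections / server_connections


def ratio_from_samples(client_connections: int, server_samples: Sequence[int]) -> float:
    """Ratio using the mean of server-side counts sampled during a run
    (averaging is an assumption, not the documented method)."""
    return multiplexing_ratio(client_connections, mean(server_samples))


if __name__ == "__main__":
    clients = 100_000
    # Server-side connections implied by the ratios reported for NetScaler.
    for ratio in (346, 3):
        print(f"{ratio}:1 -> about {clients / ratio:,.0f} server-side connections")
```

For the 100,000 client connections in this test, a 346-to-1 ratio corresponds to roughly 289 server-side connections; at 3-to-1 the servers would still see about 33,300.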

The ratio of client-to-server connection counts can depend on user think time. For some devices, think time has a huge impact on multiplexing ratio; for others, it barely matters.

We began with a 3-second think time. That may not sound like much of a delay, but it does reflect the fact that visitors to e-commerce sites tend to move through pages very quickly. One study of QoS for e-commerce sites conducted at the University of Wisconsin assumed an average think time of 2.5 seconds per page, just below the 3-second figure we used. (You can perform your own benchmark by observing how quickly you move through pages next time you shop online.)

We repeated the test with a 60-second think time, a more appropriate interval for Web pages containing lots of text. (If you're reading this online, consider how long it's been since you loaded this page.)
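One way to see why think time can matter is an idealized pooling model, which is our simplification rather than how any tested device works. If each client sends a request, waits R seconds for the response and then thinks for Z seconds, it issues one request every Z + R seconds; if each request ties up a pooled server-side connection for S seconds, Little's law gives a best-case ratio of (Z + R) / S. The response and hold times below are hypothetical.

```python
def ideal_multiplexing_ratio(think_time_s: float, response_time_s: float,
                             server_hold_s: float) -> float:
    """Best-case client-to-server connection ratio with perfect connection pooling.

    Each client issues one request per (think + response) seconds; each request
    occupies a server-side connection for server_hold_s seconds. By Little's law
    the pool needs clients * server_hold_s / cycle connections, so the ratio is
    cycle / server_hold_s, independent of the client count.
    """
    if server_hold_s <= 0:
        raise ValueError("server_hold_s must be positive")
    cycle_s = think_time_s + response_time_s
    return cycle_s / server_hold_s


if __name__ == "__main__":
    # Hypothetical timings: 200 ms response seen by the client, 100 ms server hold.
    for think in (3.0, 60.0):
        print(think, round(ideal_multiplexing_ratio(think, 0.2, 0.1), 1))
```

Under these assumed timings the model predicts ratios of about 32-to-1 and 602-to-1, roughly a 19-fold gain from the longer think time. Measured results swung far more than that for some devices and hardly at all for others, so real multiplexing engines clearly depart from this ideal.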

Most devices showed improved TCP multiplexing ratios with longer think times. The standout was Citrix's NetScaler Application Delivery System, which mapped 346 client connections onto each server connection when we used 60-second think times. That's a huge reduction in the workload for servers behind this box.

It was a very different story when we used a 3-second think time with the NetScaler Application Delivery System. The device set up just three client connections per server connection, a hundredfold reduction in TCP multiplexing efficiency compared with the 60-second case. Citrix says it normally delivers ratios much higher than 3-to-1 and cites as the culprits the lack of caching and the high transaction rates in our tests.

Crescendo's CN-5080E offloaded TCP connections by nearly 100-to-1 with a 60-second think time in place. The vendor was disappointed to see "only" a 60-to-1 offload with a 3-second think time. We're also unable to explain that result; in earlier tests with a different software release, the Crescendo box repeatedly set up 1,024 server-side connections regardless of think time (even with zero think time).

Foundry's ServerIron also delivered some mysterious multiplexing results. It had regularly set up TCP multiplexing ratios of 50-to-1 in earlier tests, regardless of think time. But after an upgrade of the ServerIron code, the ratio fell to slightly more than 2-to-1. Foundry was unable to explain the difference and was also unable to reproduce this outcome in its own tests.

Other vendors showed relatively little benefit from TCP multiplexing. For example, Juniper's single-box results showed less than a 2-to-1 multiplexing ratio, regardless of think time. With four Juniper devices, the multiplexing ratio improved to 5-to-1 with a 60-second think time.

To understand why we saw such big differences in multiplexing ratios, it's helpful to see the rate at which each box handled transactions. Note that emulated clients attempt to request objects at the same rate with all devices; thus, the differences in transaction rates are entirely a function of how fast each device responds. With the notable exception of Juniper's DX 3600, devices that processed transactions at a higher rate achieved higher TCP multiplexing ratios.

One caveat before leaving the topic of TCP multiplexing: Just because a device offloads TCP connections from servers in a 100-to-1 ratio, it doesn't mean 100 servers can be replaced with just one. Many other factors come into play, including server CPU and memory use, network utilization and application behavior. Even taking these factors into account, TCP multiplexing still can bring big benefits to beleaguered server farms.

Not so goodput

Some of the devices tested have 16 or more Gigabit Ethernet interfaces, raising the question of how quickly they forward data.

To find out, we measured each device's "goodput" - defined in RFC 2647 as the amount of data received minus any data lost or retransmitted. In the context of this test, goodput is a Layer-7 measurement, reflecting how quickly a device transmits requested HTTP objects back to the client.
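As a minimal sketch of the arithmetic, assume a test tool that reports the total HTTP payload bytes received (duplicate copies included) and the payload bytes retransmitted. Subtracting the retransmissions counts each useful byte once, and data that was lost never shows up in the received counter. The counter names are hypothetical, and Mbps here means 10^6 bits per second.

```python
def goodput_mbps(payload_bytes_received: int, payload_bytes_retransmitted: int,
                 duration_s: float) -> float:
    """HTTP goodput in Mbps (10**6 bits per second).

    payload_bytes_received includes duplicate bytes caused by retransmission;
    removing them leaves only data that was useful to the client.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if not 0 <= payload_bytes_retransmitted <= payload_bytes_received:
        raise ValueError("retransmitted bytes must be between 0 and bytes received")
    useful_bits = (payload_bytes_received - payload_bytes_retransmitted) * 8
    return useful_bits / duration_s / 1e6
```

For instance, 10GB (10^10 bytes) of useful payload delivered in 40 seconds works out to 2,000 Mbps.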

We measured goodput by setting up 100 emulated clients, each requesting 1MB objects from servers behind each device under test. As in all other tests, we used different patterns in the URL for each object, forcing devices to make Layer-7 switching decisions.
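Layer-7 switching means the device must parse each HTTP request and choose a server based on its content (here the URL) instead of just the destination IP address and port. The toy switch below routes on regular-expression URL patterns and round-robins within each pool; the patterns, pool members and class name are hypothetical and are not the rules used in the test.

```python
import re
from itertools import cycle
from typing import List, Sequence, Tuple


class UrlSwitch:
    """Toy Layer-7 switch: the first URL pattern that matches picks the pool,
    and servers within a pool are used in round-robin order."""

    def __init__(self, rules: Sequence[Tuple[str, List[str]]], default: List[str]):
        pools = [servers for _, servers in rules] + [default]
        if any(not servers for servers in pools):
            raise ValueError("every pool needs at least one server")
        # Compile each pattern once and give each pool its own round-robin iterator.
        self._rules = [(re.compile(pattern), cycle(servers)) for pattern, servers in rules]
        self._default = cycle(default)

    def route(self, path: str) -> str:
        """Return the back-end server that should receive a request for `path`."""
        for pattern, servers in self._rules:
            if pattern.search(path):
                return next(servers)
        return next(self._default)


if __name__ == "__main__":
    switch = UrlSwitch(
        rules=[(r"^/images/", ["img1", "img2"]), (r"\.php$", ["app1"])],  # hypothetical rules
        default=["web1", "web2"],
    )
    for path in ("/images/logo.gif", "/images/cart.gif", "/checkout.php", "/index.html"):
        print(path, "->", switch.route(path))
```

Because every request must be inspected before a server is chosen, giving each object a distinct URL pattern keeps the device doing this content lookup throughout the goodput test.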

We began with a baseline measurement with no device present to demonstrate the channel capacity of our test bed. The goodput in this back-to-back test was about 3.8Gbps, close to the theoretical maximum rate when factoring for Ethernet, IP, TCP and HTTP overhead.

HTTP Goodput
“Goodput” describes application-layer performance by measuring forwarding rate, minus any data lost or retransmitted. Crescendo posted the highest goodput of any device tested, but results for the Array and Citrix systems also are noteworthy, in that both devices came close to the theoretical maximums for their smaller numbers of interfaces.
Vendor                                     Goodput (Mbps)
No device (baseline, four interfaces)      3,804
Array (one interface)                      963
Citrix (two interfaces)                    1,900
Crescendo (four interfaces)                3,244
F5 (four interfaces)                       2,594
Foundry (four interfaces)                  1,588
Juniper, one system (one interface)        629
Juniper, four systems (four interfaces)    2,120

None of the devices came close to the baseline measurement, but there are some mitigating circumstances. The Array, Citrix and Juniper single-box devices had fewer than four client and four server interfaces; thus, they could not achieve the same data rates as in our back-to-back tests. The Array and single-box Juniper entries each had one client interface; the Citrix device had two. Both the Array and Citrix results are much better than they look in comparison with four-interface boxes; both vendors' devices ran at the maximum rate possible given their interface counts.
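Dividing each goodput figure by the interface count listed in the HTTP Goodput table puts the results on a common footing. This is plain arithmetic on the published numbers; the dictionary and function names are our own.

```python
# Goodput (Mbps) and interface count for each entry in the HTTP Goodput table.
RESULTS = {
    "No device (baseline)": (3_804, 4),
    "Array": (963, 1),
    "Citrix": (1_900, 2),
    "Crescendo": (3_244, 4),
    "F5": (2_594, 4),
    "Foundry": (1_588, 4),
    "Juniper, one system": (629, 1),
    "Juniper, four systems": (2_120, 4),
}


def per_interface(results):
    """Goodput per interface in Mbps for each entry."""
    return {name: mbps / ifaces for name, (mbps, ifaces) in results.items()}


if __name__ == "__main__":
    rates = per_interface(RESULTS)
    baseline = rates["No device (baseline)"]
    # Print entries from fastest to slowest per interface.
    for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{name:<24} {rate:7.1f} Mbps/interface ({rate / baseline:.0%} of baseline)")
```

The Array and Citrix entries work out to about 963 and 950 Mbps per interface, essentially the same as the baseline's 951 Mbps per interface, which is consistent with both running as fast as their interface counts allowed.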

Even so, we're a bit surprised that the devices didn't come closer to our baseline. The fastest box was Crescendo's CN-5080E, the only device to crack the 3Gbps line, because of its use of hardware acceleration for packet forwarding.

F5's BIG-IP was next fastest, with a goodput of about 2.6Gbps. The vendor says the BIG-IP's goodput partly depends on the number of clients it sees. F5 reran our tests internally with 1,000 and 10,000 users, and says it achieved goodput of as much as 3.2Gbps. We did not attempt to replicate this, but it does suggest that different user counts could affect goodput. (On the other hand, goodput topped out at about 25 to 30 users in our baseline tests with no device in line; any greater number of users had no effect, because the network was already saturated.)

Foundry's ServerIron 450 moved traffic at around 1.9Gbps, or about half the theoretical maximum possible, a result Foundry replicated internally. Unfortunately, our production schedule did not allow time for further examination. Earlier builds of ServerIron code delivered goodput of at least 2.7Gbps, and the vendor says it has achieved rates in excess of 3Gbps in its internal tests.

Juniper's four-box setup was the slowest of the systems with four interfaces. The vendor was not surprised by this result and says it considers other acceleration features, such as caching and compression, far more important. We agree.

Goodput is a useful way to describe system capacity, but given that few if any users run their networks at maximum utilization, forwarding rates are probably less important than results of other tests.


Copyright © 2006 IDG Communications, Inc.
