
Pure hardware VPNs rule high-availability tests

Nokia tops the lot in reliability and load-balancing performance.


High availability means never having to say, "I'm sorry, but the network is down."

As VPNs make their way into corporate networks, high-availability VPN hardware and software are slowly becoming available. Our torture tests show that while a number of high-availability VPN products are on the market, few really meet the needs of enterprise and application service provider data centers.




Standing alone as a survivor of our rigorous test bed is Nokia's CryptoCluster 2500, offering the most amazing high-availability and load-balancing feature set we saw, thereby earning it the Network World World Class Award. With subsecond failover and active load balancing, CryptoCluster came closer to high-availability nirvana than any of the other six products we tested. When it came to pure VPN performance in a high-reliability, high-bandwidth environment, Nokia left everyone else in the dust.

Next in line was NetScreen Technologies' NetScreen-100. While it couldn't match Nokia's reliability statistics and active load-balancing performance, it brought something else to the table: a stateful firewall. For network managers who want to combine their VPNs and firewalls into a single box, NetScreen's 1.75 inch-high NetScreen-100 is an excellent choice.

Radguard, a Network World Blue Ribbon winner in our last VPN showdown, did well in our tests with its cIPro 5000, but was held back only slightly by a slower reaction time at high loads and the failure of an important failover test. Alcatel (you may know it better as Timestep, which was bought by Newbridge Networks, which was then bought by Alcatel), another Blue Ribbon winner in years past, also looked great in our tests with its Alcatel 7137 SVG - until the load got high, at which point we found some instabilities in its failover code.

Products from Rainfinity, Stonesoft and Foundry did well in many of our tests, but we wished for the simplicity and reliability of Nokia's CryptoCluster 2500 as we analyzed the competition.

Alcatel, Nokia, Radguard and NetScreen sent us pure hardware VPN products, which gave us high-availability and high-performance VPNs in a tiny package. Nokia's CryptoCluster 2500, Radguard's cIPro 5000, Alcatel's 7137 SVG and NetScreen's NetScreen-100 all showed up as small (1.75 inches high) packages with crypto power to burn. We didn't test specifically for cryptography performance, but our high-availability tests ran up to 40M bit/sec of encryption, and none of these boxes broke a sweat supporting that.

Three companies came in with high-availability products based on Check Point's FireWall-1/VPN-1 software product line. Check Point prefers to provide high-availability VPN packages with its partners, and worked with Rainfinity, Stonesoft and Foundry to set us up with combination hardware/software or software/software bundles for this test.

Although Check Point sells an OEM cryptography accelerator board from Chrysalis called the Luna VPN, we did not receive an accelerator to use in the test from any of the participating Check Point partners. This limited the speed at which we could test the failover/high-availability features in Check Point-based products to between 6M bit/sec and 8M bit/sec, which are the practical limits for nonaccelerated cryptography with Intel or Scalable Processor Architecture (SPARC) platforms and VPN-1 software.

Rainfinity's RainWall 1.5 and Stonesoft's StoneBeat FullCluster 2.0 are software-based products that sit on the same system as the Check Point VPN-1 software and manage load balancing and high availability. Foundry sent two of its ServerIron XL systems (running Version 7.1 of its operating system), which form a high-availability sandwich around the Check Point VPN-1 systems.

We believe that Rainfinity, Stonesoft and Foundry started out from a weaker position because of the number and cost of the components in their configurations. The recommended Sun E450 server with two 450-MHz CPUs isn't cheap (list price is more than $40,000 for each, and you need two of them). Add the Check Point FireWall-1/VPN-1 license (typically $10,000, and you also need two of them) and the VPN accelerator card (list price is about $4,000, and you need two), and it all adds up. You're at more than $100,000 before you even add Rainfinity, Stonesoft or Foundry components, which add between $12,000 and $16,000, into the mix.
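The arithmetic behind that figure can be checked with a short sketch. It uses only the list prices quoted above; the server price is stated as "more than $40,000," so the totals here are lower bounds, and the names in the code are illustrative.

```python
# List prices quoted in the review, in US dollars. Each item is bought in pairs.
UNIT_PRICES = {
    "Sun E450 server (two 450-MHz CPUs)": 40_000,    # quoted as "more than $40,000"
    "Check Point FireWall-1/VPN-1 license": 10_000,  # quoted as "typically $10,000"
    "VPN accelerator card": 4_000,                   # quoted as "about $4,000"
}
UNITS_PER_CLUSTER = 2
HA_ADDON_RANGE = (12_000, 16_000)  # Rainfinity, Stonesoft or Foundry components


def base_cost(prices=UNIT_PRICES, units=UNITS_PER_CLUSTER):
    """Cost of the paired servers, licenses and accelerators, before HA add-ons."""
    return units * sum(prices.values())


def total_cost_range(addon_range=HA_ADDON_RANGE):
    """Lowest and highest total once the high-availability components are added."""
    base = base_cost()
    return (base + addon_range[0], base + addon_range[1])


if __name__ == "__main__":
    print(base_cost())         # 108000
    print(total_cost_range())  # (120000, 124000)
```

On these numbers the base configuration comes to at least $108,000, and the full cluster to roughly $120,000 to $124,000.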

However, Check Point's dominance of the firewall and VPN industry (it claims to have 52% of the VPN market) means that many companies may be adding high availability to an existing firewall or VPN environment in which many of these costs are already sunk. Large companies that have standardized on the Check Point management style may also regard price differences as small change in the big picture. If the power, reliability and expandability of the Sun E450 server aren't required in your network, an Intel server with Windows NT and two CPUs would be about one-tenth of the price of the Sun SPARC system.

What is a high-availability VPN?

As we analyzed the products in our lab, we discovered that anyone building a high-availability VPN is going to make some serious compromises. No single product has the complete answer, and no two vendors looked at building high-availability VPNs in the same way. We also discovered that high availability is often combined with load-balancing features, sometimes intentionally and sometimes as an afterthought.

Our definition of a high-availability VPN assumed that the main goal was reliability in the face of component failure.

Component failure is different from data center failure. What happens if the entire computer room goes dark? In that case, a whole different kind of high-availability is needed.

We didn't test these features, but several vendors (including Radguard and some of the Check Point-based vendors) offer high-availability options for environments in which multiple data centers are involved. In Radguard's case, its high-availability option does not differentiate between collocated and distant VPN gateways. For everyone else, distant VPN gateways (called "multiple entry points" in Check Point's vocabulary) require a different configuration.

The mixture of high availability and load balancing is common, but the kinds of load sharing differ among vendors. One reason load sharing is desirable in a high-availability situation is that it reduces the total amount of load that has to be flopped from one gateway to another in the case of system failure. This was particularly visible in the Nokia product, in which subsecond failover was facilitated by the fact that only half the load had to be "failed" whenever we created a problem.

Load sharing is advertised as a benefit of the Check Point partners, but none of them did it well. Our intent was not to specifically test load sharing, but we created heavy loads using 400 IPSec Security Associations with evenly distributed traffic. We expected that any product that advertises load sharing would pass this traffic among the high-availability nodes. What we observed, however, was that little or no load sharing occurred with the Stonesoft, Rainfinity or Foundry products.

There are three possible explanations for this. One is that although our load was high from a CPU point of view (because we were pegging the systems trying to get all the encryption done), it wasn't high from a traffic point of view, and so load balancing didn't come into play - after all, a few megabits of traffic through a huge Sun server isn't worth balancing if you're just a firewall.

A second possibility is that the particular traffic profile we used was not amenable to load balancing. If load balancing was based on the addresses of the two gateways, and not on the number of Security Associations between them, then a load balancer would not have seen our traffic as "balanceable" because we used only a single virtual gateway to create the load. This is perhaps a reasonable assumption on the part of the load balancers (because our traffic profile was slightly atypical), but there's no reason not to balance traffic at the Security Association level. For high-traffic, site-to-site VPNs, this kind of load balancing would be required for an effective system.
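A small sketch illustrates why the balancing key matters. It is not any vendor's algorithm: the hashing scheme, the function names and the addresses are assumptions chosen for illustration, and the Security Parameter Index (SPI) stands in as the per-Security Association key. It models a profile like ours, with 400 Security Associations between a single pair of gateways.

```python
import zlib
from collections import Counter


def node_for_gateway_pair(src, dst, nodes):
    """Pick a cluster node from the two gateway addresses only."""
    return zlib.crc32(f"{src}|{dst}".encode()) % nodes


def node_for_sa(src, dst, spi, nodes):
    """Pick a cluster node per Security Association by mixing in the SPI."""
    return (node_for_gateway_pair(src, dst, nodes) + spi) % nodes


def distribution_by_pair(flows, nodes=2):
    return Counter(node_for_gateway_pair(s, d, nodes) for s, d, _spi in flows)


def distribution_by_sa(flows, nodes=2):
    return Counter(node_for_sa(s, d, spi, nodes) for s, d, spi in flows)


# 400 Security Associations, all between one pair of gateways (hypothetical
# documentation addresses, consecutive SPIs for simplicity).
FLOWS = [("192.0.2.1", "198.51.100.1", 0x1000 + i) for i in range(400)]

if __name__ == "__main__":
    print(sorted(distribution_by_pair(FLOWS).values()))  # [400]
    print(sorted(distribution_by_sa(FLOWS).values()))    # [200, 200]
```

Keyed on the gateway pair, every Security Association lands on the same node, so nothing is shared. Keyed on the Security Association, the same traffic splits evenly across both nodes.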

The third possibility is that the load balancers don't work correctly. Because we didn't focus on load balancing in this review, we didn't quiz all the vendors on this result. However, if you care about load balancing, you should look carefully at how the load balancing works - whether at the IP address level, or with a finer grain, at the IPSec level.

One surprise was the lack of high-availability hardware features. We found dual power supplies and some hot-swap capabilities in the Sun E450 servers that went along with products from Rainfinity, Stonesoft and Foundry. Foundry's ServerIron XL can be equipped with optional redundant power supplies but wasn't for our tests. Generally the hardware stands by itself. One reason for this may have been the reliability of today's systems. We popped the tops on the boxes from Nokia, NetScreen, Radguard and Alcatel and discovered similar hardware designs: little or no bus traffic, few connectors, cool-running components (that need little or no cooling) and low component counts. Several participants told us they didn't feel the need for extra hardware, citing reduced reliability that accompanies additional complexity. However, simple changes such as multiple power cords could guard against common human errors. NetScreen offered to send us a pair of its $110,000, Gigabit-capable VPN systems with a bevy of high-reliability hardware features, but we wanted to compare similar systems.

How fast and well does it work?

Our test lab was designed to answer one question: How well does high availability work? The answer, with one exception, is "not terribly well."

By the time we finished testing, we found flaws in virtually every product we tested. There was a clear winner in Nokia's CryptoCluster 2500, which passed all of our high availability tests and also outperformed the nearest competition by a factor of 10 to 1. However, Nokia has its own problems as the CryptoCluster doesn't do a lick of firewall functions, and customers we talked to were uncertain about how Nokia plans to integrate its Check Point-oriented IPSO firewall and VPN products with the CryptoCluster line.

Still, when it comes to high availability, Nokia beat the competition into the ground with astonishingly and unbelievably good performance. When one of the participants in this review asked us, "What would it take to be No. 1 in this review?" our answer was "subsecond failover." Nokia's CryptoCluster 2500 does it in less than 1 second even at encryption loads up to 40M bit/sec.

Our simplest tests were the link failure and power failure tests: Someone unplugs a patch cord or power cable. Because link failure causes a detectable change, it's easy to send a message to the other cluster member (on the still working interface) to say, "I'm broken." The same is true for power failure: The other guy just goes dead. All the products we looked at passed these tests easily. Nokia's CryptoCluster 2500 lost just 0.2% of packets on those tests. Alcatel's 7137 SVG and Stonesoft's StoneBeat FullCluster 2.0 lost between 5% and 7%, while most of the other products were in the 8% to 10% range. The slowest to recover was Foundry's ServerIron XL, with a 13% loss.

Our connectivity test was one of the toughest tests we conducted for high availability. We simulated a total of three failures other than the obvious, and only Nokia's CryptoCluster 2500, NetScreen's NetScreen-100 and Stonesoft's StoneBeat FullCluster 2.0 could handle them all. For example, we looked at what happens if the cables of one of the systems were swapped, perhaps by a misprogrammed or failing LAN switch. Failover for CryptoCluster 2500 was still less than 1% packet loss, while the NetScreen-100 and the StoneBeat FullCluster 2.0 lost 14% and 16%, respectively.

Of the three products that passed all connectivity tests, StoneBeat FullCluster 2.0 deserves special mention because of its "Test Subsystem." The product includes a test script system that network managers can use to devise their own rules for what constitutes a failure. We were disappointed that the default set of scripts couldn't detect connectivity failures - even though the tools Stonesoft provides let you easily detect that kind of failure. Only StoneBeat FullCluster 2.0 gives you the tools to build a really elaborate failure detection system, but a poorly chosen set of defaults and documentation advice makes network managers work harder than they should have to.
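To show why a link-only default misses a swapped-cable fault, here is a minimal sketch of rule-based failure detection. It is not Stonesoft's script language or API: the check functions, interface names, addresses and the simulated probe are all hypothetical, and a real deployment would replace the simulated probe with an actual reachability test.

```python
from typing import Callable, Dict, List

# A probe answers "can this interface reach this peer?" (hypothetical signature).
Probe = Callable[[str, str], bool]
Check = Callable[[], bool]


def link_check(link_state: Dict[str, bool]) -> Check:
    """Pass only if every interface reports link."""
    return lambda: all(link_state.values())


def connectivity_check(expected_peers: Dict[str, str], probe: Probe) -> Check:
    """Pass only if each interface reaches the peer expected on its segment."""
    return lambda: all(probe(iface, peer) for iface, peer in expected_peers.items())


def failed_checks(checks: Dict[str, Check]) -> List[str]:
    """Names of the rules that failed; an empty list means the node is healthy."""
    return [name for name, check in checks.items() if not check()]


def simulated_probe(cabling: Dict[str, str], segment_hosts: Dict[str, set]) -> Probe:
    """Stand-in for a real ping: a peer answers only on the segment it lives on."""
    def probe(iface: str, peer: str) -> bool:
        return peer in segment_hosts[cabling[iface]]
    return probe


SEGMENT_HOSTS = {"inside": {"10.0.0.1"}, "outside": {"203.0.113.1"}}
EXPECTED_PEERS = {"eth0": "10.0.0.1", "eth1": "203.0.113.1"}


def node_status(cabling: Dict[str, str]) -> List[str]:
    checks = {
        # Swapped cables still show link on both interfaces.
        "link": link_check({iface: True for iface in cabling}),
        "connectivity": connectivity_check(
            EXPECTED_PEERS, simulated_probe(cabling, SEGMENT_HOSTS)
        ),
    }
    return failed_checks(checks)


if __name__ == "__main__":
    print(node_status({"eth0": "inside", "eth1": "outside"}))  # []
    print(node_status({"eth0": "outside", "eth1": "inside"}))  # ['connectivity']
```

With the cables swapped, the link rule still passes, and only the connectivity rule flags the node as failed.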

Although the NetScreen-100 handled the connectivity failure tests admirably, it failed in one place where no other product did: the failback test. In the failback test, we looked at how well a system that had a high-availability failure (such as a system reboot or bad patch cable) was able to quickly accommodate a second high-availability failure.

This isn't as unusual as you might imagine. When someone mispatches a cable, the subsequent repair job or hasty "What's wrong with this?" debugging often causes additional failures in short order. Products that passed the failback tests did so with the same loss rate and failover time as in their original failover tests. We didn't see any differences.

The NetScreen-100's failback performance wasn't entirely surprising, as this version of its high-availability code was so new that the bits were still warm when we tested it.

While the NetScreen-100 was the only product that failed the failback tests outright, these exercises still stressed out products from most of the vendors. StoneBeat FullCluster 2.0, RainWall 1.5 and ServerIron XL depended on the underlying Solaris operating system for their VPN servers, and a clean boot of Solaris from power on to packet passing took more than 5 minutes on the Sun E450 systems.

This means that if you have a power failure in one of your Solaris systems, getting ready for the next event is going to take at least 5 minutes. If the power failure happens and your disk is corrupted - not uncommon in Unix - it could take hours.

The RainWall 1.5 also had some suspicious results in which it was unable to complete some runs of our failback tests. When we tried to reproduce the problems with Rainfinity technical support on the phone, everything worked fine. Although we were eventually able to run the test bank without problems, some of our early runs were timed a little too closely and caused total VPN failure.

Another high-availability issue we investigated was the impact of load on failover time (see failover results graphic, below). Although our 400 IPSec Security Associations comprised a fairly modest load for an enterprise VPN device, the bandwidth requirements pushed these systems pretty hard. Radguard's cIPro 5000 showed the stress of this load quickly: its failover time increased with each jump in load, from about 12% traffic loss at a 2M bit/sec data load to just under 30% traffic loss at a 40M bit/sec data load.

Of even greater concern was Check Point's performance. As part of Check Point's proprietary failover strategy, its products renegotiate their IPSec Security Associations when moving from one system to another. Unfortunately, that heavily affects their ability to fail over large numbers of Security Associations. We saw this in all of the high-availability partners, even though some were unaware that Check Point was doing it. We measured a time of 25 to 30 milliseconds per Security Association for failover, which is what dominated the time measurements for Check Point's partners to fail over from one system to another. Those milliseconds also add up: A gateway with 10,000 users attached would take more than 5 minutes to recover.
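The scaling is easy to work through. This sketch assumes renegotiation is strictly serial at the measured 25 to 30 milliseconds per Security Association and ignores any fixed detection delay. The function names are illustrative.

```python
def renegotiation_time_s(sa_count, ms_per_sa):
    """Seconds needed to renegotiate sa_count Security Associations one by one."""
    return sa_count * ms_per_sa / 1000.0


def failover_window_s(sa_count, ms_range=(25, 30)):
    """Best-case and worst-case failover time, in seconds, for the measured range."""
    return tuple(renegotiation_time_s(sa_count, ms) for ms in ms_range)


if __name__ == "__main__":
    print(failover_window_s(400))     # (10.0, 12.0)
    print(failover_window_s(10_000))  # (250.0, 300.0)
```

Our 400 Security Associations work out to 10 to 12 seconds. At one Security Association per user, 10,000 users would need 250 to 300 seconds, or roughly 4 to 5 minutes. Because IPSec Security Associations are unidirectional, a user will typically hold more than one, which pushes the total past 5 minutes. The exact ratio of Security Associations to users is an assumption here.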

On the positive side, we found that CryptoCluster 2500 and NetScreen-100 had little or no significant difference in traffic loss as load increased. Alcatel's 7137 SVG hardware had the same characteristics until we hit the wall at 40M bit/sec. For those systems, failover seems to take the same amount of time at virtually any speed.

Picking your HA option

Overall, we found ourselves preferring the hardware-based products over software-based products for three reasons. First, the software offerings all depended on more complex and physically larger configurations. That reduces total system reliability - not a goal of this high-availability system. While Sun has a reputation for reliable equipment, it's difficult for a product that combines hard drives, power supplies, fans, a general purpose operating system and software from multiple vendors to match the reliability of a single-vendor product in a small package. The more components, the greater the likelihood of component failure. This doesn't necessarily translate into system failure, but it makes life more difficult for the network manager trying to design a reliable system.

A second reason for our hardware-over-software preference was management complexity. The software-based products, as well as Foundry's ServerIron XL, required operating system installation and support, VPN-1 installation and support, and installation and support of the high-availability package. When we contrasted the amount of time and effort to build those products against the few minutes it took us to get Alcatel's 7137 SVG and Radguard's cIPro 5000 up and running, it seemed fairly extreme.

However, we were looking exclusively at the VPN side of the house - if you want a firewall, or already have a firewall, you might have the Solaris/Windows NT and FireWall-1 part already licked. ServerIron XL has a similar edge, as the devices we looked at ranged from eight ports of 10/100 up to 24 ports with Gigabit Ethernet capabilities, doing much more than just handling high availability for VPNs.

Finally, we were just astonished at the pricing differences. CryptoCluster 2500 and NetScreen-100 seemed inexpensive at $20,000 compared to RainWall 1.5, StoneBeat FullCluster 2.0, and ServerIron XL costing five times as much. Although there is a certain element of "you get what you pay for" - Check Point's firewall capabilities are a hands-down winner over Radguard's or NetScreen's firewalls - the VPN side doesn't offer the same advantages.

RELATED LINKS

Snyder is a senior partner at Opus One, in Tucson, Ariz., specializing in messaging and security products. He can be reached at joel.snyder@opus1.com. Elliott is also a partner at Opus One. He can be reached at chelliott@opus1.com.

User study: VPN services save money despite stalled implementation
Three users describe the pros and cons of VPN service deployments.

High availability's dark side
The problem with high availability is that it generally tries to make more than one system look like a single box: one IP address, one media access control address.

Virtual Private Networks: Viable Products Now
Vendor consolidation, better price/performance and new enterprise features mean good things for your network.

Buyer's Guide: Interactive database
Our database includes VPN products from 23 vendors.

How we did it
Our testing methods explained.

Interactive Scorecard and NetResults
Use our calculator to see which VPN product would best suit your network needs.

Archive of Network World reviews
