How we tested UTM firewalls

We invited all major firewall vendors to participate in this Enterprise UTM Firewall test last June. To prepare for the test, we wrote a test methodology, which we circulated to enterprise network managers, other Network World testers and some contacts in the vendor community. Based on their feedback, we constructed a final test plan (distributed as a PDF) that accompanied the invitation.

For each set of devices, we used a combination of commercial test tools from Spirent and Mu Security, standard electrical-engineering measurement products and our own custom-written tests to evaluate the products in 10 categories. We plugged each device into an infrastructure that included a core 10/100/1000 Ethernet switch from Enterasys Networks, KVM switching devices from Avocent and Intel-based servers running VMware Server.

Performance testing

To test performance, we used Spirent’s WebAvalanche and WebReflector tools to generate HTTP traffic across the firewalls. We set up a profile using a typical Internet traffic mix, with object sizes ranging from 1KB to 1.5MB, and ran HTTP transactions through the firewalls at a rate designed to load them to 1Gbps.
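How many transactions per second it takes to reach 1Gbps depends on the average object size in the mix. The article does not publish its exact traffic mix, so the sketch below uses a hypothetical one: the weights in HYPOTHETICAL_MIX and the function names are our assumptions, chosen only to span the 1KB to 1.5MB range described above. It counts HTTP payload bytes only and ignores header and TCP/IP overhead.

```python
# Hypothetical object-size mix: (object size in bytes, share of transactions).
# These weights are illustrative assumptions, not the mix used in the test.
HYPOTHETICAL_MIX = [
    (1_000, 0.50),       # 1KB objects
    (10_000, 0.30),
    (100_000, 0.15),
    (1_500_000, 0.05),   # 1.5MB objects
]

TARGET_BPS = 1_000_000_000  # 1Gbps target load


def mean_object_bytes(mix):
    """Weighted mean object size in bytes; the weights must sum to 1."""
    total_weight = sum(weight for _, weight in mix)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("mix weights must sum to 1")
    return sum(size * weight for size, weight in mix)


def transactions_per_second(mix, target_bps=TARGET_BPS):
    """HTTP transactions per second needed for payload alone to reach target_bps.

    Header and TCP/IP overhead are ignored, so on the wire slightly fewer
    transactions per second would be needed to hit the same bit rate.
    """
    return target_bps / (mean_object_bytes(mix) * 8)


if __name__ == "__main__":
    print(round(transactions_per_second(HYPOTHETICAL_MIX)))  # 1337
```

With this assumed mix the mean object is 93,500 bytes, so roughly 1,337 transactions per second would carry 1Gbps of payload. A mix weighted toward small objects needs far more transactions per second for the same bit rate, which stresses connection handling rather than raw throughput.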

Although seven of the firewalls we tested handled the 1Gbps rate with no UTM features enabled, we decided not to test beyond 1Gbps for two reasons. First, we had asked for devices sized for 1Gbps, and we didn’t want to turn this test into a raw performance run. Second, when we turned on UTM features, none of the products could sustain 1Gbps, so testing only up to 1Gbps was appropriate for a UTM-focused test.

We ran a range of performance scenarios designed to measure each device’s throughput under different conditions, including with UTM features disabled and enabled.

IPS testing

To test each device’s IPS, we turned to Mu Security’s Mu-4000 Security Analyzer, an attack-generation and reporting appliance. For the Mu-4000 testing, we focused on published vulnerability attacks. We split our testing into two directions, client to server and server to client, because an IPS generally protects either users or servers, but seldom both at the same time.

In the client case, the IPS is configured to protect users who browse the Internet or download files and are therefore susceptible to attacks aimed at client applications, such as Web browsers and PDF readers. In the server case, the IPS is configured differently, protecting Web, e-mail and other servers against attacks initiated by malicious users.

We followed vendor guidance to set up “server protective” and “client protective” IPS profiles. Then we tested each profile with the Mu-4000 to find the percentage of attacks the IPS blocked. The client-profile attack set contained about 400 attacks; the server-profile set contained about 500.
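Scoring a profile comes down to attacks blocked divided by attacks sent. The sketch below is a hypothetical bookkeeping structure of our own, not the Mu-4000’s report format, and the counts in the usage example are invented for illustration rather than taken from this test.

```python
from dataclasses import dataclass


@dataclass
class ProfileResult:
    """Outcome of replaying one attack set against one IPS profile (hypothetical)."""

    profile: str          # "client" or "server"
    attacks_sent: int     # about 400 (client) or 500 (server) in this test
    attacks_blocked: int

    def block_rate(self) -> float:
        """Percentage of the attacks that the IPS blocked."""
        if self.attacks_sent <= 0:
            raise ValueError("attacks_sent must be positive")
        return 100.0 * self.attacks_blocked / self.attacks_sent


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(ProfileResult("client", 400, 352).block_rate())  # 88.0
    print(ProfileResult("server", 500, 430).block_rate())  # 86.0
```

Keeping the two profiles as separate records, rather than pooling roughly 900 attacks into one score, preserves the client/server split the test was designed around.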

Antivirus testing

For antivirus testing, we were less concerned with whether the firewalls caught any particular virus than with whether they detected and blocked viruses across a wide range of ports. We took 15 recent (July 2007) viruses and packaged them using four vectors: e-mail via SMTP, FTP, HTTP on port 80 and HTTP on a nonstandard port. We used a client to transfer the viruses across the firewalls and logged the results.

In many cases, the viruses were completely blocked, but in some cases a file was transferred. We compared the transferred file with the original virus to see whether the transferred file had been “defanged” by the UTM firewall. If the firewall defanged the virus or blocked it completely, we considered the virus to be “blocked” in our testing.

We did not make a special effort to tell the UTM firewall about the nonstandard HTTP port, instead assuming that the enterprise had a broad “allow outbound connects” policy. Many enterprises do not have a wide-open policy like the one we used, and those enterprises would likely see a higher virus catch rate than we found with the appliances.
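The scoring rule above (a virus counts as blocked if nothing arrives or if what arrives has been defanged) can be sketched as follows. The article does not say how transferred files were compared with the originals; this sketch assumes a SHA-256 comparison, which only shows that a file changed. Treating any changed file as defanged is an assumption, and confirming that a modified file is actually harmless would take further inspection. The vector labels and function names are hypothetical.

```python
import hashlib
from collections import Counter
from enum import Enum
from typing import Optional


class Outcome(Enum):
    BLOCKED = "blocked"    # nothing arrived at the receiving client
    DEFANGED = "defanged"  # a file arrived but differs from the original (assumed neutralized)
    PASSED = "passed"      # the original sample arrived byte-for-byte intact


def sha256_hex(data: bytes) -> str:
    """Hex digest, convenient for logging alongside each transfer."""
    return hashlib.sha256(data).hexdigest()


def classify(original: bytes, received: Optional[bytes]) -> Outcome:
    """Classify one transfer; received is None when no file came through."""
    if received is None:
        return Outcome.BLOCKED
    if sha256_hex(received) == sha256_hex(original):
        return Outcome.PASSED
    return Outcome.DEFANGED


def counts_as_blocked(outcome: Outcome) -> bool:
    """Scoring rule from the text: blocked and defanged both count as blocked."""
    return outcome in (Outcome.BLOCKED, Outcome.DEFANGED)


def per_vector_catch_rate(results: dict) -> dict:
    """results maps (sample_name, vector) -> Outcome; returns vector -> percent blocked."""
    total: Counter = Counter()
    caught: Counter = Counter()
    for (_, vector), outcome in results.items():
        total[vector] += 1
        if counts_as_blocked(outcome):
            caught[vector] += 1
    return {vector: 100.0 * caught[vector] / total[vector] for vector in total}


if __name__ == "__main__":
    original = b"placeholder sample bytes"  # hypothetical stand-in, not a real virus
    results = {
        ("sample-01", "smtp"): classify(original, None),
        ("sample-01", "ftp"): classify(original, b"stripped attachment"),
        ("sample-01", "http-80"): classify(original, original),
        ("sample-01", "http-nonstandard"): classify(original, original),
    }
    print(per_vector_catch_rate(results))
```

Reporting the catch rate per vector, rather than as one total over the 60 transfers (15 samples times four vectors), makes it easy to see whether misses cluster on the nonstandard HTTP port.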

High-availability testing

To test high availability, we used each vendor’s guidelines to set up a high-availability pair of UTM firewalls. In most cases we tested an active/passive configuration, unless the vendor specifically asked us to test load sharing. In a two-appliance deployment, active/passive is the usual choice because it ensures that, if one system fails, the surviving device still has enough capacity to carry the offered load.
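The capacity argument for active/passive can be made concrete with a little arithmetic. In load-sharing mode each device carries half the offered load, so both can look comfortably loaded while the survivor of a failure would still have to carry all of it. The sketch below, with hypothetical function names and load figures, compares per-device utilization before a failure with what a lone survivor must handle.

```python
def utilization_before_failure(offered_gbps: float, capacity_gbps: float, mode: str) -> float:
    """Fraction of one device's capacity in use while both devices are healthy."""
    if mode == "active/passive":
        per_device = offered_gbps        # the active unit carries everything
    elif mode == "load-sharing":
        per_device = offered_gbps / 2    # traffic split evenly across the pair
    else:
        raise ValueError(f"unknown mode: {mode}")
    return per_device / capacity_gbps


def survivor_can_carry(offered_gbps: float, capacity_gbps: float) -> bool:
    """After one device fails, the survivor must carry the entire offered load."""
    return offered_gbps <= capacity_gbps


if __name__ == "__main__":
    # Hypothetical pair of 1Gbps-rated devices facing 1.5Gbps of offered load.
    print(utilization_before_failure(1.5, 1.0, "load-sharing"))  # 0.75
    print(survivor_can_carry(1.5, 1.0))                          # False
```

In active/passive mode the same 1.5Gbps would overload the active unit immediately (utilization 1.5), so an undersized pair shows its shortfall in normal operation instead of during a failure.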

Although there are many possible high-availability test scenarios, we stuck with one of the simplest and most common: an abrupt power-cord-attenuation event, in which one device was powered down without warning. To measure how well the devices recovered, we put the two-device cluster under load using the Avalanche/Reflector configuration and then measured how long the remaining device took to return to full-speed performance. We also counted how many sessions were lost during the failover, an indication of incomplete state sharing between the devices. Because Avalanche/Reflector reports statistics at four-second intervals, our results are precise only to within ±4 seconds.
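Deriving a recovery time from four-second throughput samples might look like the sketch below. The text does not define “full-speed performance” numerically, so the 95%-of-baseline threshold, the use of the pre-failure average as the baseline, the function name and the sample values are all assumptions.

```python
from typing import Optional, Sequence

SAMPLE_INTERVAL_S = 4  # Avalanche/Reflector reporting interval cited in the text


def recovery_time_s(
    throughput: Sequence[float],
    failure_index: int,
    full_speed_fraction: float = 0.95,  # assumed definition of "full speed"
) -> Optional[int]:
    """Seconds from the first post-failure sample until throughput returns to
    full_speed_fraction of the mean pre-failure throughput.

    Returns None if throughput never recovers within the samples. The result
    is only as precise as the sample interval (plus or minus 4 seconds here).
    """
    if failure_index < 1:
        raise ValueError("need at least one pre-failure sample")
    baseline = sum(throughput[:failure_index]) / failure_index
    target = full_speed_fraction * baseline
    for i in range(failure_index, len(throughput)):
        if throughput[i] >= target:
            return (i - failure_index) * SAMPLE_INTERVAL_S
    return None


if __name__ == "__main__":
    # Hypothetical Mbps samples; power was cut just before sample index 3.
    samples = [950, 948, 952, 310, 120, 700, 940, 951]
    print(recovery_time_s(samples, failure_index=3))  # 12
```

Averaging several pre-failure samples, rather than using the last one alone, keeps a single noisy interval from skewing the recovery threshold.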

Dynamic-routing testing

To test dynamic routing, we focused on OSPF, the most common intraenterprise routing protocol. We set up a configuration in which the UTM firewall learned its default route from two different Cisco IOS routers, simulating an environment where two routers have separate connections to the Internet. We then gracefully brought down the OSPF session on one of the IOS routers and, by running continuous pings through the UTM firewall under test, checked whether it picked up the route from the other router. Because we had not tuned OSPF to detect outages aggressively, we considered failover within 30 seconds typical and acceptable.
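One way to turn the continuous pings into a failover time is to find the longest run of consecutive unanswered pings and multiply it by the ping interval. The one-second interval, the function names and the sample sequences are assumptions; only the 30-second acceptance threshold comes from the text.

```python
from typing import Sequence

PING_INTERVAL_S = 1.0         # assumed ping rate; the text does not state it
ACCEPTABLE_FAILOVER_S = 30.0  # acceptance threshold from the text


def longest_outage_s(replies: Sequence[bool], interval_s: float = PING_INTERVAL_S) -> float:
    """replies[i] is True if ping i got a reply; returns the longest loss run in seconds."""
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval_s


def failover_acceptable(replies: Sequence[bool], interval_s: float = PING_INTERVAL_S) -> bool:
    """True if the worst outage stayed within the 30-second threshold."""
    return longest_outage_s(replies, interval_s) <= ACCEPTABLE_FAILOVER_S


if __name__ == "__main__":
    # Hypothetical run: 12 consecutive pings lost after the OSPF session went down.
    replies = [True] * 10 + [False] * 12 + [True] * 10
    print(longest_outage_s(replies), failover_acceptable(replies))  # 12.0 True
```

Measuring the longest loss run, rather than the total number of lost pings, avoids counting unrelated isolated drops as part of the failover.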

IPv6 testing

For IPv6, we did not run specific tests. Instead, we used each product’s own documentation and GUI to evaluate its level of IPv6 support. In some cases, we asked the vendor for more information on IPv6 support.

Hardware evaluation

To evaluate hardware, we examined the hardware each vendor provided and drew on our experience integrating the devices into our test bed. We measured power consumption with standard lab meters twice: once with the device idle, 30 minutes after boot, and again during a stressful performance test.

Management evaluation

In addition to evaluating the policy and centralized management tools in general, we focused on three key areas: firewall policy definition, NAT policy definition and VPN definition.

Some policy definitions, such as those for IPS and antivirus, were evaluated separately in the IPS and antivirus sections of this test report. To evaluate firewall and NAT policy, we tried to install our simulated enterprise policy on each device. That policy assumed the device would serve as more than just a perimeter firewall, with rules designed to protect (and NAT) different parts of an organization from each other. For VPN policies, we evaluated (but did not test) each product’s ability to build enterprise-style site-to-site VPNs among many different devices, as well as its ability to use the same firewall for remote user access.
