by Rodney Thayer and David Newman

Why we didn’t test performance

Feb 16, 2004 | 7 mins

Our intrusion-protection system review methodology was open for comment for four months before testing began. We openly solicited vendor input regarding what to test and how to test it. Three vendors pushed hard to include performance testing.

The problem was that all three vendors suggested very different methodologies. The basic metrics – throughput and latency – were the same, and we agreed on those: if IPS vendors want network professionals to put these devices in-line in production networks, the devices have to be as fast and as reliable as the switches and routers they replace.

But there was no agreement beyond that because IPSs differ in some of their most basic characteristics. Some appear on the network as hubs, others look like switches, and some operate as routers. Performance tests for Layer 2 switches and hubs are very different from those for Layer 3 routers.

Let’s suppose for the moment that we tested IPSs the way we test Layer 2 or Layer 3 devices. It is possible to get valid numbers on throughput, latency, jitter and the like. The problem with such measurements is that they’d tell us absolutely nothing about the way these systems behave as IPSs. Performance tests don’t measure security. They might not even measure performance in a meaningful way. After all, system behavior might differ radically when we configure IPSs for Layer 7 inspection rather than for Layer 2/3 forwarding.

Additionally, different vendors’ IPS devices go into different places in the network. A few are meant to sit at the absolute outside edge, right next to an Internet-connected firewall. Others are more generic, engineered to go closer to the core or right in front of some set of protected hosts or subnets.

It’s not rational to compare a device designed to support a DS3 Internet circuit with one engineered to replace a core 100M bit/sec or even 1G bit/sec switch.

The biggest roadblock to comparable, repeatable performance statistics was that no two vendors agreed on the definition of an IPS. The most basic test would be to simply push traffic through these devices and see how they behaved. That might have been repeatable, but it wouldn’t have been useful. We don’t care how these devices behave when they’re simply passing traffic; we care about how they behave in the presence of attacks.

And that brings up two more difficult performance questions: What traffic should be passed? And what attacks should be blocked? As attack traffic builds up at an IPS, you can expect different kinds of behavior. The IPS might choose to drop packets randomly to protect the systems or networks behind it. Or it might delay packets, hoping to deter a SYN flood. Because directing attacks at an IPS causes it to vary its behavior, the question again arises: What exactly would we be measuring?
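To see why attack traffic changes an IPS’ measurable behavior, consider a toy model of a rate-based SYN-flood response. This is our own illustrative sketch – not any vendor’s actual algorithm – and the threshold and window values are invented:

```python
import time
from collections import defaultdict

# Illustrative sketch of a rate-based SYN-flood response (hypothetical
# parameters): count SYNs per source over a sliding window and start
# dropping once a threshold is exceeded.
SYN_THRESHOLD = 100   # max SYNs per source per window (assumed value)
WINDOW_SECONDS = 1.0

class SynRateLimiter:
    def __init__(self, threshold=SYN_THRESHOLD, window=WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        self.stamps = defaultdict(list)  # source IP -> recent SYN timestamps

    def allow(self, src_ip, now=None):
        """Return True to forward the SYN, False to drop it."""
        now = time.monotonic() if now is None else now
        # Discard timestamps that have aged out of the window.
        recent = [t for t in self.stamps[src_ip] if now - t < self.window]
        self.stamps[src_ip] = recent
        if len(recent) >= self.threshold:
            return False  # rate exceeded: drop to deter the flood
        recent.append(now)
        return True
```

The point of the sketch is that the device’s forwarding rate is now a function of the offered attack mix: the same box yields different “throughput” numbers depending entirely on how the test traffic interacts with the threshold.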

If we expect the IPS to differentiate between good and bad traffic, we’d have to first define what good and bad are. Different products by design define those parameters differently, depending on what kind of attack they’re seeing.

One test option is to send only good traffic through the IPS, but configure it to look for bad traffic. The IPS wouldn’t drop any packets because it was seeing attacks, but it would still have to look for malicious traffic. This would give us performance numbers for the IPS when all is clear – a “best-case” scenario. But the results would be highly configuration-dependent and could vary radically depending on what kind of good traffic was passing through the IPS. Is this traffic the IPS can quickly dismiss without deeply analyzing it? Or does it have to keep statistics and counters in case the traffic does trigger a response? And which level of bad-traffic detection will be enabled? In all cases, performance would vary even within the same product as you change the configuration. Even picking the traffic load and characteristics will be critical, as products will behave very differently even in the presence of good traffic.

This best-case scenario, though, doesn’t give us information about what happens when an attack does occur. So any reasonable IPS performance test must include runs that mix both good traffic and bad traffic. The results are even more difficult to compile than in our first, best-case scenario.

For example, let’s suppose we were to compare the Econet IPS (a content-based service) against the Captus IPS (a rate-based device). We’d push some traffic through them and then launch a SYN flood. The Captus system will start to drop some packets – if it has been configured to respond to a SYN flood. But which of Captus’ four different attack response mechanisms will we set in motion? Do we cap throughput or just stop sending traffic through? Many options exist – but which are fair, repeatable and comparable?

The Econet IPS doesn’t look for SYN floods, so it happily will send all packets through. What can we conclude about this picture? That Econet is faster than Captus, because it doesn’t drop packets? Or that Captus is better than Econet because the latter might not have caught the attack?

OK, you say, not fair. Let’s instead compare Captus and Top Layer, two of our rate-based IPS devices. Configuration is the issue here. Do we have them just count packets and tell us what they would have done, if they were going to do it? Or should we set them to drop packets or delay connections? Measuring throughput and latency assumes a zero-loss environment. How do you measure performance when the device is supposed to drop packets? As the IPS’s behavior changes, so will the response times. There is no configuration under which these two devices will respond in the same way. We’re back to simply sending packets through.
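The zero-loss problem can be made concrete with a toy model, loosely patterned on the RFC 2544-style binary search that benchmarking tools use to find a device’s zero-loss throughput. The devices and rates below are invented for illustration:

```python
def zero_loss_throughput(device, max_rate, iterations=20):
    """Binary-search for the highest offered rate at which the device
    drops nothing (a sketch of an RFC 2544-style search, not a real tool).
    `device` maps an offered rate to the fraction of packets dropped."""
    lo, hi = 0.0, max_rate
    for _ in range(iterations):
        rate = (lo + hi) / 2
        if device(rate) == 0:
            lo = rate   # no loss: try a higher rate
        else:
            hi = rate   # loss seen: back off
    return lo

# A plain forwarding device drops only above its capacity (800, say)...
plain_switch = lambda rate: 0.0 if rate <= 800.0 else 0.05
# ...while an IPS configured to block attacks drops some packets by design.
blocking_ips = lambda rate: 0.01

zero_loss_throughput(plain_switch, 1000.0)  # converges near 800
zero_loss_throughput(blocking_ips, 1000.0)  # converges to 0: the metric collapses
```

For the switch, the search converges on its real capacity. For the IPS, any intentional drop looks like failure to the search, so the “zero-loss throughput” comes out as zero – which is exactly why the standard metric tells you nothing about a device whose job is to drop packets.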

Let’s turn to the content-based products. There, fair performance comparison is even more problematic. Let’s take Internet Security Systems (ISS) and Lucid Security, for example. ISS has several thousand signatures in its database, but only about 250 are turned on for IPS purposes. Lucid has a similarly sized signature database, but doesn’t have any signatures turned on. It relies on scanning the network to enable the signatures that pose threats to the network it is protecting. So which signatures do we turn on when we try to compare these products? Clearly, if there are a lot of signatures enabled, this will affect performance dramatically. What, then, is the right mix to fairly compare these products?

Moving past signatures, we must determine what traffic mix to offer these products. Obviously, some packets should be detected as attacks. But because we are dealing with different architectures here, we have to take into account that ISS is going to drop a packet detected as an attack while Lucid adds a rule to the firewall to block the next packet from that source. Does Lucid fail the test because it let that packet through?

We are not asserting that all IPS benchmarking is wrong or useless. Certainly each IPS vendor should be setting up benchmarks for its product to understand how it behaves throughout development cycles.

While we’ve outlined the roadblocks to testing IPS performance, we’re not giving up. In 2004, we hope to revisit at least a subset of the vendors in this review, with an eye toward measuring performance. Network professionals have a reasonable question: Can I put this in my network without breaking it? We’ll figure out how to test that and let you know.