Digging deeper into the accuracy and performance data.
Key: Services, Appliances, Software
False positives: The best scores in our test came from products that have reduced false positives (legitimate mail tagged as spam) to the noise level. Any false positive is a problem, and non-delivery receipts (NDRs) and mailing-list messages caused the anti-spam products the most trouble. Many mailing lists might be unimportant, but some are critical, and the same is true for NDRs. If you send a message and it doesn’t go through, your only clue is the NDR that comes back from your own mail system or, sometimes, from the other end. Many of the products we tested filter NDRs too zealously because they assume that most NDRs are the result of a mass-mailing worm. That breaks the feedback loop and makes mail less reliable.
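A filter doesn’t have to choose between discarding every NDR and delivering every one. As a minimal sketch, the hypothetical helper below uses Python’s standard email package to recognize messages that look like NDRs, so that they could be routed to a quarantine or given extra scrutiny rather than silently dropped. The two checks are assumptions about common practice, not a description of any tested product: standard delivery status notifications (RFC 3464) use a multipart/report body with report-type=delivery-status, and bounces are normally sent with a null envelope sender, which many mail servers record as “Return-Path: <>”.

```python
from email import message_from_string
from email.message import Message


def looks_like_ndr(msg: Message) -> bool:
    """Hypothetical heuristic: does this message look like a bounce/NDR?

    It only identifies likely NDRs; worms can forge bounces too, so this
    says nothing about whether the NDR is legitimate.
    """
    # RFC 3464 delivery status notifications are multipart/report
    # messages with report-type=delivery-status.
    report_type = msg.get_param("report-type")
    if (msg.get_content_type() == "multipart/report"
            and isinstance(report_type, str)
            and report_type.lower() == "delivery-status"):
        return True
    # Bounces are usually sent with a null envelope sender, which many
    # MTAs record as "Return-Path: <>".
    return (msg.get("Return-Path") or "").strip() == "<>"


if __name__ == "__main__":
    bounce = message_from_string(
        "Return-Path: <>\n"
        "From: MAILER-DAEMON@example.com\n"
        'Content-Type: multipart/report; report-type=delivery-status; boundary="b"\n'
        "\n--b\nContent-Type: text/plain\n\nDelivery failed\n--b--\n"
    )
    print(looks_like_ndr(bounce))
```

A product built this way could treat a likely NDR differently from ordinary spam, for example by quarantining it, which keeps the sender’s feedback loop recoverable.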
In last year’s test, false-positive rates were much higher, and we said a quarantine was a critical requirement. This year, while the false-positive rate has dropped overall, we still think that most businesses using e-mail as a critical communications tool need some way to deal with false positives.
Tuning: Many vendors insisted they would do better on false positives with better tuning. It’s a good argument, but the top scores in our test came from products such as Sophos, Symantec, Advascan and Proofpoint, which required no tuning whatsoever. Of the 10 best false-positive scores, only CipherTrust’s came from a product that was tuned before going into production. The idea that an anti-spam solution requires constant maintenance and updating might once have been true, but our tests indicate that outstanding performance is possible with products that require no tuning at all. In some cases, such as Symantec’s engine, tuning isn’t even allowed.
False negatives: We asked vendors to provide settings for the lowest possible false-positive rate, and the trade-off between catching spam and making mistakes was very clear. Some vendors, especially service providers Netriplex and 0Spam.Net, achieved very high spam catch rates, letting through only a handful of spam messages. But this came at an unacceptably high cost: hundreds of false positives. The best balance came from service provider Postini, which had a 97% spam catch rate and only six false positives.
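The two numbers behind that trade-off are simple ratios: the spam catch rate is the fraction of spam a product tagged, and the false-positive rate is the fraction of legitimate mail it wrongly tagged. The sketch below computes both; the message counts in the usage line are hypothetical, since the totals behind our results aren’t given here.

```python
from typing import NamedTuple


class FilterScore(NamedTuple):
    catch_rate: float           # fraction of spam that was tagged as spam
    false_positive_rate: float  # fraction of legitimate mail tagged as spam


def filter_score(spam_total: int, spam_caught: int,
                 ham_total: int, ham_flagged: int) -> FilterScore:
    """Compute spam catch rate and false-positive rate from raw counts."""
    return FilterScore(spam_caught / spam_total, ham_flagged / ham_total)


if __name__ == "__main__":
    # Hypothetical counts: 1,000 spam and 2,000 legitimate messages.
    print(filter_score(spam_total=1000, spam_caught=970,
                       ham_total=2000, ham_flagged=6))
```

Keeping the false positives as a raw count alongside the rate matters for readers, because “six false positives” means very different things against 2,000 legitimate messages than against 200.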
Some products with both high catch rates and high false-positive rates could be tuned. While 99% spam-catcher 0Spam.Net has no knobs and can’t improve on a dismal 5% false-positive rate, service provider Netriplex, along with software vendors Process Software and Vircom, offers dozens of adjustments that can be used to lower the false-positive rate while keeping the spam catch rate at the 98% to 99% level we saw.
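One common kind of knob is a score threshold. Assuming, for illustration only, a filter that assigns each message a numeric spam score and tags anything at or above a threshold (real products expose many more settings than this), raising the threshold trades catch rate for fewer false positives. The scores below are invented to show the shape of the trade-off, not taken from any product.

```python
from typing import Sequence, Tuple

# Hypothetical spam scores for a small labeled sample.
SPAM_SCORES = [2.1, 4.8, 6.0, 7.5, 9.2, 9.9, 12.0, 15.3]
HAM_SCORES = [-1.0, 0.5, 1.2, 2.0, 3.1, 4.9, 5.5, 6.2]


def rates_at_threshold(spam_scores: Sequence[float],
                       ham_scores: Sequence[float],
                       threshold: float) -> Tuple[float, float]:
    """Return (catch rate, false-positive rate) for a given threshold.

    A message is tagged as spam when its score is >= threshold.
    """
    caught = sum(score >= threshold for score in spam_scores)
    flagged = sum(score >= threshold for score in ham_scores)
    return caught / len(spam_scores), flagged / len(ham_scores)


if __name__ == "__main__":
    # Sweep the threshold: both rates fall as it rises.
    for t in (4.0, 5.0, 6.0, 7.0):
        catch, fp = rates_at_threshold(SPAM_SCORES, HAM_SCORES, t)
        print(f"threshold {t}: catch {catch:.3f}, false positives {fp:.3f}")
```

The interesting question for a product with lots of knobs is whether some setting drops the false-positive rate sharply while costing only a little catch rate; a sweep like this is one way to look for it.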
(MS=S): A few products include a “maybe spam” category. We computed two scores for these products: one counting “maybe spam” as spam, and one counting it as not spam. If you count messages tagged “maybe spam” as caught rather than as false negatives, the rankings of CipherTrust and MailFrontier improve considerably. Two other vendors with a “maybe spam” category, BorderWare and Symantec, don’t do any better: they catch more spam, but their false-positive rates don’t change. Because BorderWare came in with Symantec’s Brightmail anti-spam engine for this test, the similar performance is not surprising.
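The two scoring rules differ only in which verdicts count as “tagged.” The sketch below scores the same set of verdicts both ways; the verdict labels (“spam,” “maybe,” “clean”) and the sample results are hypothetical. It shows why the choice can move both numbers: counting “maybe spam” as spam raises the catch rate, but any legitimate message tagged “maybe” then becomes a false positive.

```python
from typing import Iterable, Tuple

# Each result is (message_really_is_spam, verdict), where the verdict is
# one of the hypothetical labels "spam", "maybe" or "clean".
Result = Tuple[bool, str]


def score(results: Iterable[Result], maybe_is_spam: bool) -> Tuple[float, int]:
    """Return (spam catch rate, false-positive count) under one scoring rule."""
    tagged = {"spam", "maybe"} if maybe_is_spam else {"spam"}
    results = list(results)
    spam_verdicts = [v for is_spam, v in results if is_spam]
    ham_verdicts = [v for is_spam, v in results if not is_spam]
    catch_rate = sum(v in tagged for v in spam_verdicts) / len(spam_verdicts)
    false_positives = sum(v in tagged for v in ham_verdicts)
    return catch_rate, false_positives


SAMPLE = [
    (True, "spam"), (True, "spam"), (True, "maybe"), (True, "clean"),
    (False, "clean"), (False, "clean"), (False, "maybe"), (False, "clean"),
]

if __name__ == "__main__":
    print("MS=S :", score(SAMPLE, maybe_is_spam=True))
    print("MS!=S:", score(SAMPLE, maybe_is_spam=False))
```

A product that never puts legitimate mail in the “maybe” bucket gains catch rate under the MS=S rule without any change in false positives, which matches the BorderWare and Symantec pattern described above.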
MailFrontier: An error on our part during installation prevented MailFrontier from properly completing the test. We were able to re-queue the mail, but only with a spam signature set that was current as of the end of the test. As a result, MailFrontier likely had a higher spam catch rate than it would have if the mail had run through contemporaneously.
RR: Because we selected a configuration option incorrectly, we had to re-run the mail through Process Software’s PreciseMail. Both the original and the corrected numbers are reported, with (RR) marking the re-run.
CipherTrust: CipherTrust’s technical support team shut down IronMail during the test, so it saw only about one-third of the mail flow. Its results are likely representative, but they have a larger margin of error than those of the other vendors.
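How much larger? Under a simple normal approximation for a proportion (a textbook method we’re using for illustration, not the statistics behind our tables), the margin of error shrinks with the square root of the sample size, so seeing one-third of the mail widens it by a factor of the square root of 3, about 1.7. The message counts and the 95% rate below are hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a rate.

    p is the observed rate (e.g., spam catch rate), n the number of
    messages it was measured on, z the normal quantile (1.96 for ~95%).
    The approximation is poor for rates very close to 0 or 1, such as
    small false-positive rates.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    full = margin_of_error(0.95, 30_000)   # hypothetical full mail flow
    third = margin_of_error(0.95, 10_000)  # one-third of that flow
    print(f"full flow: +/-{full:.4f}, one-third: +/-{third:.4f}, "
          f"ratio {third / full:.2f}")
```

This captures only sampling error; if the third of the mail IronMail saw differed systematically from the rest, the uncertainty would be larger still.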
Appliances: For the appliance vendors, the throughput we report should be considered a worst case, because more than half of all mail is spam and won’t have to flow through the entire system. However, we did not test with quarantine or virus scanning enabled, and either feature, if used, would further reduce performance.
Software caveats: For vendors that sent software, we used a VMware ESX Server virtual machine with a fairly limited disk subsystem, to accommodate the huge number of vendors that wanted to participate in this test. These products would likely perform better on dedicated hardware (see “Adventures in spam testing,” page 36).
The Unix factor: We spent more time tuning Unix, Sendmail and various Unix system utilities than we did tuning the products that ran on Sendmail, including those from Roaring Penguin, Privacy Networks, Proofpoint and Cloudmark. In some cases, the differences were dramatic: a single-line change in the Sendmail configuration, for example, tripled the throughput of Roaring Penguin’s CanIt software. Companies that install their own software, rather than going with an appliance, need to be prepared for significant performance tuning.
VMware variation: To measure how much slower our VMware system would be than bare hardware, we put Cloudmark through its paces three ways: once on VMware; once with the exact same configuration on the same hardware, but without VMware; and once on a similar server tuned by Cloudmark for our testing. We found that message throughput under VMware was only 20% to 30% of what it was on bare hardware. Cloudmark’s own server, for example, ran a very peppy 5.3 times faster than our VMware system. When interpreting the performance numbers, it’s best to compare appliances with appliances and VMware with VMware.
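Put another way, if VMware runs at 20% to 30% of bare-hardware speed, bare hardware is roughly 3.3 to 5 times faster. The sketch below turns a VMware throughput number into that rough bare-hardware range; the 2.0 messages per second in the usage line is a hypothetical measurement. Note that Cloudmark’s 5.3x result works out to about 19%, just under the low end, which is plausible since that was a different, vendor-tuned server rather than our own hardware without VMware.

```python
from typing import Tuple


def bare_metal_estimate(vmware_rate: float,
                        low_fraction: float = 0.20,
                        high_fraction: float = 0.30) -> Tuple[float, float]:
    """Estimate a bare-hardware throughput range from a VMware measurement.

    Assumes VMware throughput is between low_fraction and high_fraction
    of bare-hardware throughput (20% to 30% in our Cloudmark runs).
    Returns (conservative estimate, optimistic estimate).
    """
    return vmware_rate / high_fraction, vmware_rate / low_fraction


if __name__ == "__main__":
    low, high = bare_metal_estimate(2.0)  # hypothetical msgs/sec on VMware
    print(f"bare hardware: roughly {low:.1f} to {high:.1f} msgs/sec")
    print(f"Cloudmark's 5.3x speedup = {1 / 5.3:.0%} of bare-metal speed")
```

This is only a rule of thumb from one product; the VMware penalty depends heavily on how disk-bound a given product is.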
Accept rate vs. delivery rate: We measured the rate at which products accepted mail and how long they took to deliver it after scanning. Some products accept mail more quickly than they can deliver it. Whether this is good depends on the details. A product that accepts mail faster than it can deliver it must, at some point, flow-control the incoming mail, that is, slow down how fast it accepts new messages. Products that don’t flow-control are susceptible to denial-of-service attacks, because an attacker can fill up the disks and lock up the server. Our test wasn’t long enough to show which products flow-control under load. For example, consider a product with 140G bytes of queue space that accepts 10 messages per second but delivers only half that many. At full throttle, it would take about three hours to run out of disk space.
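The time to fill a queue is the queue size divided by how fast the backlog grows in bytes, which depends on average message size, something the example above doesn’t state. Working backward, three hours at a net backlog of 5 messages per second corresponds to an average message of roughly 2.6MB (taking 1G byte as 10^9 bytes). The sketch below makes message size an explicit parameter; smaller messages stretch the time out considerably, but an attacker can choose to send large ones.

```python
import math


def hours_until_queue_full(queue_bytes: float, accept_per_s: float,
                           deliver_per_s: float, avg_msg_bytes: float) -> float:
    """Hours until the queue fills at full throttle.

    The backlog grows at (accept rate - delivery rate) messages per
    second; each message consumes avg_msg_bytes of queue space.
    Returns math.inf if delivery keeps up with acceptance.
    """
    backlog_msgs_per_s = accept_per_s - deliver_per_s
    if backlog_msgs_per_s <= 0:
        return math.inf
    return queue_bytes / (backlog_msgs_per_s * avg_msg_bytes) / 3600


if __name__ == "__main__":
    # The example above: 140G bytes of queue, accept 10/s, deliver 5/s.
    print(f"{hours_until_queue_full(140e9, 10, 5, 2.6e6):.1f} hours")
```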
Products that accept mail only as fast as they can scan or deliver it don’t handle peaks in mail volume very well. The best strategy is to accept mail faster than you can scan it, up to a point, and then start slowing down senders as resources are consumed. Of course, some products aren’t architected to work that way: they either scan mail as it flows in or act as SMTP proxies, so flow control has to come from the destination mail server. Most products we tested accept mail with an SMTP server such as Sendmail or qmail, scan the message, and then place it in a queue for delivery.
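One way to implement “accept fast, then slow down” is to key the decision to how full the queue is. The sketch below is a hypothetical admission policy, not how any product we tested works: below a low-water mark it accepts immediately, between the low- and high-water marks it delays each sender in proportion to queue usage, and above the high-water mark it returns a temporary SMTP failure (452, “insufficient system storage”), which tells a legitimate sending server to retry later rather than bounce the message. The watermark and delay values are assumptions chosen for illustration.

```python
from dataclasses import dataclass

LOW_WATER = 0.50    # hypothetical: queue fraction where throttling starts
HIGH_WATER = 0.90   # hypothetical: queue fraction where new mail is refused
MAX_DELAY_S = 30.0  # hypothetical: longest delay imposed on a sender


@dataclass
class Decision:
    action: str            # "accept", "delay" or "tempfail"
    delay_s: float = 0.0   # how long to stall the sender before accepting
    smtp_reply: str = ""   # reply to send when refusing


def admit(queue_used_fraction: float) -> Decision:
    """Decide how to treat a new message given current queue usage (0..1)."""
    if queue_used_fraction < LOW_WATER:
        return Decision("accept")
    if queue_used_fraction < HIGH_WATER:
        # Scale the delay linearly between the two watermarks.
        span = (queue_used_fraction - LOW_WATER) / (HIGH_WATER - LOW_WATER)
        return Decision("delay", delay_s=span * MAX_DELAY_S)
    # A 4xx reply is temporary: the sending MTA keeps the message and retries.
    return Decision("tempfail", smtp_reply="452 4.3.1 Insufficient system storage")


if __name__ == "__main__":
    for used in (0.2, 0.7, 0.95):
        print(used, admit(used))
```

Using a temporary failure rather than a permanent one keeps legitimate mail intact during a flood, because well-behaved senders queue it on their side, while the disks on the filtering server stay below the point where the server locks up.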
Final note: We believe that actual spam catch rates would be higher for a test in which the anti-spam products directly faced the Internet. False-positive rates would generally not be affected.