Say "false positive," and you immediately dive into a tough world: the statistics of diagnostic tests. The terms false positive and false negative (and their cousins, true positive and true negative) are fairly easy to define. But turning the number of false positives and false negatives into easy-to-digest statistics is harder, because the anti-spam community has not come to any agreement on which numbers to use across products.
A spam filter is a diagnostic test. For some set of thresholds, it will say "this is spam" or "this is not spam." In our testing, we didn't expose those thresholds. Instead, we asked the vendors to pick thresholds such that the false-positive rate would be kept to less than 1%. Interestingly enough, none of the vendors asked what we meant by false-positive rate. Based on your tolerance for false negatives (spam in your mailbox) or false positives (mail mismarked as spam, then lost or delayed), you might want to set these thresholds differently.
Four main statistics are used to describe diagnostic tests. Positive predictive value (PPV) and negative predictive value (NPV) go together. They measure how likely the test is to be correct. PPV measures the probability that a message actually is spam, given that the test says that it is. PPV is computed by dividing the number of true positives by the sum of true positives and false positives. However, PPV doesn't say how much spam will be filtered out: The number of missed spam doesn't figure into that statistic at all.
Sensitivity and specificity are the other two statistics, sometimes called the true positive rate and true negative rate. They measure how likely a test is to catch whatever is being tested. Sensitivity, for example, measures the probability that a message will test as spam, given that it actually is spam. Sensitivity is computed by dividing the number of true positives by the sum of true positives and false negatives. Most research on diagnostic tests uses PPV and NPV or sensitivity and specificity to describe how well a test works because these are well-defined statistics.
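The four statistics can be sketched in a few lines of Python. This is only an illustration: the `FilterCounts` class and the message counts are made up for the example and are not from our testing. The PPV and sensitivity formulas follow the definitions above; the NPV and specificity formulas are the standard mirror-image definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCounts:
    """Hypothetical tally of a spam filter's verdicts against the known truth."""
    true_pos: int   # spam correctly marked as spam
    false_pos: int  # legitimate mail wrongly marked as spam
    true_neg: int   # legitimate mail correctly passed
    false_neg: int  # spam wrongly passed to the mailbox


def ppv(c: FilterCounts) -> float:
    """Probability a message is spam, given the filter says it is."""
    return c.true_pos / (c.true_pos + c.false_pos)


def npv(c: FilterCounts) -> float:
    """Probability a message is not spam, given the filter says it is not."""
    return c.true_neg / (c.true_neg + c.false_neg)


def sensitivity(c: FilterCounts) -> float:
    """Probability a message tests as spam, given it actually is spam."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: FilterCounts) -> float:
    """Probability a message tests as not spam, given it actually is not spam."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Made-up counts, for illustration only.
example = FilterCounts(true_pos=900, false_pos=10, true_neg=990, false_neg=100)

for stat in (ppv, npv, sensitivity, specificity):
    print(f"{stat.__name__}: {stat(example):.1%}")
```

Note that the PPV calculation never looks at `false_neg`, which is why PPV alone cannot say how much spam gets through.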
The term false-positive rate is, unfortunately, not consistently defined or agreed on. For some people, the false-positive rate is the proportion of messages testing positive that are actually not spam. That is, it's the complement of the PPV. For others, the false-positive rate is the proportion of the total sample (i.e., all mail messages) that is not spam but tests positive as spam. That is, it's the complement of the relative specificity. Rather than pick an ambiguous definition, we focused on things that made sense in the world of spam and didn't overlap each other in definition.
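To see why the ambiguity matters, the sketch below computes a "false-positive rate" three ways from the same hypothetical counts. The function names and the numbers are assumptions for illustration, not data from our testing. The first two functions follow the two readings described above; the third, false positives divided by all legitimate mail, is the textbook complement of specificity and is included for comparison.

```python
def fp_share_of_flagged(true_pos: int, false_pos: int) -> float:
    """Share of messages marked as spam that were legitimate (1 - PPV)."""
    return false_pos / (true_pos + false_pos)


def fp_share_of_all_mail(false_pos: int, total_messages: int) -> float:
    """Share of the whole sample that was legitimate but marked as spam."""
    return false_pos / total_messages


def fp_share_of_legitimate(false_pos: int, true_neg: int) -> float:
    """Share of legitimate mail marked as spam (1 - specificity)."""
    return false_pos / (false_pos + true_neg)


# Made-up counts, for illustration only.
tp, fp, tn, fn = 900, 10, 990, 100
total = tp + fp + tn + fn

rates = {
    "share of flagged mail": fp_share_of_flagged(tp, fp),
    "share of all mail": fp_share_of_all_mail(fp, total),
    "share of legitimate mail": fp_share_of_legitimate(fp, tn),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

With these counts, the same filter lands above, below, or right at a 1% limit depending on which definition is used, so a "less than 1%" target means little until the definition is pinned down.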
In thinking about anti-spam software, network managers will be concerned with two main questions. The first is "How much spam will this filter out?" The sensitivity statistic best answers that question. It tells us what percentage of the time the filter will identify spam. A perfect score would be 100%. In our sample, there were 7,840 spam messages. MailFrontier's Anti-Spam Gateway (ASG) caught 7,005 of those and missed the remaining 835. Setting aside the false positives (because that's a different question), ASG therefore gives us an 89.3% reduction in spam: About 9 out of 10 spam messages are blocked.
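The arithmetic behind the ASG figure can be checked directly. The variable names below are ours; the two message counts are the ones reported above.

```python
# Counts reported for MailFrontier's ASG in our sample.
spam_in_sample = 7840  # spam messages in the test sample
spam_caught = 7005     # spam messages the filter marked as spam (true positives)

# Spam that reached the mailbox (false negatives).
spam_missed = spam_in_sample - spam_caught

# Sensitivity = true positives / (true positives + false negatives).
asg_sensitivity = spam_caught / (spam_caught + spam_missed)

print(f"missed: {spam_missed}, sensitivity: {asg_sensitivity:.1%}")
```

The result comes out at roughly 89%, which is the "about 9 out of 10" figure.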