Spam and statistics - Network World


Spam and statistics

By Joel Snyder, Network World, 09/15/2003

Say "false positive," and you immediately dive into a tough world: the statistics of diagnostic tests. The terms false positive and false negative (and their cousins, true positive and true negative) are fairly easy to define. But turning counts of false positives and false negatives into easy-to-digest statistics is a different matter, because the anti-spam community has not agreed on which numbers to use across products.

A spam filter is a diagnostic test. For some set of thresholds, it will say "this is spam" or "this is not spam." In our testing, we didn't expose those thresholds. Instead, we asked the vendors to pick thresholds that would keep the false-positive rate below 1%. Interestingly enough, none of the vendors asked what we meant by false-positive rate. Depending on your tolerance for false negatives (spam in your mailbox) or false positives (mail mismarked as spam, lost or delayed), you might want to set these thresholds differently.

Four main statistics are used to describe diagnostic tests. Positive predictive value (PPV) and negative predictive value (NPV) go together. They measure how likely the test's verdict is to be correct. PPV measures the probability that a message actually is spam, given that the test says that it is. PPV is computed by dividing the number of true positives by the sum of true positives and false positives. However, PPV doesn't say how much spam will be filtered out: The number of missed spam messages doesn't figure into that statistic at all.
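As a rough illustration of the PPV arithmetic, here is a minimal Python sketch. The function names and the message counts are invented for this example and are not taken from our test results. The NPV formula shown is the standard textbook definition (true negatives divided by the sum of true negatives and false negatives).

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Probability that a message is spam, given the filter flagged it."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no messages were flagged as spam")
    return true_pos / flagged


def npv(true_neg: int, false_neg: int) -> float:
    """Probability that a message is legitimate, given the filter passed it."""
    passed = true_neg + false_neg
    if passed == 0:
        raise ValueError("no messages were passed as legitimate")
    return true_neg / passed


# Hypothetical counts: 1,000 spam and 1,000 legitimate messages.
tp, fp = 900, 10    # spam caught, legitimate mail mismarked as spam
tn, fn = 990, 100   # legitimate mail delivered, spam missed

print(f"PPV = {ppv(tp, fp):.3f}")  # PPV = 0.989
print(f"NPV = {npv(tn, fn):.3f}")  # NPV = 0.908
```

Note that ppv() never sees the missed-spam count: with these made-up numbers the filter could miss 100 spam messages or 10,000 and report the same PPV.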

Sensitivity and specificity are the other two statistics, sometimes called the true positive rate and true negative rate. They measure how likely a test is to catch whatever it is testing for. Sensitivity, for example, measures the probability that a message will test as spam, given that it actually is spam. Sensitivity is computed by dividing the number of true positives by the sum of true positives and false negatives. Most research on diagnostic tests uses PPV and NPV or sensitivity and specificity to describe how well a test works because these are well-defined statistics.
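The sensitivity calculation can be sketched the same way. Again, the function names and counts below are hypothetical, chosen only to make the arithmetic easy to follow. The specificity formula is the standard definition (true negatives divided by the sum of true negatives and false positives).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Probability that a message tests as spam, given it really is spam."""
    actual_spam = true_pos + false_neg
    if actual_spam == 0:
        raise ValueError("the sample contains no spam")
    return true_pos / actual_spam


def specificity(true_neg: int, false_pos: int) -> float:
    """Probability that a message tests as legitimate, given it really is."""
    actual_legit = true_neg + false_pos
    if actual_legit == 0:
        raise ValueError("the sample contains no legitimate mail")
    return true_neg / actual_legit


# Hypothetical counts: 1,000 spam and 1,000 legitimate messages.
tp, fn = 900, 100   # spam caught, spam missed
tn, fp = 990, 10    # legitimate mail delivered, legitimate mail mismarked

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # sensitivity = 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # specificity = 0.99
```

Unlike PPV, sensitivity depends directly on the missed spam: every extra false negative lowers it.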
