
Vendors map the DNA of spam

DNA pattern detection applied to spam

By Michael Osterman, Network World
September 27, 2004 12:09 PM ET

A key part of genetic research is analyzing DNA and other substances for patterns. Understanding when and where these patterns occur helps researchers identify genetic sequences with important implications. Because spam messages also contain recurring patterns, the same basic techniques can be applied to filter spam from e-mail.

Several companies use techniques based on pattern recurrence to detect spam. IBM, for example, has developed a technique using the Chung Kwei algorithm; in one test of the algorithm, 96.6% of spam was correctly identified. Cloudmark uses what it calls “e-mail genetic mapping,” a somewhat different technique that relies on end-user and administrator feedback to “learn” what constitutes spam for individuals and for organizations as a whole. Cloudmark claims that its technique has the potential to capture 100% of spam while generating no false positives. Commtouch takes another approach with its Recurrent Pattern Detection (RPD) technology, which looks for patterns in spam outbreaks in real time. Independent tests of Commtouch’s RPD technology found that it captures about 97% of spam while generating almost no false positives.
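
The general idea of pattern recurrence can be illustrated with a short Python sketch. This is not IBM's Chung Kwei algorithm, Cloudmark's genetic mapping, or Commtouch's RPD, none of whose internals are described here. It is a simplified stand-in that assumes a "pattern" is a run of k consecutive words, keeps the runs that recur across known spam but never appear in legitimate mail, and scores a new message by how many of its runs match. The function names, the run length `k`, and the `min_support` threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def shingles(text, k=3):
    """Return the set of k-word sequences found in a message."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def learn_patterns(spam, ham, k=3, min_support=2):
    """Keep sequences seen in at least min_support spam messages
    and in no legitimate (ham) message."""
    counts = Counter()
    for msg in spam:
        # shingles() returns a set, so each message counts once per sequence
        counts.update(shingles(msg, k))
    ham_seen = set()
    for msg in ham:
        ham_seen |= shingles(msg, k)
    return {s for s, n in counts.items()
            if n >= min_support and s not in ham_seen}


def spam_score(message, patterns, k=3):
    """Fraction of the message's sequences that match a learned pattern."""
    found = shingles(message, k)
    if not found:
        return 0.0
    return len(found & patterns) / len(found)


# Illustrative, made-up training messages (assumption, not real data).
spam = [
    "buy cheap meds online today without a prescription",
    "limited offer buy cheap meds online today and save",
]
ham = ["are we still meeting today to review the budget"]

patterns = learn_patterns(spam, ham)
print(sorted(patterns))
print(spam_score("buy cheap meds online today", patterns))
print(spam_score("see you at lunch today", patterns))
```

A production filter would need far more than this, including normalization of obfuscated text and much larger training sets. The sketch only shows why recurring content gives a filter something to hold on to.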

Looking for patterns in e-mail as a means of detecting spam is important for two reasons. First, it adds another method to the more traditional ways of detecting spam, potentially improving the overall effectiveness of a spam-blocking tool that combines multiple detection techniques. Second, and more important, pattern detection may make it harder for spammers to circumvent spam-blocking systems, because patterns are inherent in spam and difficult to eliminate. In short, it’s difficult for spammers to create their messages without recognizable patterns emerging.

