Network World

WORLDBEAT - Spam project pits humans vs. machines

By Jeremy Kirk, IDG News Service, 06/26/2006

John Graham-Cumming is about 666,666 clicks away from a new weapon that could help kill spam -- that's unsolicited e-mail, not the salty canned meat -- for good.

Graham-Cumming, an Englishman who lives in Toulouse, France, is a seasoned spam fighter who wrote Popfile, an open source e-mail classification tool. He also wrote Polymail, an antispam library licensed by other companies for use in spam filters.

Spam still makes up about 80% of all e-mail, although much-improved filtering has made it less of an annoyance. But spammers persevere, finding technical ways of slipping e-mail through, and the race to develop sharper filters continues.

"I don't think spam is going to go away," Graham-Cumming said. "Clearly spammers are still making money or they wouldn't be sending lots of spam."

Graham-Cumming's new project asks people to donate their time to classify a "corpus" of 100,000 e-mail messages used to test the accuracy of spam filters. He's set up a site, www.spamorham.org, where people can sort randomly served messages as either spam or ham, the term for good e-mail.

The e-mail messages comprise the Text Retrieval Conference 2005 Public Spam Corpus, affiliated with the U.S. National Institute of Standards and Technology.

An unlikely major donor of the e-mail was Enron, the U.S. energy company whose errant accounting practices led to bankruptcy in 2001. The e-mail of dozens of Enron employees was subpoenaed and eventually released to the public.

The Enron e-mail messages are a hot commodity for spam research -- a rich trove of private e-mail and spam that's hard to come by, Graham-Cumming said.

The idea is for each e-mail to be classified 10 times for a majority consensus. So far, the project is about one-third done.
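The arithmetic behind that click count, and the consensus rule itself, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, and treating a 5-5 split as "no consensus" is a guess, since the project's tie handling isn't described here.

```python
from collections import Counter

CORPUS_SIZE = 100_000     # messages in the corpus, per the article
VOTES_PER_MESSAGE = 10    # classifications wanted per message, per the article


def remaining_clicks(done_numerator, done_denominator):
    """Clicks still needed when done_numerator/done_denominator of the work is finished."""
    total = CORPUS_SIZE * VOTES_PER_MESSAGE
    # Integer arithmetic avoids floating-point rounding on fractions like 1/3.
    return total * (done_denominator - done_numerator) // done_denominator


def consensus(votes):
    """Return 'spam' or 'ham' by simple majority, or None on a tie.

    Returning None for a tie is an assumption made for this sketch.
    """
    counts = Counter(votes)
    spam, ham = counts["spam"], counts["ham"]
    if spam == ham:
        return None
    return "spam" if spam > ham else "ham"


if __name__ == "__main__":
    print(remaining_clicks(1, 3))                      # one-third done
    print(consensus(["spam"] * 7 + ["ham"] * 3))       # clear majority
    print(consensus(["spam"] * 5 + ["ham"] * 5))       # even split
```

With 100,000 messages and 10 votes apiece, the full job is a million clicks, so one-third done leaves roughly the 666,666 mentioned at the top of the story.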

I stepped up to the challenge. I started classifying e-mail, hoping to run across Enron employee gossip about what happened at the last company party, such as stories of accountants wearing lamp shades on their heads (which appears to have continued well into their working day).

I buzzed through 25 e-mail messages, most of which were obviously spam and devoid of scuttlebutt. Unfortunately, the real messages I came across were strictly numbing work chatter, which made the seedy spam subject lines at least mildly amusing by comparison.

I disagreed with the machines on one message, which was classified as real by the filters. The message was composed of complete sentences that appeared to be from news stories but were strung together as utter non sequiturs. The e-mail also lacked a bull's-eye zinger such as +V1a*gra! 2nite!

The message was obviously junk and didn't make any sense, yet it somehow wriggled through the spam filter's clutches.

Most messages are easy for anyone vaguely familiar with e-mail to classify. But overall, machines and people disagree about one out of 10 times, Graham-Cumming said.

Not surprisingly, phishing e-mail messages, which often look quite legitimate but dupe people into divulging personal details, are hardest for people to distinguish, Graham-Cumming said.
