Network World - Bill Yerazunis' day job is far from boring. As senior research scientist at Mitsubishi Electric Research Laboratories, the electronics giant's North American R&D arm, Yerazunis has been involved in developing items as diverse as sensors that detect water pollution, a touch-sensitive table for small group collaboration and a self-refilling beer mug. But for Yerazunis, the real fun has begun after hours; he has spent the last seven years developing and tweaking CRM114 Discriminator, an open source spam filter that uses statistical probability to determine whether an e-mail is spam. CRM114 is used by individuals, corporations and some ISPs. With that success comes additional corporate responsibility. On April 1, Yerazunis officially will add spam catcher to the many roles he plays at the Cambridge, Mass., lab. But first he'll chair the fifth annual Massachusetts Institute of Technology Spam Conference, scheduled for March 30. Yerazunis recently spoke with Network World's Senior Editor Cara Garretson about his personal spam crusade.
How did you get involved in fighting spam?
I was frustrated by it, so years ago I said to my manager, 'We ought to do something about spam,' and he said, 'Don't worry about it, Bill, spam will never be a problem.' I asked him if I could work on it on my own time and he said, 'I can't stop you.' He's still around. It's like the flight-instructor joke: That's one mistake he'll never make again! I was going to work on a reputation-based system that said, 'If I've gotten mail from this person before, then it's probably good; if not, then it's probably bad.' Then I said, that won't work well. So I went to a heuristics model. But those act reactively. The results you get with [the open source Apache] SpamAssassin are 90% or 95% accurate, but I wanted more - so I started doing statistical filtering.
Did spam get worse in 2006 and, if so, why?
Yes. The amount of spam has increased over time, but most filters have held up quite well. But in 2006, we started getting [at least twice as much] spam. [Through comparative filter tests,] we know the spammers aren't really evolving their techniques; they're just pumping in more spam and there are more people with bad filters. And spam has become the single driving force in the penny-stock market now. [Stock pump-and-dump spam e-mails try to convince recipients to buy shares of a certain spammer-owned penny stock. When enough recipients buy the stock, the spammer sells the stock at a profit.] There are Web pages that track 'rotisserie' stocks, where they pretend to invest $1,000 in each [spam-touted stock] that comes in. The Web page operators have lost nearly a quarter of a billion dollars [on paper] at this point.
What's the outlook for 2007?
I would love to say I've got the magic elixir, but I don't. The good news is for subscribers of very large ISPs, because those ISPs get huge amounts of text to put through their filters. Other people [whether using enterprise or home e-mail] aren't going to see spam go down unless they better train the filters. Those people running without big ISPs or [good filters] are going to give up on e-mail. It's already useless without a filter. It used to be the delivery people provided assurance: Thou shall not lose an e-mail. At ARPANet [the Internet's predecessor], they made sure they could function in the face of a nuclear holocaust - that was the mindset. Now that's gone. Now you've got plausible deniability on e-mail: 'Oh, I never got it. My filter ate it.' It's greased the skids of human interaction because you can send something to someone and they can disregard it.