Artificial intelligence scopes out spam

In the cat-and-mouse game of the antispam industry, staying one step ahead of spammers is difficult because they constantly exploit the weaknesses of e-mail keyword filtering. But the newest artificial-intelligence filtering technology may adapt faster than the spammers can alter their messages.

Artificial intelligence techniques resemble the way our brains learn. Once we learn a skill, we use it to reason about new situations. Artificial intelligence filters detect spam the same way: they learn what spam looks like, then apply that knowledge to new messages.

Natural-language processors serve as powerful artificial intelligence tools in the fight against spam. These processors, which are actually collections of complex algorithms, scan e-mail messages to determine what the messages are about. The algorithms are packaged into mail-filtering software, which generally sits outside a firewall or on an application service provider's network.

Artificial intelligence mail-filtering software accepts all in-bound e-mail traffic, routing legitimate traffic to a corporate SMTP server and flagging other messages as spam. Suspect e-mail is sent to a quarantine area where an administrator can view the contents to determine whether to discard it or pass it along.

Humans can quickly skim a message to judge if it is spam. Referencing keywords by their location in a sentence lets us understand the difference between "chicken breasts" as food and "bare breasts" as pornography. Similarly, natural-language algorithms break down messages into sentences and analyze their meaning.

This takes considerable processing effort. After breaking a message down into paragraphs, sentences and words, natural-language processing technology pieces the meaning back together in the reverse order, analyzing the words first, then the sentences, then the paragraphs.

Consider this e-mail example: "These delicious chicken breasts look good enough to eat - let's cook out tonight. If you can bring Bill, call me at work, 800-262-2222 x231. Oh, and check out the pictures from our last cookout at www.ophoto.com/2623/party_pictures." A standard keyword analysis would flag four items: "breasts," "look good," the toll-free number and the Web site URL. But artificial intelligence analysis would determine that the message was an invitation to dinner.

In the example, keyword-filtering technology picks out pieces of the sentence without really understanding its meaning. Its selective hearing incorrectly determines that the sentence is pornographic.
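The false positive is easy to reproduce. The following Python sketch is only an illustration of context-free keyword matching, not any vendor's actual filter: the rule names, the regular expressions and the three-hit threshold are all assumptions chosen to mirror the terms listed above.

```python
import re

# Hypothetical keyword rules, modeled on the terms the article says a standard
# filter would flag. The patterns and the threshold are illustrative only.
SUSPECT_PATTERNS = {
    "breasts": re.compile(r"\bbreasts\b", re.IGNORECASE),
    "look good": re.compile(r"\blook good\b", re.IGNORECASE),
    "toll-free number": re.compile(r"\b8(?:00|88|77|66)-\d{3}-\d{4}\b"),
    "Web site URL": re.compile(r"\bwww\.\S+", re.IGNORECASE),
}


def keyword_hits(message):
    """Return the name of every rule that matches, with no regard for context."""
    return [name for name, pattern in SUSPECT_PATTERNS.items()
            if pattern.search(message)]


def is_spam_by_keywords(message, threshold=3):
    """Flag the message once enough independent rules have fired."""
    return len(keyword_hits(message)) >= threshold


COOKOUT = (
    "These delicious chicken breasts look good enough to eat - let's cook "
    "out tonight. If you can bring Bill, call me at work, 800-262-2222 x231. "
    "Oh, and check out the pictures from our last cookout at "
    "www.ophoto.com/2623/party_pictures."
)

if __name__ == "__main__":
    print(keyword_hits(COOKOUT))
    print(is_spam_by_keywords(COOKOUT))
```

All four rules fire on the cookout invitation, so the sketch marks it as spam. Nothing in the code looks at the words surrounding a match, which is exactly the weakness described above.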

Another challenge is picking out legitimate business correspondence. For example, an e-mail from a mortgage broker to a client might say, "Sam, I did some digging, and I found some unbelievably low mortgage rates with no money down this morning. If you want to get one, you'll need to call me today so I can lock the rate in for you. One of the rates expires at midnight tonight." A standard keyword analysis would classify the message as spam based on the terms "low mortgage rates," "no money down" and "expires at midnight." However, artificial intelligence analysis would reveal that this was legitimate correspondence between a broker and a client about a mortgage.

Even the sharpest artificial intelligence techniques might be unsure about a message when judging its text alone, so the final determination goes beyond the text. For example, transmission-pattern techniques look at when messages were sent, who sent them and where they originated. Say the mortgage message above came from the same address as 12 other messages sent in the past week. These all came from the same server, during normal business hours, and none looked like spam. That history makes a reasonable case for treating the message as legitimate. Other filtering techniques, however, might toss the same e-mail into the trash.
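A transmission-pattern check of this kind can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of how any product works: the `Delivery` record, the function names, the 9-to-5 weekday definition of business hours and the default thresholds (12 messages in seven days, taken from the example above) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Delivery:
    """One previously received message (hypothetical record layout)."""
    sender: str
    server: str
    sent_at: datetime
    looked_like_spam: bool


def in_business_hours(ts, start_hour=9, end_hour=17):
    """Assume business hours are Monday to Friday, 9 a.m. to 5 p.m."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def sender_has_clean_history(history, sender, server, now,
                             window=timedelta(days=7), min_messages=12):
    """True if the sender's recent mail all came from the same server,
    during business hours, and none of it looked like spam."""
    recent = [d for d in history
              if d.sender == sender and now - window <= d.sent_at <= now]
    if len(recent) < min_messages:
        return False  # too little history to vouch for the sender
    return all(d.server == server
               and in_business_hours(d.sent_at)
               and not d.looked_like_spam
               for d in recent)
```

A filter could use a positive result to release a borderline message, such as the mortgage note, instead of quarantining it. Requiring a minimum number of recent messages is a deliberate choice in this sketch: a sender with no track record gets no benefit of the doubt.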

While there never will be a system that stops 100% of spam, artificial intelligence techniques come closer to that goal than ever before.

Strickler, CEO of MailWise, can be reached at dstrickler@mailwise.com.
