Network World

Math to fight spam

Gearhead By Mark Gibbs, Network World, 09/22/2003

The Rev. Thomas Bayes (1702-1761) is best known for his paper "An Essay Towards Solving a Problem in the Doctrine of Chances," published posthumously in 1763 in the Philosophical Transactions of the Royal Society of London.

Lest you think that you've walked into History 101, let us assure you that we are merely keeping our word. Last week we promised to elucidate Bayesian filtering, a technique used for getting rid of spam. Bayes was the discoverer of Bayes' Theorem, upon which Bayesian filtering is based.

Bayes' Theorem is a way of calculating the probability that an event will occur based on the number of times that event has occurred in previous trials. The theorem states that for events X and Y, the probability of X given that Y has happened (denoted by p{ X | Y }) equals the probability of Y given that X has happened ( p{ Y | X } ) times the probability of X happening ( p{ X } ), divided by the probability of Y happening ( p{ Y } ). To put that another way:

p{ X | Y } = ( p{ X } * p{ Y | X } ) / p{ Y }

Or, more generally,

p{ Xi | Y } = ( p{ Xi } * p{ Y | Xi } ) / ( ( p{ X1 } * p{ Y | X1 } ) + ... + ( p{ Xi } * p{ Y | Xi } ) + ... + ( p{ Xn } * p{ Y | Xn } ) )
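For readers who prefer code to notation, here is a minimal Python sketch of the general form. The function name bayes_posteriors and the two-outcome numbers are made up for illustration, and the sketch assumes that X1 through Xn are mutually exclusive and together cover every possibility, so that the denominator really is p{ Y }.

```python
from typing import List, Sequence


def bayes_posteriors(priors: Sequence[float],
                     likelihoods: Sequence[float]) -> List[float]:
    """Return p{ Xi | Y } for every i, given p{ Xi } and p{ Y | Xi }.

    Hypothetical helper: assumes the Xi are mutually exclusive and
    exhaustive, so the sum of the numerators equals p{ Y }.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")

    # Numerators: p{ Xi } * p{ Y | Xi } for each i.
    joint = [prior * likelihood
             for prior, likelihood in zip(priors, likelihoods)]

    # Denominator: the sum of all the numerators, which is p{ Y }.
    p_y = sum(joint)
    if p_y == 0:
        raise ValueError("p{ Y } is zero, so the result is undefined")

    return [value / p_y for value in joint]


# Made-up example: two equally likely causes, the second three times
# as likely as the first to produce Y.
result = bayes_posteriors([0.5, 0.5], [0.2, 0.6])
print(result)
```

The results always sum to 1, because each one is a single term of the denominator divided by the whole denominator.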

Clear? No? OK, let's apply this to the IT world. Let's say we maintain a software package with three configuration options - option A is used by 40% of our users, option B by 30% and option C by 30% (users can only use one option at a time).

If we assume that each option raises the same percentage of support requests (say, 1% of the number of users of that option), then we would obviously want to focus our effort on improving software quality according to which option has the greatest number of users. That would make option A our focus, after which we could start to polish either B or C.
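A quick way to check that reasoning is to work out each option's share of all support requests under the equal-rate assumption. The Python sketch below does so; the variable names are ours, and the 1% rate is the guess from the paragraph above rather than a measured figure.

```python
# Share of users on each option, as given in the article.
user_share = {"A": 0.40, "B": 0.30, "C": 0.30}

# Assumed: every option raises requests at the same rate (the 1% guess).
request_rate = 0.01

# Fraction of all users who use a given option AND raise a request.
joint = {option: share * request_rate
         for option, share in user_share.items()}
total = sum(joint.values())

# Share of all support requests that comes from each option.
request_share = {option: value / total for option, value in joint.items()}

for option, share in request_share.items():
    print(f"Option {option}: {share:.1%} of support requests")
```

With identical rates, the rate cancels out of the division, so each option's share of requests is simply its share of users.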

But the percentages of support requests are a guess at this point. As we accumulate experience supporting this product we find out that 0.5% of A users have problems, 0.75% of B users and 0.95% of C users. Now where should we apply our efforts?

Let's find what the probability of a problem being caused by Option A (denoted by p{ A | problem }) actually is.

According to Bayes:

p{ A | problem } = ( p{ A } * p{ problem | A } ) / ( ( p{ A } * p{ problem | A } ) + ( p{ B } * p{ problem | B } ) + ( p{ C } * p{ problem | C } ) )

Here, p{ A } equals 40%; p{ B } equals 30%; and p{ C } equals 30%. From our support experience we know that p{ problem | A } equals 0.5%, while p{ problem | B } equals 0.75% and p{ problem | C } equals 0.95%, so we get:
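As a sketch, the substitution can be carried out in a few lines of Python. The variable names are ours; the figures are the ones quoted above, written as fractions rather than percentages, and the same formula is applied to options B and C for comparison.

```python
# p{ A }, p{ B }, p{ C }: share of users on each option.
priors = {"A": 0.40, "B": 0.30, "C": 0.30}

# p{ problem | option }: observed problem rate for each option.
problem_rates = {"A": 0.005, "B": 0.0075, "C": 0.0095}

# Denominator: overall probability that a user has a problem.
p_problem = sum(priors[option] * problem_rates[option] for option in priors)

# Bayes: p{ option | problem } for each option.
posteriors = {
    option: priors[option] * problem_rates[option] / p_problem
    for option in priors
}

print(f"p{{ problem }} = {p_problem:.4f}")
for option, value in posteriors.items():
    print(f"p{{ {option} | problem }} = {value:.1%}")
```

Worked by hand, the denominator is 0.002 + 0.00225 + 0.00285 = 0.0071, so p{ A | problem } is 0.002 / 0.0071, or roughly 28%. The same arithmetic gives roughly 32% for B and 40% for C, so option C accounts for the largest share of problems even though option A has the most users.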
