
faq:howitworks, created 2009/04/06 11:50 by xuesheng; current revision 2009/04/07 13:51 by xuesheng ("Tweak the layout of the first formula").
POPFile uses a technique called Naive Bayes to calculate, from the words in an email, the probability that the email belongs in a specific bucket.
A bucket is represented by a collection of words and their frequencies. The set of buckets is called the corpus. It determines the different buckets an email can be placed in, the probability of an individual word appearing in an email from a specific bucket, and the probability of an email being in a bucket to start with.
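As a rough illustration of that representation, the Python sketch below keeps each bucket as a word-frequency table and derives the two probabilities the corpus determines: P(W|B) for a word in a bucket, and the prior P(B). The `Corpus` class, its method names, the whitespace tokenizer, and estimating P(B) from each bucket's share of all corpus words are assumptions for illustration, not POPFile's own code.

```python
from collections import Counter


class Corpus:
    """Hypothetical in-memory corpus: each bucket is a collection of words
    and their frequencies."""

    def __init__(self):
        self.buckets = {}  # bucket name -> Counter of word frequencies

    def train(self, bucket, text):
        # Assumption: naive lower-case whitespace tokenizer, for illustration only.
        self.buckets.setdefault(bucket, Counter()).update(text.lower().split())

    def p_word_given_bucket(self, word, bucket):
        # P(W|B): relative frequency of the word within the bucket (unsmoothed,
        # so a word never seen in the bucket gets probability 0).
        counts = self.buckets[bucket]
        return counts[word] / sum(counts.values())

    def p_bucket(self, bucket):
        # P(B): assumed here to be the bucket's share of all words in the corpus.
        total = sum(sum(c.values()) for c in self.buckets.values())
        return sum(self.buckets[bucket].values()) / total
```

For instance, after `train("spam", "cheap pills cheap")` and `train("work", "meeting agenda")`, P(cheap|spam) comes out as 2/3 and P(spam) as 3/5.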
Suppose there are n buckets, B1 to Bn, and m words in total, W1 to Wm. For a specific email E, we want to know which bucket it most likely belongs to.
We want to calculate P(Bi|E) for each bucket Bi. This can be done using Bayes' rule as follows:
$$P(B_i \mid E) = \frac{P(E \mid B_i) \times P(B_i)}{P(E)}$$
Here P(Bi|E) is the probability that email E belongs in bucket Bi; that is, given the set of words E, the probability that they came from bucket Bi.
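The Bayes' rule calculation can be sketched in Python under the usual naive assumption that words occur independently, so P(E|Bi) is the product of P(Wj|Bi) over the words in E. Because P(E) is the same for every bucket, it can be obtained as the sum of P(E|Bi) x P(Bi) over all buckets. The `classify` function, the word-share prior, and the add-one smoothing are assumptions made for this sketch; POPFile's actual estimates may differ.

```python
import math
from collections import Counter


def classify(corpus, email_words):
    """Return P(Bi|E) for every bucket in a hypothetical corpus.

    corpus: dict mapping bucket name -> Counter of word frequencies
            (every bucket is assumed to contain at least one word).
    Assumptions: P(Bi) is each bucket's share of all words in the corpus,
    and add-one (Laplace) smoothing keeps unseen words from zeroing P(E|Bi).
    """
    vocab = set(email_words)
    for counts in corpus.values():
        vocab.update(counts)
    total_words = sum(sum(c.values()) for c in corpus.values())

    log_scores = {}
    for bucket, counts in corpus.items():
        bucket_total = sum(counts.values())
        score = math.log(bucket_total / total_words)  # log P(Bi)
        for word in email_words:
            # log P(Wj|Bi) with add-one smoothing over the vocabulary
            score += math.log((counts[word] + 1) / (bucket_total + len(vocab)))
        log_scores[bucket] = score  # log of P(E|Bi) x P(Bi)

    # Divide by P(E) = sum over buckets of P(E|Bi) x P(Bi), computed in
    # log space (log-sum-exp) to avoid floating-point underflow.
    top = max(log_scores.values())
    log_p_e = top + math.log(sum(math.exp(s - top) for s in log_scores.values()))
    return {b: math.exp(s - log_p_e) for b, s in log_scores.items()}
```

For example, with `corpus = {"spam": Counter({"cheap": 3, "pills": 2}), "work": Counter({"meeting": 4, "agenda": 1})}`, calling `classify(corpus, ["cheap", "pills"])` gives spam a probability of 12/13 (about 0.92), so the email would be placed in spam. Working with logarithms matters because a product of many small word probabilities quickly underflows a float.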