Differences

This shows you the differences between two versions of the page.

--- faq:howitworks [2009/04/06 09:50] – created xuesheng
+++ faq:howitworks [2009/04/07 13:51] (current) – external edit 127.0.0.1
@@ Line 3: / Line 3: @@
 POPFile uses a technique called Naive Bayes to calculate the probability that the words in an email mean that that email falls into a specific bucket.
-A bucket is represented by a collection of words and their frequency. The set of buckets is called the corpus and determines that different buckets that an email can be placed in, the probability of an individual word existing in an email for a specific bucket and the probability of an email being in a bucket to start with.
+A bucket is represented by a collection of words and their frequency. The set of buckets is called the corpus and determines the different buckets that an email can be placed in, the probability of an individual word existing in an email for a specific bucket and the probability of an email being in a bucket to start with.
 Suppose there are n buckets B1 to Bn and there are m words in total W1 to Wm. We want to know for a specific email E which bucket it is most likely to belong to.
@@ Line 9: / Line 9: @@
 We want to calculate the P(Bi|E) for each bucket Bi.  That calculation can be performed using Bayes rule as follows
-			P(E|Bi) x P(Bi)
+			   P(E|Bi) x P(Bi)
 		P(Bi|E) =  ---------------
-			  P(E)
+			         P(E)
 Here P(Bi|E) is the probability that email E is in bucket Bi; that is the probability that given a set of words E they appear in bucket Bi.
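
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not POPFile's actual code: the function name `classify`, the dictionary layout of the corpus, the add-one smoothing for unseen words, and the estimate of P(Bi) from each bucket's share of the total word count are all hypothetical choices made for the example. P(E) is obtained by summing P(E|Bi) x P(Bi) over all buckets, so the resulting values of P(Bi|E) sum to 1.

```python
import math
from collections import Counter


def classify(corpus, email_words):
    """Return (best_bucket, posteriors) for the words of one email.

    corpus maps a bucket name to a mapping of word -> frequency.
    Assumes a non-empty corpus in which every bucket has at least one word.
    """
    vocabulary = set().union(*corpus.values())
    total_words = sum(sum(counts.values()) for counts in corpus.values())

    log_scores = {}
    for bucket, counts in corpus.items():
        bucket_total = sum(counts.values())
        # Assumption: P(Bi) is estimated from the bucket's share of all words.
        log_p = math.log(bucket_total / total_words)
        for word in email_words:
            # Naive Bayes: treat words as independent given the bucket.
            # Assumption: add-one smoothing so unseen words do not give zero.
            likelihood = (counts.get(word, 0) + 1) / (bucket_total + len(vocabulary))
            log_p += math.log(likelihood)
        log_scores[bucket] = log_p  # log of P(E|Bi) x P(Bi)

    # P(E) is the sum of P(E|Bi) x P(Bi) over all buckets; subtracting the
    # largest log score first keeps the exponentials from underflowing.
    largest = max(log_scores.values())
    evidence = sum(math.exp(score - largest) for score in log_scores.values())
    posteriors = {
        bucket: math.exp(score - largest) / evidence
        for bucket, score in log_scores.items()
    }
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    # Hypothetical two-bucket corpus, made up for the example.
    corpus = {
        "spam": Counter({"viagra": 3, "free": 2}),
        "inbox": Counter({"meeting": 3, "free": 1}),
    }
    print(classify(corpus, ["free", "viagra"]))
```

Working in logarithms is a design choice for the sketch: multiplying many small word probabilities directly can underflow to zero for long emails, while adding their logarithms does not.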

POPFile - Automatic Email Classification

How POPFile does email classification
