Differences

--- faq:bayesandlogs [2009/04/06 09:52] – created xuesheng
+++ faq:bayesandlogs [2009/04/07 13:54] (current) – external edit 127.0.0.1
@@ Line 3: / Line 3: @@
 POPFile makes use of Bayes' Theorem in the form:
-			  P(E|Bi) x P(Bi)
+			   P(E|Bi) x P(Bi)
 		P(Bi|E) =  ---------------
 			        P(E)
@@ Line 27: / Line 27: @@
 POPFile ignores the term P(E) since it is the same for each calculation and calculates:
-(1)		P(Bi|E) = P(E1|Bi) x P(E2|Bi) x ... x P(Eo|Bi) x P(Bi)
+  (1)		P(Bi|E) = P(E1|Bi) x P(E2|Bi) x ... x P(Eo|Bi) x P(Bi)
 And picks the largest to determine which bucket the email belongs in.  But equation (1) works well in theory and badly in practice.  That chain of multiplications quickly leads to underflow in many languages, because the numbers involved are tiny fractions and multiplying many of them together yields smaller and smaller results until floating-point arithmetic can no longer represent them.
@@ Line 37: / Line 37: @@
 POPFile calculates the following:
-(2)		log P(Bi|E) = log P(E1|Bi) + log P(E2|Bi) + ... + log P(Eo|Bi) + log P(Bi)
+  (2)		log P(Bi|E) = log P(E1|Bi) + log P(E2|Bi) + ... + log P(Eo|Bi) + log P(Bi)
 and then chooses the largest value of log P(Bi|E) in determining the bucket.  Because the log values for each word and bucket can be precalculated, and because multiplication is replaced by addition, the calculation is faster and more accurate: the sum of logs stays well within floating-point range where the product in equation (1) would underflow.
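
The contrast between equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration, not POPFile's own code: the bucket names, the priors, and the per-word probabilities are made-up values, and the helper names product_score and log_score are hypothetical.

```python
import math

def product_score(word_probs, prior):
    """Equation (1): multiply the per-word probabilities and the prior."""
    score = prior
    for p in word_probs:
        score *= p
    return score

def log_score(word_probs, prior):
    """Equation (2): add the logs instead of multiplying the probabilities."""
    return math.log(prior) + sum(math.log(p) for p in word_probs)

# Hypothetical values of P(Ej|Bi) and P(Bi) for a 400-word email.
buckets = {
    "spam":  {"prior": 0.5, "word_probs": [1e-3] * 400},
    "inbox": {"prior": 0.5, "word_probs": [2e-3] * 400},
}

products = {name: product_score(b["word_probs"], b["prior"])
            for name, b in buckets.items()}
logs = {name: log_score(b["word_probs"], b["prior"])
        for name, b in buckets.items()}

# With equation (1) both products underflow to 0.0, so the buckets tie.
# With equation (2) the scores stay finite and can still be compared.
best = max(logs, key=logs.get)
print(products)
print(logs)
print(best)
```

With these assumed numbers both products collapse to zero, so equation (1) cannot separate the buckets, while the log scores remain ordinary negative numbers and the larger one still identifies the bucket.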

POPFile - Automatic Email Classification