Differences

This shows you the differences between two versions of the page.

faq:bayesandlogs [2009/04/06 09:52] – created xuesheng
faq:bayesandlogs [2009/04/07 13:54] (current) – external edit 127.0.0.1
Line 3:
 POPFile makes use of Bayes' Theorem in the form:

             P(E|Bi) x P(Bi)
  P(Bi|E) =  ---------------
                  P(E)
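As a small illustration of the formula above, the following Python sketch computes P(Bi|E) for two buckets. The bucket names and all probability values are made up for the example and are not taken from POPFile; P(E) is obtained by summing P(E|Bi) x P(Bi) over the buckets.

```python
# Hypothetical priors P(Bi) and likelihoods P(E|Bi) for two buckets.
# These numbers are illustrative assumptions, not POPFile data.
prior = {"spam": 0.4, "inbox": 0.6}
likelihood = {"spam": 0.02, "inbox": 0.005}

# P(E) by the law of total probability: sum of P(E|Bi) x P(Bi) over all buckets.
evidence = sum(likelihood[b] * prior[b] for b in prior)

# Bayes' Theorem: P(Bi|E) = P(E|Bi) x P(Bi) / P(E)
posterior = {b: likelihood[b] * prior[b] / evidence for b in prior}

best = max(posterior, key=posterior.get)
print(posterior, best)
```

Because P(E) divides every bucket's score by the same amount, it changes the values but not which bucket comes out largest.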
Line 27:
 POPFile ignores the term P(E) since it is the same for every bucket, and calculates:

  (1) P(Bi|E) = P(E1|Bi) x P(E2|Bi) x ... x P(Eo|Bi) x P(Bi)

 It then picks the bucket with the largest value as the one the email belongs in.  Equation (1) works well in theory but badly in practice.  Each factor is a tiny fraction, so multiplying many of them together gives smaller and smaller numbers until, in many languages, the result underflows the floating point range and becomes zero.
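The underflow is easy to reproduce. The sketch below is in Python rather than POPFile's own code, and the per-word probability and word count are hypothetical values chosen only to make the effect visible with double-precision floats.

```python
import math

# Hypothetical values: 100 words, each with probability 1e-4 in some bucket.
p_word = 1e-4
n_words = 100

# Direct product as in equation (1): the true value is 1e-400, which is
# smaller than the smallest positive double, so the product collapses to 0.0.
product = 1.0
for _ in range(n_words):
    product *= p_word

# The same quantity as a sum of logs stays comfortably in range.
log_sum = n_words * math.log(p_word)

print(product)   # 0.0
print(log_sum)   # roughly -921.03
```

Once every bucket's product has collapsed to zero, there is no largest value left to pick, which is why the direct form fails in practice.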
Line 37:
 POPFile calculates the following:

  (2) log P(Bi|E) = log P(E1|Bi) + log P(E2|Bi) + ... + log P(Eo|Bi) + log P(Bi)

 and then chooses the bucket with the largest value of log P(Bi|E).  Because the logarithm is an increasing function, this is the same bucket that equation (1) would select.  The log values are precalculated for each word and bucket, and multiplication is replaced by addition, so the calculation is both more accurate and faster.
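Equation (2) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not POPFile's implementation: the function name `classify`, the bucket names, the probabilities, and the fallback value used for words missing from a bucket are all hypothetical.

```python
import math

def classify(words, log_likelihood, log_prior, unseen=math.log(1e-6)):
    """Return (best_bucket, scores) using the sum of logs from equation (2).

    log_likelihood[bucket][word] holds precalculated log P(Ej|Bi);
    log_prior[bucket] holds log P(Bi).  `unseen` is an assumed fallback
    for words a bucket has never seen.
    """
    scores = {}
    for bucket, prior in log_prior.items():
        table = log_likelihood[bucket]
        scores[bucket] = prior + sum(table.get(w, unseen) for w in words)
    return max(scores, key=scores.get), scores

# Hypothetical precalculated tables for two buckets.
log_prior = {"spam": math.log(0.5), "inbox": math.log(0.5)}
log_likelihood = {
    "spam": {"offer": math.log(0.05), "meeting": math.log(0.001)},
    "inbox": {"offer": math.log(0.002), "meeting": math.log(0.03)},
}

best, scores = classify(["offer", "offer", "meeting"], log_likelihood, log_prior)
print(best, scores)
```

The logs are taken once when the tables are built, so classifying a message needs only table lookups and additions.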
 
faq/bayesandlogs.1239011535.txt.gz · Last modified: 2009/04/06 11:52 (external edit)
The content of this wiki is protected by the GNU Free Documentation License