faq:howitworks: created 2009/04/06 09:50 by xuesheng; current revision 2009/04/07 13:51 (external edit)
POPFile uses a technique called Naive Bayes to calculate the probability that an email belongs in a specific bucket, given the words it contains.
  
A bucket is represented by a collection of words and their frequencies. The set of buckets is called the corpus. It determines the buckets an email can be placed in, the probability of an individual word appearing in an email from a specific bucket, and the probability of an email being in a bucket to start with.
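As an illustration of how a corpus can supply those probabilities, here is a minimal Python sketch. The bucket names, word counts and per-bucket email counts are made up, and estimating the starting probability P(B) from the number of training emails filed in each bucket is an assumption of this sketch, not necessarily how POPFile itself does it.

```python
from collections import Counter

# Hypothetical toy corpus: each bucket is a collection of words and their frequencies.
corpus = {
    "spam": Counter({"free": 4, "offer": 3, "meeting": 1}),
    "work": Counter({"meeting": 5, "report": 3, "free": 2}),
}

# Hypothetical number of training emails filed in each bucket (used for P(B)).
emails_per_bucket = {"spam": 3, "work": 7}


def prior(bucket):
    """P(B): probability of an email being in `bucket` before its words are seen."""
    return emails_per_bucket[bucket] / sum(emails_per_bucket.values())


def word_probability(word, bucket):
    """P(W|B): frequency of `word` relative to all words counted in `bucket`."""
    counts = corpus[bucket]
    return counts[word] / sum(counts.values())  # Counter gives 0 for unseen words


print(prior("work"))                        # 0.7
print(word_probability("meeting", "work"))  # 0.5
```

Note that a word a bucket has never seen gets probability 0 here; a real classifier needs some floor for such words, otherwise one unknown word rules a bucket out entirely.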
  
Suppose there are n buckets, B1 to Bn, and m words in total, W1 to Wm. For a specific email E, we want to know which bucket it is most likely to belong to.
We want to calculate P(Bi|E) for each bucket Bi. That calculation can be performed using Bayes' rule as follows:
  
               P(E|Bi) x P(Bi)
   P(Bi|E) =  -----------------
                    P(E)
  
Here P(Bi|E) is the probability that email E is in bucket Bi, given the set of words that E contains. P(E|Bi) is the probability of seeing the words of E in an email from bucket Bi, P(Bi) is the probability of an email being in bucket Bi to start with, and P(E) is the probability of seeing the words of E at all.
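The sketch below scores an email against each bucket with Bayes' rule. It assumes the "naive" step that gives the method its name: the words of E are treated as independent, so P(E|Bi) is the product of P(W|Bi) over the words in E. The toy corpus, the email counts used for P(Bi) and the UNSEEN_WORD_PROB floor for words a bucket has never seen are all hypothetical. It works with logarithms and divides by the sum over buckets (which is P(E)) so that long emails do not underflow to zero.

```python
import math
from collections import Counter

# Hypothetical toy corpus: word frequencies per bucket.
corpus = {
    "spam": Counter({"free": 4, "offer": 3, "meeting": 1}),
    "work": Counter({"meeting": 5, "report": 3, "free": 2}),
}
# Hypothetical number of training emails per bucket, used to estimate P(Bi).
emails_per_bucket = {"spam": 3, "work": 7}
# Assumed probability for a word a bucket has never seen, so that a single
# unknown word does not make P(E|Bi) zero. Illustrative value only.
UNSEEN_WORD_PROB = 1e-4


def log_likelihood(words, bucket):
    """log P(E|Bi), treating the words of E as independent (the naive step)."""
    counts = corpus[bucket]
    total = sum(counts.values())
    return sum(
        math.log(counts[w] / total) if counts[w] else math.log(UNSEEN_WORD_PROB)
        for w in words
    )


def posteriors(words):
    """P(Bi|E) for every bucket via Bayes' rule: P(E|Bi) x P(Bi) / P(E)."""
    n_emails = sum(emails_per_bucket.values())
    # log(P(E|Bi) x P(Bi)) for each bucket
    log_joint = {
        b: log_likelihood(words, b) + math.log(emails_per_bucket[b] / n_emails)
        for b in corpus
    }
    # P(E) is the sum of P(E|Bi) x P(Bi) over all buckets. Shifting by the
    # largest log term before exponentiating avoids underflow; the shift
    # cancels out when dividing by the sum.
    top = max(log_joint.values())
    scaled = {b: math.exp(v - top) for b, v in log_joint.items()}
    total = sum(scaled.values())
    return {b: s / total for b, s in scaled.items()}


probs = posteriors(["free", "offer", "free"])
best = max(probs, key=probs.get)
print(best)  # spam
```

Because P(E) is the same for every bucket, it does not change which bucket wins; it only matters when the probabilities should add up to 1.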
 
faq/howitworks.1239011410.txt.gz · Last modified: 2009/04/06 11:50 (external edit)

Should you find anything in the documentation that is incomplete, unclear, outdated or just plain wrong, please let us know and leave a note in the Documentation Forum.

The content of this wiki is protected by the GNU Free Documentation License