faq:howitworks: created 2009/04/06 09:50 by xuesheng; current revision 2009/04/07 13:51 (external edit)
POPFile uses a technique called Naive Bayes to calculate the probability that, given the words it contains, an email falls into a specific bucket.
A bucket is represented by a collection of words and their frequency. The set of buckets is called the corpus and determines
Suppose there are n buckets B1 to Bn and there are m words in total, W1 to Wm. We want to know which bucket a specific email E is most likely to belong to.
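To make the notation concrete, here is a minimal sketch of how a corpus of buckets might be held in memory. The bucket names, the word counts, and the `tokenize` helper are all hypothetical illustrations, not POPFile's actual storage format or tokenizer.

```python
from collections import Counter

# Hypothetical corpus: each bucket Bi maps words Wj to how often they were seen.
corpus = {
    "spam": Counter({"viagra": 4, "free": 3, "meeting": 1}),
    "work": Counter({"meeting": 5, "report": 3, "free": 1}),
}


def tokenize(text):
    """Split an email body into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


# The email E is treated as a list of words.
email_words = tokenize("Free meeting report")

# For each bucket, look up how often it has seen each word of the email.
# A Counter returns 0 for words the bucket has never seen.
counts = {bucket: [words[w] for w in email_words] for bucket, words in corpus.items()}
```

These per-bucket counts are the raw material for the probability calculation: a bucket that has seen the email's words often is a more plausible home for it.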
We want to calculate P(Bi|E) for each bucket Bi. That calculation can be performed using Bayes' rule as follows:
            P(E|Bi) x P(Bi)
  P(Bi|E) = ---------------
                 P(E)
Here P(Bi|E) is the probability that email E is in bucket Bi; that is, the probability that, given the set of words E, they appear in bucket Bi.
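The rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not POPFile's implementation: the prior P(Bi) is taken to be each bucket's share of all counted words, P(E|Bi) is the product of per-word probabilities (the "naive" independence step), add-one smoothing keeps unseen words from zeroing the product, and P(E) is the sum of the numerators over all buckets. The function name `posteriors`, the `smoothing` parameter, and the sample corpus are hypothetical.

```python
import math


def posteriors(corpus, email_words, smoothing=1.0):
    """Return P(Bi|E) for every bucket Bi using Bayes' rule.

    Assumptions (not taken from POPFile): priors are proportional to each
    bucket's total word count, and word probabilities use additive smoothing.
    """
    vocab = set().union(*corpus.values())
    total_words = sum(sum(words.values()) for words in corpus.values())

    log_numerator = {}
    for bucket, words in corpus.items():
        bucket_total = sum(words.values())
        log_p = math.log(bucket_total / total_words)  # log P(Bi)
        for w in email_words:
            # log P(W|Bi), smoothed so an unseen word does not give log(0)
            seen = words.get(w, 0)
            log_p += math.log((seen + smoothing) / (bucket_total + smoothing * len(vocab)))
        log_numerator[bucket] = log_p  # log(P(E|Bi) x P(Bi))

    # P(E) is the sum of the numerators; subtract the peak to avoid underflow.
    peak = max(log_numerator.values())
    evidence = sum(math.exp(v - peak) for v in log_numerator.values())
    return {b: math.exp(v - peak) / evidence for b, v in log_numerator.items()}


# Hypothetical corpus of two buckets with made-up word frequencies.
corpus = {
    "spam": {"viagra": 4, "free": 3, "meeting": 1},
    "work": {"meeting": 5, "report": 3, "free": 1},
}
result = posteriors(corpus, ["free", "meeting", "report"])
best_bucket = max(result, key=result.get)
```

Working in logarithms is a design choice for numerical stability: multiplying many small probabilities underflows quickly on long emails, whereas summing their logs does not. Because P(E) is the same for every bucket, it only matters when actual probabilities are wanted; picking the most likely bucket needs just the numerators.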