How does POPFile evaluate new words?

POPFile assigns a probability to every word. For words that it has seen, it uses the probability taken from the corpus. For unseen words, it assigns the probability 1/(10 * size of that bucket's corpus), i.e. a small probability that indicates that the word is “unlikely” to appear.

The other possible choices are 0 and 1. A probability of 0 would break classification, because the word probabilities are combined by multiplication and all classifications would come out as 0. A probability of 1 would be a mistake, because it would indicate that the word always appears.
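
The rule for seen and unseen words can be sketched in a few lines of Python. This is only an illustration, not POPFile's actual code: the function name word_probability and the dictionary of word counts are hypothetical, and it assumes that the "size of that bucket's corpus" means the total number of word occurrences recorded for the bucket.

```python
def word_probability(word, bucket_counts):
    """Return the probability of `word` within a single bucket.

    `bucket_counts` maps each word to the number of times it has been
    seen in that bucket (a hypothetical structure used for illustration).
    """
    # Assumption: corpus size = total word occurrences in this bucket.
    corpus_size = sum(bucket_counts.values())
    if corpus_size == 0:
        raise ValueError("bucket corpus is empty")

    count = bucket_counts.get(word, 0)
    if count > 0:
        # Seen word: use its relative frequency in the corpus.
        return count / corpus_size

    # Unseen word: small but non-zero, so a product of probabilities
    # is reduced without collapsing to 0.
    return 1 / (10 * corpus_size)


# Toy bucket with 10 word occurrences in total.
spam = {"viagra": 6, "offer": 3, "hello": 1}
print(word_probability("offer", spam))    # seen word: 3/10
print(word_probability("meeting", spam))  # unseen word: 1/(10 * 10)
```

With this toy bucket the unseen word gets a probability ten times smaller than a word seen exactly once, which matches the intent of marking it as "unlikely" rather than impossible.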

 
faq/newwords.txt · Last modified: 2008/02/08 19:49 (external edit)

