POPFile - Automatic Email Classification

How does POPFile evaluate new words?

POPFile assigns a probability to every word. For words it has seen, it simply uses the probability observed in that bucket's corpus. For unseen words, it assigns the probability 1/(10 * size of that bucket's corpus), i.e. a probability indicating that the word is “unlikely” to appear.
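As a rough illustration, the rule can be sketched in Python as below. This is not POPFile's own code. It assumes that a bucket's corpus is a `collections.Counter` of word counts and that "size of that bucket's corpus" means the total number of word occurrences in it. The function name `word_probability` is hypothetical.

```python
from collections import Counter

def word_probability(word: str, bucket: Counter) -> float:
    """Hypothetical sketch of the per-bucket word probability described above."""
    # Assumption: "size of the corpus" = total word occurrences in the bucket.
    total = sum(bucket.values())
    if total == 0:
        # This sketch does not define behaviour for an empty bucket.
        raise ValueError("empty bucket")
    # Counter returns 0 for missing keys without inserting them.
    if bucket[word] > 0:
        # Seen word: use its observed frequency in this bucket.
        return bucket[word] / total
    # Unseen word: small but non-zero probability, 1 / (10 * corpus size).
    return 1 / (10 * total)

spam = Counter({"viagra": 30, "free": 50, "hello": 20})  # made-up counts, 100 words total
print(word_probability("free", spam))     # 50 / 100
print(word_probability("meeting", spam))  # unseen: 1 / (10 * 100)
```

Because the fallback is scaled by the bucket's size, a large bucket assigns a smaller probability to a word it has never seen than a small bucket does.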

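The other possible choices are 0 or 1. Using 0 would break classification: a bucket's score is a product of word probabilities, so one unseen word would drive that bucket's score to 0, and a word that no bucket has seen would make every score 0. Using 1 would be a mistake because it would indicate that the word always appears.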
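The small naive Bayes sketch below compares the three choices. It is a simplified sketch, not POPFile's implementation. The bucket contents, the message, and the helper names `bucket_score` and `scores` are all made up for illustration. It also multiplies raw probabilities. A real classifier would usually sum logarithms to avoid floating-point underflow.

```python
from collections import Counter

def bucket_score(words, bucket, unseen_prob):
    """Hypothetical naive Bayes score: product of per-word probabilities."""
    total = sum(bucket.values())
    score = 1.0
    for w in words:
        # Seen words use their observed frequency; unseen words use the fallback.
        score *= (bucket[w] / total) if bucket[w] else unseen_prob(total)
    return score

# Made-up buckets, each holding 100 words.
buckets = {
    "spam": Counter({"free": 50, "offer": 40, "hello": 10}),
    "work": Counter({"meeting": 60, "report": 30, "hello": 10}),
}
message = ["free", "offer", "zebra"]  # "zebra" appears in no bucket

def scores(unseen_prob):
    return {name: bucket_score(message, b, unseen_prob) for name, b in buckets.items()}

zero = scores(lambda total: 0.0)                   # every bucket scores 0: no winner
popfile = scores(lambda total: 1 / (10 * total))   # "spam" clearly wins
one = scores(lambda total: 1.0)                    # "work" wins despite knowing none of the words
print(zero, popfile, one, sep="\n")
```

With 1 as the fallback, the bucket that has seen none of the message's words gets the highest score. That is the practical effect of treating an unseen word as one that always appears.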