POPFile - Automatic Email Classification

Speed Up Training

A lot of people ask how they can make POPFile learn faster instead of going through the normal training process. The recommended way to train POPFile is to reclassify misclassified messages as they arrive.

You should not try to speed up learning by forwarding or resending old mail to yourself. POPFile learns a lot about a message from its headers; when you forward an existing message, the forwarded copy carries characteristics of your own mail, which will confuse POPFile's training.
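To illustrate why forwarding changes what a filter sees, here is a minimal Python sketch that compares a few headers of an original message with those of a forwarded copy. It is not POPFile's code: the `header_tokens` helper, the choice of headers, and the example addresses are all hypothetical, chosen only to show the effect.

```python
from email.message import EmailMessage

def header_tokens(msg):
    """Return 'name:value' tokens for a few headers (illustrative helper only)."""
    return {f"{name.lower()}:{msg[name]}"
            for name in ("From", "To", "Subject") if msg[name]}

# The message as it originally arrived (made-up example addresses).
original = EmailMessage()
original["From"] = "offers@bulk-sender.example"
original["To"] = "you@example.org"
original["Subject"] = "Cheap watches"
original.set_content("Buy now")

# The same body after you forward it to yourself.
forwarded = EmailMessage()
forwarded["From"] = "you@example.org"
forwarded["To"] = "you@example.org"
forwarded["Subject"] = "Fwd: Cheap watches"
forwarded.set_content("Buy now")

lost = header_tokens(original) - header_tokens(forwarded)
gained = header_tokens(forwarded) - header_tokens(original)
print("lost:  ", sorted(lost))
print("gained:", sorted(gained))
```

In this sketch the sender's address disappears from the forwarded copy and your own address takes its place, so a filter trained on the forwarded copy would associate your address with that bucket.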

In addition, spam (as well as your good mail) changes over time. It is neither necessary nor a good idea to train POPFile on all your old mail. Feeding POPFile lots of old archived email needlessly increases the size of the corpus. This can hurt not only performance but also accuracy, because you are training it with outdated information that may not represent your current mail. Feeding POPFile only your archived spam would be even worse: POPFile needs to learn from both good and bad mail in a balanced way.
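The effect of unbalanced training can be sketched with a toy naive Bayes classifier in Python. This is a simplified illustration under stated assumptions, not POPFile's actual algorithm: the `train` and `classify` functions, the add-one smoothing, the bucket names, and the sample messages are all invented for the example.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (bucket, text). Returns word counts and message counts per bucket."""
    words, totals = {}, Counter()
    for bucket, text in messages:
        words.setdefault(bucket, Counter()).update(text.lower().split())
        totals[bucket] += 1
    return words, totals

def classify(model, text):
    """Pick the bucket with the highest log score (toy naive Bayes, add-one smoothing)."""
    words, totals = model
    vocab = {w for counts in words.values() for w in counts}
    n_msgs = sum(totals.values())
    best, best_score = None, -math.inf
    for bucket, counts in words.items():
        # Prior: share of training messages that went into this bucket.
        score = math.log(totals[bucket] / n_msgs)
        n_words = sum(counts.values())
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = bucket, score
    return best

recent = [
    ("good", "lunch meeting tomorrow"),
    ("good", "project report attached"),
    ("spam", "cheap pills online"),
    ("spam", "win money now"),
]
# Hypothetical archive: twenty old spam messages and no old good mail.
archive = [("spam", "cheap pills online")] * 20

balanced = train(recent)
lopsided = train(recent + archive)

print(classify(balanced, "meeting"))   # good
print(classify(lopsided, "meeting"))   # spam
```

In this toy model the word "meeting" has only ever appeared in good mail, yet after the spam-only archive is added the sheer number of spam messages outweighs that evidence and the short message lands in the spam bucket.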

POPFile learns pretty quickly. If you must have an instant solution without any effort, then a Bayesian filter is not right for you; such filters take time to reach their best accuracy. Even the best-trained Bayesian filter will make errors that require correction, and it needs those corrections to keep learning, because your mail and spam change all the time.