faq:speeduptraining [2008/02/08 19:49] (current) – created - external edit 127.0.0.1
 +==== Speed Up Training ====
 +
 +A lot of people ask how they can make POPFile learn more quickly instead of going through the normal training process. The recommended method for training POPFile is to reclassify misclassified messages as they arrive.
 +
 +You should not try to speed up learning by forwarding or resending old mail to yourself. POPFile learns a lot about email from its headers; when you forward existing messages, the forwarding adds characteristics of your own mail, which will confuse POPFile's training.
 +
 +In addition, spam (as well as your good mail) changes over time. It is not necessary, or even a good idea, to train POPFile on all your old mail. Feeding POPFile lots of old archived email needlessly increases the size of the corpus. This may hurt not only performance but also accuracy, because you are training it with outdated information that may not represent your current mail. Feeding POPFile only your archived spam would be even worse: POPFile needs to learn from both good and bad mail in a balanced way.
 +
 +POPFile learns pretty quickly. If you must have an instant solution without any effort, then a [[Glossary:Bayesian | Bayesian]] filter is not right for you; such filters take time to reach their best accuracy. Even the best-trained Bayesian filter will make errors that require correction, and it needs those corrections to keep learning, because your mail and spam change all the time.
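
The idea of correcting mistakes as mail arrives, rather than bulk-feeding old mail, can be sketched with a toy classifier. This is only an illustration under stated assumptions: `TinyBayes` and `train_on_errors` are hypothetical names, the bucket names are made up, and the word-count scoring is a generic naive Bayes with add-one smoothing, not POPFile's actual code.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Toy word-count naive Bayes classifier (illustrative, not POPFile's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, bucket, text):
        # Add the message's words to the chosen bucket.
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text):
        # With no training at all, there is nothing to guess from.
        if not self.message_counts:
            return "unclassified"
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_msgs = sum(self.message_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket in sorted(self.message_counts):
            counts = self.word_counts[bucket]
            denom = sum(counts.values()) + len(vocab)
            # Log prior from message counts, then add-one smoothed word likelihoods.
            score = math.log(self.message_counts[bucket] / total_msgs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


def train_on_errors(classifier, stream):
    """Classify each arriving message; train only when the guess was wrong."""
    corrections = 0
    for text, correct_bucket in stream:
        if classifier.classify(text) != correct_bucket:
            classifier.train(correct_bucket, text)
            corrections += 1
    return corrections
```

Feeding such a loop a mixed stream of good and bad messages keeps the corpus small and balanced, because only the messages the classifier got wrong are ever added to it.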
 +
 +See also:
 +  *[[Glossary:TOE | TOE (Train Only on Errors) ]]
 +  *[[FAQ:CorpusUnbalance | Corpus Unbalance]]
  
 


The content of this wiki is protected by the GNU Free Documentation License