TOE (Train Only on Errors)

POPFile currently works best when you Train Only on Errors, commonly referred to as “TOE”: when you find that POPFile has misclassified a message, you reclassify it. This is most likely what you are doing already.

An alternative, less popular strategy is to Train Always, or “TA”. In this case you reclassify every message. If the classification is already correct, you reclassify the message to the same bucket to reinforce its training. If the classification was wrong, you reclassify the message to the correct bucket. This method is inconvenient because you have to train on each message manually.

Research has shown that TA is only slightly more accurate than TOE, and sometimes even a bit less accurate. The main drawback of TA is that it greatly increases the size of the corpus, and the extra data may slow down classification slightly once the corpus gets very large.
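The difference in corpus growth between the two strategies can be illustrated with a small sketch. This is not POPFile's code: ToyClassifier, run and MESSAGES are hypothetical names made up for the example, and the scoring is a simplified word-count naive Bayes that stands in for POPFile's real classifier. The only point is to show when each strategy adds words to the corpus.

```python
import math
from collections import Counter


class ToyClassifier:
    """Tiny word-count naive Bayes classifier (illustrative, not POPFile's)."""

    def __init__(self, buckets):
        # One word-count table per bucket; together they form the corpus.
        self.words = {bucket: Counter() for bucket in buckets}

    def train(self, bucket, text):
        self.words[bucket].update(text.lower().split())

    def classify(self, text):
        vocab = set().union(*self.words.values())
        vocab_size = len(vocab) + 1  # +1 keeps the denominator non-zero
        best, best_score = None, -math.inf
        for bucket, counts in self.words.items():
            total = sum(counts.values())
            # Sum of log-probabilities with add-one smoothing.
            score = sum(
                math.log((counts[word] + 1) / (total + vocab_size))
                for word in text.lower().split()
            )
            if score > best_score:  # ties go to the first bucket
                best, best_score = bucket, score
        return best

    def corpus_size(self):
        return sum(sum(counts.values()) for counts in self.words.values())


def run(messages, train_always):
    """Classify each message in order, then train per the chosen strategy."""
    clf = ToyClassifier(["spam", "inbox"])
    errors = 0
    for text, correct in messages:
        guess = clf.classify(text)
        if guess != correct:
            errors += 1
        # TOE trains only when the guess was wrong; TA trains every time.
        if train_always or guess != correct:
            clf.train(correct, text)
    return errors, clf.corpus_size()


# Made-up sample mail: four short messages repeated five times.
MESSAGES = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "inbox"),
    ("buy cheap pills today", "spam"),
    ("monday meeting moved to noon", "inbox"),
] * 5

toe_errors, toe_size = run(MESSAGES, train_always=False)
ta_errors, ta_size = run(MESSAGES, train_always=True)
print("TOE: errors =", toe_errors, "corpus words =", toe_size)
print("TA:  errors =", ta_errors, "corpus words =", ta_size)
```

Under TA every word of every message ends up in the corpus, while under TOE only the misclassified messages contribute. The error counts from such a tiny sample say nothing about the real-world accuracy of either strategy.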

In addition, if multiple identical or nearly identical messages are received, it is generally only necessary to train on one of them. Doing so will not change the classification of the other similar messages, because they have already been passed along to the mail client, but it will affect future messages. Starting with POPFile 0.22, you can see on a message's Detailed View whether it would now be classified differently.

glossary/toe.txt · Last modified: 2009/09/30 13:34 by 127.0.0.1


The content of this wiki is protected by the GNU Free Documentation License