What is a Stop Word / Ignored Word?

POPFile keeps a list of words that it ignores when it finds them during classification. These words are likely to appear in any sort of message, so they are unlikely to be useful for classification. Now that POPFile uses a database to store and look up words, ignoring them makes little if any noticeable difference in speed.
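
As a rough illustration of the idea (this is not POPFile's own code, which is written in Perl), the sketch below drops ignored words from a list of message words before they would be handed to a classifier. The function name `remove_stop_words` and the sample word list are hypothetical and chosen only for this example.

```python
def remove_stop_words(words, stop_words):
    """Return the words that are not in the stop list (case-insensitive)."""
    stop = {w.lower() for w in stop_words}
    return [w for w in words if w.lower() not in stop]


# Hypothetical sample data, not the list shipped with POPFile.
sample_stop_words = ["the", "and", "of"]
message_words = "The price of the watch and the strap".split()

kept = remove_stop_words(message_words, sample_stop_words)
print(kept)
```

Only the words that survive the filter would contribute to the classification decision.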

You can view and change this list on the Advanced tab in the POPFile UI. The words are kept in a simple text file named 'stopwords'. If you would rather not use the UI, simply edit the file with your favorite text editor, then restart POPFile for the changes to take effect.
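
If you prefer to script the change rather than open an editor, something like the following sketch could work. It assumes the 'stopwords' file holds one word per line in a UTF-8 compatible encoding, which you should check against your own copy first. The helper names `load_stop_words` and `add_stop_word` are hypothetical, and the demo writes to a temporary file rather than a real POPFile installation.

```python
import tempfile
from pathlib import Path


def load_stop_words(path):
    """Read the file, assuming one word per line, and skip blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def add_stop_word(path, word):
    """Append a word unless it is already listed; return True if added."""
    words = load_stop_words(path)
    if word.lower() in (w.lower() for w in words):
        return False
    words.append(word)
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return True


# Demo on a throwaway file, not on a real 'stopwords' file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "stopwords"
    demo.write_text("the\nand\n", encoding="utf-8")
    added = add_stop_word(demo, "of")       # new word, gets appended
    duplicate = add_stop_word(demo, "The")  # already present, skipped
    final_words = load_stop_words(demo)
```

As with a manual edit, POPFile would need to be restarted before it picks up the changed file.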

Some people don't use any stop words in the hope of a tiny accuracy improvement. It won't make much difference either way, but if you want to let the magic of Bayes take care of it all, it's up to you.

This link may be useful if you wish to create a stop word list for another language. Just remember that their stop word lists are based mostly on language rules rather than on actual tests of what is common in most email:

 http://snowball.tartarus.org/
 
glossary/stopword.txt · Last modified: 2008/02/08 19:49 by 127.0.0.1

The content of this wiki is protected by the GNU Free Documentation License