Differences

This shows you the differences between two versions of the page.

utilityscripts:insert [2007/05/07 17:47] texasfett
utilityscripts:insert [2008/02/08 19:49] (current) – external edit 127.0.0.1
Line 5: Line 5:
 **About Sample Size** **About Sample Size**
  
-If you use this script to train POPFile via email samples, be careful about sample size. We do **not** recommend you submit thousands of emails to the script, you will end up with a huge corpus that offers little additional benefit to classification accuracy. Your best approach would be to stick to small representative samples of at most 100 emails per bucket.
+If you use this script to train POPFile via email samples, be careful about sample size.  This is not a recommended way to train POPFile, it is a utility designed for testing.  If you intend to use it to train POPFile, we do **not** recommend you submit thousands of emails to the script, you will end up with a huge corpus that offers little additional benefit to classification accuracy.  Your best approach when using this script would be to stick to small representative samples of at most 100 emails per bucket.  POPFile learns quickly so using this script is unnecessary and will result in less accuracy in classifying your mail than the recommended [[glossary:toe|TOE]] method.  You may want to look into [[http://popfile.jciv.com/xmltraintest.html|TrainTest.py]] which can simulate TOE.
  
 ===== Usage ===== ===== Usage =====
  
-**Shutdown POPFile Before Using** Shutdown any running instance of POPFile before you use this script. insert.pl modifies the corpus by adding words to it, it should not be run concurrently with POPFile to avoid damage to the corpus databases.
+**Shutdown POPFile Before Using**
+Shutdown any running instance of POPFile before you use this script. insert.pl modifies the corpus by adding words to it, it should not be run concurrently with POPFile to avoid damage to the corpus databases.
  
-The script must be run from the popfile installation directory. Windows users should open a DOS box and switch to the popfile directory (normally c:\program files\popfile\ but it can be different on your system).
+The script must be run from the POPFile installation directory. Windows users should open a DOS box and switch to the popfile directory (normally c:\program files\popfile\ but it can be different on your system).
  
 <code> <code>
Line 20: Line 21:
  
 **Feeding a directory of messages** **Feeding a directory of messages**
 +
 <code> <code>
    perl insert.pl bucketname \path\to\messages\*.*    perl insert.pl bucketname \path\to\messages\*.*
 </code> </code>
-   **Feeding a single message**
+
+**Feeding a single message**
 <code> <code>
    perl insert.pl bucketname messagefilename    perl insert.pl bucketname messagefilename
Line 43: Line 46:
 <code> <code>
    perl insert.pl bucketname \poptemp\*.eml    perl insert.pl bucketname \poptemp\*.eml
-</code></code>+</code>
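
The sample-size advice in the revised text (at most 100 emails per bucket) can be applied before calling the script. What follows is a minimal sketch in Python, not part of POPFile: the helper names `cap_sample` and `build_commands` are hypothetical, and it assumes only the documented single-message form `perl insert.pl bucketname messagefilename`.

```python
import random

MAX_PER_BUCKET = 100  # upper bound suggested in the text above


def cap_sample(files, limit=MAX_PER_BUCKET, seed=None):
    """Return at most `limit` entries from `files`, chosen at random.

    Hypothetical helper: if the list is already small enough it is
    returned unchanged, otherwise a random subset of `limit` items.
    """
    files = list(files)
    if len(files) <= limit:
        return files
    rng = random.Random(seed)  # seed only makes the choice repeatable
    return sorted(rng.sample(files, limit))


def build_commands(bucket, files):
    """Build one argument list per message, in the documented
    'perl insert.pl bucketname messagefilename' form."""
    return [["perl", "insert.pl", bucket, str(name)] for name in files]
```

Each returned argument list could be passed to `subprocess.run` from the POPFile installation directory, after POPFile itself has been shut down. Building one command per message avoids relying on how wildcards are expanded on a given system.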
  
  
 
utilityscripts/insert.1178560027.txt.gz · Last modified: 2008/02/08 19:49 (external edit)
The content of this wiki is protected by the GNU Free Documentation License