The insert.pl script provides a way to train your corpus by feeding it sample emails for a particular bucket. Those emails are parsed and internally reclassified to the bucket you specify.
About Sample Size
If you use this script to train POPFile via email samples, be careful about sample size. We do not recommend submitting thousands of emails to the script: you will end up with a huge corpus that adds little to classification accuracy. The best approach is to stick to small, representative samples of at most 100 emails per bucket.
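One way to keep to that limit is to copy a random selection of messages into a separate folder and feed only that folder to insert.pl. The following is a minimal Python sketch, not part of POPFile; the function name `pick_sample` and the folder layout (one message per file) are assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path

MAX_SAMPLES = 100  # upper bound per bucket suggested in the text above


def pick_sample(source_dir, staging_dir, limit=MAX_SAMPLES, seed=None):
    """Copy at most `limit` randomly chosen files from source_dir to staging_dir.

    Assumes each file in source_dir holds exactly one message.
    Returns the list of copied paths.
    """
    source = Path(source_dir)
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)

    # Sort first so that a fixed seed gives a reproducible selection.
    messages = sorted(p for p in source.iterdir() if p.is_file())
    rng = random.Random(seed)
    chosen = messages if len(messages) <= limit else rng.sample(messages, limit)

    copied = []
    for message in chosen:
        target = staging / message.name
        shutil.copy2(message, target)  # originals are left untouched
        copied.append(target)
    return copied
```

For example, calling `pick_sample` with a hypothetical archive folder and an empty staging folder leaves no more than 100 files in the staging folder, which can then be passed to insert.pl in place of the full archive.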
Shut Down POPFile Before Using
Shut down any running instance of POPFile before you use this script. Because insert.pl modifies the corpus by adding words to it, running it concurrently with POPFile can damage the corpus databases.
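A quick sanity check before training is to see whether anything is still listening on the ports POPFile normally uses. The sketch below is a heuristic only and rests on an assumption: that your installation uses port 8080 for the web interface and port 110 for the POP3 proxy. Adjust the list if your configuration differs. An open port may also belong to another program, and a closed port does not prove POPFile has fully exited.

```python
import socket

# Assumption: 8080 (web interface) and 110 (POP3 proxy) are the ports in use.
# Change this tuple to match your own POPFile configuration.
ASSUMED_POPFILE_PORTS = (8080, 110)


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def popfile_seems_running(ports=ASSUMED_POPFILE_PORTS):
    """Heuristic: True if any of the assumed ports accepts a connection."""
    return any(port_is_open(port) for port in ports)
```

If `popfile_seems_running()` returns True, shut POPFile down and check again before running insert.pl.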
The script must be run from the POPFile installation directory. Windows users should open a command prompt (DOS box) and switch to the POPFile directory (normally c:\program files\popfile\, but it may be different on your system).
cd "\program files\popfile\"
Once in the POPFile installation directory, run the script with one of the following commands.
Feeding a directory of messages
perl insert.pl bucketname \path\to\messages\*.*
Feeding a Eudora mailbox file
perl insert.pl bucketname \path\to\eudora\newsletters.mbx
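When several buckets need training, the commands above can be driven from a small wrapper. This is a sketch under stated assumptions rather than a POPFile tool: `build_insert_command` and `train` are hypothetical helper names, `perl` is assumed to be on the PATH, and the only command-line form used is the `perl insert.pl bucketname path` form shown above.

```python
import subprocess
from pathlib import Path


def build_insert_command(bucket, messages):
    """Return the argument list for: perl insert.pl <bucket> <messages>."""
    return ["perl", "insert.pl", bucket, str(messages)]


def train(popfile_dir, jobs):
    """Run insert.pl once for each (bucket, messages) pair in jobs.

    popfile_dir is the POPFile installation directory; passing it as cwd
    mirrors the "cd" step, since insert.pl must be run from that directory.
    """
    for bucket, messages in jobs:
        # check=True stops at the first failing run instead of continuing.
        subprocess.run(
            build_insert_command(bucket, messages),
            cwd=Path(popfile_dir),
            check=True,
        )
```

Because no shell is involved, a wildcard such as `*.*` reaches insert.pl unexpanded, which matches what happens when the command is typed at a Windows command prompt.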