Differences
This shows you the differences between two versions of the page.
utilityscripts:insert [2007/05/07 17:55] – texasfett | utilityscripts:insert [2008/02/08 19:49] (current) – external edit 127.0.0.1 |
---|
**About Sample Size** | **About Sample Size** |
| |
If you use this script to train POPFile via email samples, be careful about sample size. This is not a recommended way to train POPFile; it is a utility designed for testing. If you do intend to use it for training, we do **not** recommend submitting thousands of emails to the script: you will end up with a huge corpus that offers little additional benefit to classification accuracy. Your best approach is to stick to small, representative samples of at most 100 emails per bucket. POPFile learns quickly, so using this script is unnecessary and will result in less accurate classification of your mail than the recommended [[glossary:toe|TOE]] method. You may want to look into TrainTest.py, which can simulate TOE. | If you use this script to train POPFile via email samples, be careful about sample size. This is not a recommended way to train POPFile; it is a utility designed for testing. If you do intend to use it for training, we do **not** recommend submitting thousands of emails to the script: you will end up with a huge corpus that offers little additional benefit to classification accuracy. Your best approach is to stick to small, representative samples of at most 100 emails per bucket. POPFile learns quickly, so using this script is unnecessary and will result in less accurate classification of your mail than the recommended [[glossary:toe|TOE]] method. You may want to look into [[http://popfile.jciv.com/xmltraintest.html|TrainTest.py]], which can simulate TOE. |
| |
===== Usage ===== | ===== Usage ===== |