Method of passing .eml files through POPFile for classification?
I've been training my POPFile filter for the past few days. What I've done is load all my old mail onto a free IMAP server (such as Gmail) and have a mail client slowly retrieve it from the server via POP3.
This works, but it is not very efficient. Is there a way to, for instance, import 1,000 .eml files, check the classifications for errors, retrain as necessary, and then run the next 1,000 files through?
Sorry for my English :)
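The batch-of-1,000 workflow described in the question can be sketched roughly as follows. This is a minimal illustration, assuming the old messages are stored as individual `.eml` files in a single folder; the folder name `old-mail` and the helpers `batched` and `eml_batches` are hypothetical, and the step that actually hands each file to POPFile is deliberately left out because it depends on which tool you use.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def eml_batches(folder: str, size: int = 1000) -> Iterator[List[Path]]:
    """Yield the .eml files in `folder`, sorted by name, in batches of `size`."""
    return batched(sorted(Path(folder).glob("*.eml")), size)


if __name__ == "__main__":
    # "old-mail" is a hypothetical folder name; replace it with your own.
    for n, batch in enumerate(eml_batches("old-mail"), start=1):
        print(f"Batch {n}: {len(batch)} files, {batch[0].name} .. {batch[-1].name}")
        # Hand each file in `batch` to POPFile here (not shown), then review
        # the classifications and retrain before moving on to the next batch.
```

Sorting by file name keeps the batches reproducible between runs, so you can stop after any batch and pick up where you left off. On Python 3.12 and later, `itertools.batched` could replace the helper, although it yields tuples rather than lists.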
It is possible to feed POPFile lots of old mail messages, but it is not the recommended way to train POPFile.
POPFile is a very quick learner, so you may be surprised at how few messages need to be reclassified before it achieves accurate results.
To see how quickly POPFile could learn, I deleted all my POPFile data and started from scratch with seven empty buckets (spam plus six others for good mail). In the first 1,000 messages processed, only 21 had to be reclassified, giving an accuracy of 97.9%. In the next 1,000 messages received, only 7 had to be reclassified, and in the following 2,000 messages only 8. In other words, out of 4,000 messages only 36 had to be reclassified, which gives an overall accuracy of 99.1%. Of course, your mileage may vary.
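The accuracy figures above follow from treating every message that did not need reclassifying as correctly classified, i.e. accuracy = (processed − reclassified) / processed. The short sketch below recomputes them from the counts in the post; the function name `accuracy` is just an illustrative choice.

```python
# (messages processed, messages reclassified) per batch, as reported above
batches = [(1000, 21), (1000, 7), (2000, 8)]


def accuracy(processed: int, reclassified: int) -> float:
    """Percentage of messages that did not need to be reclassified."""
    return 100.0 * (processed - reclassified) / processed


for processed, reclassified in batches:
    print(f"{processed} messages, {reclassified} reclassified: "
          f"{accuracy(processed, reclassified):.1f}%")

total = sum(p for p, _ in batches)
wrong = sum(r for _, r in batches)
print(f"Overall: {wrong}/{total} reclassified, {accuracy(total, wrong):.1f}%")
```

This prints 97.9% for the first batch, 99.3% and 99.6% for the next two, and 99.1% overall, so accuracy kept improving as POPFile saw more mail.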
Our wiki has several pages about training:
There is a utility script you can try if you still want to feed lots of messages into POPFile. It has been several years since I last used the insert.pl utility script, so I don't know whether it still works properly. Please read the "About Sample Size" section of the script's wiki page before trying to use it.