Using Popfile best practices.

Hi Guys.

Knowing when POPFile has learned enough to be acceptably effective requires continual scrutiny of the classified mail.

This is of course a problem for all software of this nature, but while POPFile is learning, weeding out the spam manually without any software would probably be just as quick. Maybe it will all be worth it, however. I am still hopeful.


However, my current issue is that I have allowed a misclassification to slip through and don't know how to undo any harm it may cause.

My method:
Every couple of days I filter against each bucket, check that the classifications are correct, and then remove the entries. But today, with 14 pages of spam, close scrutiny after several weeks of learning is becoming a chore. So afterwards I do a quick visual scan in Outlook, where I have set up folders mirroring the buckets.

Today I noticed that a good mail had been reclassified as spam and I had already removed it from the list in popfile.

Obviously this suggests I need to make my approach a little more structured, but having made the mistake I don't see how to correct it. An archive without reclassification would be useful so that I could later reclassify errors. If the only solution is to delete the database and start classification all over again, even if only for one bucket, it's goodbye POPFile.

But I am wondering how people really use it, or are they less concerned than me about errors of this nature continually assigning mail to spam, etc.

Any comments about your own methodologies would be appreciated.

  • Message #1659

    Today I noticed that a good mail had been reclassified as spam and I had already removed it from the list in popfile ... An archive without reclassification would be useful so that I could later reclassify errors.

    POPFile will automatically delete entries from its HISTORY pages after a few days. The default setting is to delete old messages after two days but you can use the CONFIGURATION page to change this delay to anything between 1 and 366 days.

    If you only check for classification errors once every two weeks then I suggest you make sure POPFile retains messages for at least two weeks to give you the chance to correct the errors.

    There is no need to reclassify every message that POPFile has wrongly classified. If multiple identical or nearly-identical messages are received it is generally only necessary to train on one of them. This will not change the current classification of the other similar messages because they've already been passed along to the mail client, but will affect future messages. You can see if a message would now be classified differently on the message's Detailed View.

    The "How long will it take until POPFile will reach a decent accuracy?" page in the wiki contains some statistics showing how few messages I needed to reclassify in order to achieve 99% accuracy. I do not think having to classify 36 messages out of a total of 4,000 was a time-consuming task.
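    As a rough check of that figure, the arithmetic can be sketched as below. The function name is made up for illustration, and it assumes every misclassified message was noticed and reclassified, so that reclassifications equal errors.

```python
def accuracy_percent(total_messages: int, reclassified: int) -> float:
    """Share of messages classified correctly, as a percentage.

    Assumes each reclassified message corresponds to exactly one
    classification error and that no errors went unnoticed.
    """
    if total_messages <= 0:
        raise ValueError("total_messages must be positive")
    return 100.0 * (total_messages - reclassified) / total_messages


# 36 corrections out of 4,000 messages, the figures quoted above
print(f"{accuracy_percent(4000, 36):.1f}%")  # 99.1%
```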

    How many buckets have you defined? You need to create at least two buckets in addition to the 'unclassified' bucket and you also need to classify some mail for each bucket you have defined, otherwise POPFile will be unable to perform well.

    I make a backup copy of POPFile's SQLite database at least once a day so I can easily revert to a recent backup if disaster strikes.
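
    For anyone who wants to automate a daily backup like this, here is a minimal sketch using Python's standard sqlite3 module. The file name popfile.db and the backup folder are assumptions, so adjust them to your own installation. It also assumes the database is in SQLite 3 format; if your POPFile release uses an older SQLite file format, the sqlite3 module will not be able to open it, and copying the file while POPFile is shut down may be the safer route.

```python
import sqlite3
from datetime import date
from pathlib import Path


def backup_sqlite(db_path: Path, backup_dir: Path) -> Path:
    """Copy a SQLite 3 database to a dated file and return the new path."""
    if not db_path.is_file():
        raise FileNotFoundError(db_path)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # One file per day, e.g. popfile-20240131.db (naming is illustrative)
    target = backup_dir / f"{db_path.stem}-{date.today():%Y%m%d}{db_path.suffix}"

    source = sqlite3.connect(str(db_path))
    dest = sqlite3.connect(str(target))
    try:
        # Connection.backup (Python 3.7+) takes a consistent snapshot,
        # even if another process has the source database open.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

    Calling backup_sqlite(Path("popfile.db"), Path("backups")) from a scheduled task once a day would leave one dated copy per day. Running it twice on the same day overwrites that day's copy.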