How long will it take until POPFile reaches a decent accuracy?

It all depends on the number of messages you get.

Most people report very good accuracy (> 97%) after about 1,000 messages. The global statistics (see the link below) show that after 500 messages the average accuracy is 98.29% and that over 92% of users get an accuracy above 95% at that point (as of 28 September 2009; check the link below for the latest data).

This does not mean that you will have to reclassify a thousand times before you reach that point. The majority of those 1,000 messages will be put into the correct bucket. But you should be prepared to see many errors while your corpus is still fresh. 1)

POPFile offers users an opportunity to help improve POPFile over time by sharing statistics anonymously. To turn this feature on, go to http://127.0.0.1:8080/security

See also:

1) Some real statistics, starting from scratch: In the first 1,000 messages processed only 21 messages had to be reclassified, giving an accuracy of 97.9%. In the next 1,000 messages received only 7 messages had to be reclassified, and in the next 2,000 messages only 8 messages had to be reclassified.

Messages         Reclassifications   Total Number   Total Number of     Accuracy
                                     of Messages    Reclassifications
1 to 1,000       21                  1,000          21                  97.9%
1,001 to 2,000   7                   2,000          28                  98.6%
2,001 to 4,000   8                   4,000          36                  99.1%

In other words, out of 4,000 messages only 36 had to be reclassified, which gives an overall accuracy of 99.1%. (Of course, your mileage may vary.)
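
The accuracy figures in the table above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that accuracy means the cumulative share of messages that did not need to be reclassified; the function name `cumulative_accuracy` and the `batches` list are illustrative and are not part of POPFile.

```python
# (messages in batch, reclassifications in batch), copied from the table above
batches = [(1000, 21), (1000, 7), (2000, 8)]


def cumulative_accuracy(batches):
    """Return (total messages, total reclassifications, accuracy in %) after each batch.

    Assumption: accuracy = messages not reclassified / messages processed,
    accumulated from the very first message.
    """
    rows = []
    total_messages = 0
    total_reclassified = 0
    for messages, reclassified in batches:
        total_messages += messages
        total_reclassified += reclassified
        accuracy = 100.0 * (total_messages - total_reclassified) / total_messages
        rows.append((total_messages, total_reclassified, round(accuracy, 1)))
    return rows


for total, reclassified, accuracy in cumulative_accuracy(batches):
    print(f"{total:>5} messages, {reclassified:>2} reclassified, {accuracy}% accuracy")
```

Replacing `batches` with your own counts gives a comparable figure for your own installation.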
 
faq/whengood.txt · Last modified: 2012/08/28 13:21 by xuesheng

The content of this wiki is protected by the GNU Free Documentation License