Differences

This shows you the differences between two versions of the page.

faq:whengood [2009/09/29 11:53]
xuesheng Better to put the example statistics in a footnote?
faq:whengood [2012/08/28 13:21] (current)
xuesheng Add a table to the footnote showing how quickly POPFile can learn
Line 5: Line 5:
Most people report very good accuracy (> 97%) after some 1,000 messages. The global statistics (see link below) tell us that after 500 messages the average accuracy is 98.29% and that over 92% of users are getting an accuracy above 95% at that point (as of 28 September 2009; check the link below for the latest data).
-This does not mean that you will have to reclassify a thousand times before you reach that point. You will see that the majority of those 1000 messages will be put into the correct bucket. But you should be prepared to see many errors when your corpus is still fresh.((Some real statistics, starting from scratch: In the first 1,000 messages processed only 21 messages had to be reclassified, giving an accuracy of 97.9%. In the next 1,000 messages received only 7 messages had to be reclassified, and in the next 2,000 messages only 8 messages had to be reclassified. In other words, out of 4,000 messages only 36 messages had to be reclassified which gives an overall accuracy of 99.1% Of course your mileage may vary.))
+This does not mean that you will have to reclassify a thousand times before you reach that point. You will see that the majority of those 1000 messages will be put into the correct bucket. But you should be prepared to see many errors when your corpus is still fresh.((Some real statistics, starting from scratch: In the first 1,000 messages processed only 21 messages had to be reclassified, giving an accuracy of 97.9%. In the next 1,000 messages received only 7 messages had to be reclassified, and in the next 2,000 messages only 8 messages had to be reclassified. 
 +\\ 
 +\\ 
 +^  Message Range  ^ Reclassifications in Range ^ Cumulative Messages ^ Cumulative Reclassifications ^ Cumulative Accuracy ^ 
 +|      1 to 1,000  |  21  |  1,000  |  21  |  97.9%  | 
 +|  1,001 to 2,000  |   7  |  2,000  |  28  |  98.6%  | 
 +|  2,001 to 4,000  |   8  |  4,000  |  36  |  99.1%  | 
 + 
 + 
 +In other words, out of 4,000 messages only 36 had to be reclassified, which gives an overall accuracy of 99.1%. Of course, your mileage may vary.))
POPFile offers users an opportunity to help improve POPFile over time by sharing statistics anonymously. To turn this feature on, go to http://127.0.0.1:8080/security
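
The accuracy figures in the footnote table are cumulative: each one is the share of all messages processed so far that did not need reclassifying. A minimal sketch of that arithmetic follows, using the batch counts quoted in the footnote. The function name `cumulative_accuracy` and the `batches` list are illustrative assumptions, not part of POPFile.

```python
# Batch counts taken from the footnote: (messages in batch, reclassifications in batch).
batches = [(1000, 21), (1000, 7), (2000, 8)]


def cumulative_accuracy(batches):
    """Return (total messages, total reclassifications, accuracy %) after each batch.

    Hypothetical helper for illustration only; POPFile computes its own statistics.
    """
    rows = []
    total_messages = 0
    total_reclassified = 0
    for messages, reclassified in batches:
        total_messages += messages
        total_reclassified += reclassified
        # Accuracy = correctly classified messages / all messages so far.
        accuracy = 100.0 * (total_messages - total_reclassified) / total_messages
        rows.append((total_messages, total_reclassified, round(accuracy, 1)))
    return rows


for total, errors, accuracy in cumulative_accuracy(batches):
    print(f"{total:>5} messages, {errors:>2} reclassified, {accuracy}% accurate")
```

The three rows this produces correspond to the three rows of the table: 97.9%, 98.6% and 99.1%.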
 
faq/whengood.1254217989.txt.gz · Last modified: 2009/09/29 11:53 by xuesheng

Should you find anything in the documentation that is incomplete, unclear, outdated or just plain wrong, please let us know and leave a note in the Documentation Forum.

The content of this wiki is protected by the GNU Free Documentation License