POPFile Statistics - February 5, 2003 

Background

This page gives information about statistics gathered from live POPFile users between January 1, 2003 and February 5, 2003. POPFile users can opt into a scheme where POPFile sends three pieces of information back to my web server on a daily basis: the number of buckets in use, the number of messages classified and the number of errors made.

I have taken this data and done a number of things to it:

Some overall statistics:

How accurate is POPFile?

The most obvious question is how accurate POPFile is. This chart shows accuracy data broken down into 100%, 99%, 98%, ..., 90%, and below 90%.
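As a rough illustration of how a daily report could be placed into one of those bands, here is a minimal sketch. It assumes accuracy is computed as (messages - errors) / messages and that each band holds the whole percentage rounded down; neither detail is spelled out above. The function name and the sample reports are hypothetical and are not survey data.

```python
from collections import Counter

def accuracy_band(messages, errors):
    """Return the chart band for one report: '100%', '99%', ..., '90%' or 'below 90%'.

    Assumption: accuracy = (messages - errors) / messages, floored to a whole percent.
    """
    if messages <= 0:
        raise ValueError("need at least one classified message")
    # Integer arithmetic avoids floating-point rounding at band edges.
    whole_percent = (100 * (messages - errors)) // messages
    return f"{whole_percent}%" if whole_percent >= 90 else "below 90%"

# Made-up (messages, errors) pairs, purely illustrative.
reports = [(500, 0), (1000, 8), (200, 5), (300, 45)]
bands = Counter(accuracy_band(m, e) for m, e in reports)
```

Counting the bands across all reporting users and dividing by the number of users would give the percentages of the kind plotted in the chart.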

So summarizing the information:

(If you compare those numbers with the previous survey, you will see that they are almost unchanged.)

So almost everyone is getting good accuracy and almost two-thirds of users are getting excellent accuracy, but only 18% are getting 99% or better. I take this scorecard as a mixture of excellent (look how many people get great accuracy from such a simple-to-use program), disappointing (I wish everyone were getting 99%), and inaccurate (it is hard to judge accuracy when I don't know how many of those errors were made at the start, when POPFile knows nothing about your configuration).

What is POPFile used for?

The next question is what people are doing with POPFile. All I can glean from the statistics is the number of buckets they have set up. I've arranged this into 2, 3, 4, ..., 10 and more than 10.

So 40% of people have two buckets set up (and I think it's safe to assume that most of these are spam and not spam), which suggests POPFile is being used for spam fighting alone about 40% of the time (this compares with 39% of the time in the latest survey). The other 60% are using POPFile with a surprising number of buckets: 8% have more than 10 buckets. Clearly POPFile is being used for spam fighting and a whole lot more. (Some stories about POPFile use can be found here.)

Does number of buckets affect accuracy?

The interesting question here is whether POPFile's accuracy deteriorates or improves with the number of buckets. Using the same breakdown of bucket count as the previous section (2, 3, 4, ..., 10 and more than 10) and calculating the average accuracy for each of those groups we see:

So there seems to be a very small amount of deterioration as the number of buckets grows large. Overall, POPFile seems to hold up well with a large number of buckets.
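The per-group averaging behind this comparison can be sketched as follows. This assumes each report contributes one accuracy value, (messages - errors) / messages, and that the group average is a plain unweighted mean over reports; the actual weighting used for the chart is not stated. All names and sample values are hypothetical.

```python
from collections import defaultdict

def bucket_group(buckets):
    """Group label matching the breakdown above: '2' ... '10' or 'more than 10'."""
    return str(buckets) if buckets <= 10 else "more than 10"

def average_accuracy_by_group(reports):
    """Mean accuracy (in percent) per bucket-count group.

    reports: iterable of (buckets, messages, errors) tuples.
    Assumption: unweighted mean of per-report accuracy.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of accuracies, count]
    for buckets, messages, errors in reports:
        if messages <= 0:
            continue  # no classified mail, so accuracy is undefined
        accuracy = 100.0 * (messages - errors) / messages
        entry = totals[bucket_group(buckets)]
        entry[0] += accuracy
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Made-up reports, purely illustrative.
sample = [(2, 100, 2), (2, 100, 4), (12, 200, 10)]
averages = average_accuracy_by_group(sample)
```

A mean weighted by message count would be an equally reasonable choice and would favor heavy users; the unweighted form treats every user the same.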

Do people add more buckets over time?

Another thing I was interested in was whether people tend to add buckets over time. The data seems to bear this out:

The number of buckets in use has trended up very slightly (the black line is a linear trend line) over the days of the survey.
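For readers curious how such a trend line is obtained, here is a minimal least-squares sketch. It assumes one value per survey day (for example, the average number of buckets reported that day, which is a guess at what the chart plots) and days numbered 0, 1, 2, and so on. The function name and sample values are hypothetical.

```python
def linear_trend(values):
    """Ordinary least-squares line through values observed on days 0, 1, 2, ...

    Returns (slope, intercept); a positive slope means an upward trend.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up daily values, purely illustrative.
slope, intercept = linear_trend([1.0, 3.0, 5.0, 7.0])
```

The slope is the change in bucket count per day, so a small positive value corresponds to the "very slightly" upward trend described above.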

Thanks to all who participate in the survey by sending statistics.
