What are false negatives and false positives and why are they so important?

The statistics about an email filter's effectiveness are built upon the concept of false positives and false negatives. Since POPFile is not a spam filter, but a general email classifier, the meaning of those terms differs in POPFile's context from their meaning in the context of spam filters.

A normal spam filter that categorizes mail into a spam and a ham category produces a false positive when it puts a ham message into the spam category and a false negative when it places spam into the ham category.

But, as you know, POPFile can deal with multiple categories (buckets). It is an email classifier that happens to be very good at filtering spam. But this also means that the definition of a false positive and of a false negative cannot be built around the simple question of spam versus ham. Instead, these terms are defined from the point of view of each single bucket. Suppose POPFile categorizes a message as belonging to bucket A, while it really belongs to bucket B. From the point of view of bucket A, we have a false positive because the mail was falsely tagged as belonging to bucket A. From the point of view of bucket B, we have a false negative.
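The per-bucket bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not POPFile's own code: the function name `per_bucket_errors`, the bucket names, and the sample data are all hypothetical. Each misclassified message counts as one false positive for the bucket it was assigned to and one false negative for the bucket it really belongs to.

```python
from collections import Counter


def per_bucket_errors(results):
    """Count false positives and false negatives per bucket.

    `results` is an iterable of (assigned_bucket, true_bucket) pairs.
    This helper is hypothetical and only illustrates the definition.
    """
    false_positives = Counter()
    false_negatives = Counter()
    for assigned, actual in results:
        if assigned != actual:
            # Wrongly tagged as `assigned`: a false positive for that bucket.
            false_positives[assigned] += 1
            # Missing from `actual`: a false negative for that bucket.
            false_negatives[actual] += 1
    return false_positives, false_negatives


# Made-up sample data: (bucket chosen by the classifier, correct bucket).
results = [
    ("work", "work"),
    ("spam", "personal"),
    ("personal", "spam"),
    ("spam", "spam"),
    ("work", "personal"),
]

fp, fn = per_bucket_errors(results)
for bucket in sorted(set(fp) | set(fn)):
    print(bucket, "false positives:", fp[bucket], "false negatives:", fn[bucket])
```

Because every error is counted once from each side, the total number of false positives across all buckets always equals the total number of false negatives.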

Thus, in the POPFile world, a false positive or false negative is always defined relative to one particular bucket. In the world of spam filters, these terms are always used in relation to the spam category.

False positives and false negatives become vitally important only in relation to spam filtering. A false negative (spam that ends up in your inbox) is merely annoying. A false positive, however, may be an important message that ends up in your spam folder or, much worse, gets deleted.

Training POPFile on every error it makes will eventually reduce the number of both false negatives and false positives. But keep in mind that errors will always be possible, so you should not simply delete everything that POPFile tags as spam. Instead, move it to a dedicated folder in your mail client and check that folder regularly to make sure that it doesn't contain something important.

POPFile - Automatic Email Classification
