faq:corpusunbalance [2008/02/08 19:49]
====== Is it possible to '

While the research hasn't been fully conclusive, it does not appear that buckets will become unbalanced as long as you continue [[Glossary:

Three situations that can lead to an //
  - Deleting a mature bucket that contained a significant portion of the overall corpus'
  - Erasing all the words from a mature bucket can have the same effect.
  - Adding a new bucket that is unlikely to receive many messages to a mature corpus can quickly cause standard message headers to be weighted heavily toward the new bucket once a few messages have been reclassified into it.
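
To make the third point more concrete, here is a toy sketch. It is not POPFile's actual code: it assumes a simple model in which each bucket is a bag of word counts and P(word | bucket) is the word's relative frequency in that bucket, and it also assumes the few messages reclassified into the new bucket are short, so header tokens make up a large share of its words. The header tokens, bucket sizes, and function names are all hypothetical.

```python
import math
from collections import Counter

# Toy model only, NOT POPFile's implementation: a bucket is a Counter of
# word counts and P(word | bucket) is the word's relative frequency.

# Hypothetical tokens that appear in the headers of every message.
HEADER_TOKENS = ["received", "from", "subject", "date", "message-id"]


def make_message(body_words):
    """Hypothetical message: the standard header tokens plus body words."""
    return HEADER_TOKENS + body_words


def train(messages):
    """Return the word counts for one bucket trained on `messages`."""
    counts = Counter()
    for msg in messages:
        counts.update(msg)
    return counts


def word_prob(counts, word):
    """Relative frequency of `word` in a bucket (0.0 if unseen)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def log_score(counts, message, unseen_prob=1e-6):
    """Sum of log P(word | bucket) over the message's words.

    Unseen words get a small fixed probability; this is an assumption
    standing in for whatever smoothing a real filter uses.
    """
    return sum(math.log(word_prob(counts, w) or unseen_prob) for w in message)


# Mature bucket: 200 messages, each with 50 body words.
mature = train([make_message([f"work{i}" for i in range(50)]) for _ in range(200)])
# New bucket: only 3 short reclassified messages, each with 5 body words.
new = train([make_message([f"receipt{i}" for i in range(5)]) for _ in range(3)])

print(round(word_prob(mature, "received"), 4))  # 0.0182
print(round(word_prob(new, "received"), 4))     # 0.1

# A message containing nothing but standard headers favors the new bucket.
print(log_score(new, HEADER_TOKENS) > log_score(mature, HEADER_TOKENS))  # True
```

Under these assumptions each header token is roughly five times more probable in the tiny new bucket than in the mature one, so header tokens alone pull messages toward it. Erasing the buckets and retraining starts those counts over from a consistent base.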

As a rule of thumb, if you are performing a major reorganization of your buckets by deleting buckets, erasing words from them, or adding many new buckets, you will get the best results by simply erasing all buckets and restarting the training.

It is also a good idea to reset your training after your first month or two of using POPFile. By then you will probably have a clearer idea of what belongs in each of your buckets and how POPFile works overall, so your accuracy should improve noticeably.
Should you find anything in the documentation that is incomplete, unclear, outdated or just plain wrong, please let us know and leave a note in the Documentation Forum.