Merging Buckets

There is currently no built-in method or utility to merge buckets. Technically it would not be difficult to program (especially using XMLRPC), but no one can agree on the correct way to merge the scores for words that appear in both buckets. No matter how they are merged, some words may end up over-weighted in the merged bucket compared to the remaining buckets, which could cause misclassifications.
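To make the weighting problem concrete, here is a minimal sketch of one possible merge strategy: simply adding the word counts of the two buckets. It is a toy model, not POPFile's actual scoring code. The bucket names, the word counts, the helper functions `merge_by_sum` and `best_bucket`, and the scoring rule (bucket size as prior multiplied by word frequency within the bucket) are all assumptions made for illustration.

```python
from collections import Counter


def merge_by_sum(a, b):
    """Merge two buckets by adding their per-word counts (one possible strategy)."""
    return Counter(a) + Counter(b)


def best_bucket(buckets, word):
    """Return the bucket name with the highest prior * likelihood for a single word.

    Assumed toy scoring: prior = bucket's share of all words,
    likelihood = word count / bucket word count.
    """
    grand_total = sum(sum(counts.values()) for counts in buckets.values())
    scores = {}
    for name, counts in buckets.items():
        total = sum(counts.values())
        prior = total / grand_total
        likelihood = counts[word] / total
        scores[name] = prior * likelihood
    return max(scores, key=scores.get)


# Hypothetical corpus: three buckets of 50 words each.
before = {
    "work": Counter({"free": 4, "report": 46}),
    "personal": Counter({"free": 6, "dinner": 44}),
    "spam": Counter({"free": 8, "offer": 42}),
}

# Merge "work" and "personal"; "spam" is left untouched.
after = {
    "merged": merge_by_sum(before["work"], before["personal"]),
    "spam": before["spam"],
}

winner_before = best_bucket(before, "free")
winner_after = best_bucket(after, "free")
print(winner_before, winner_after)  # prints "spam merged"
```

In this toy corpus the word "free" points to spam before the merge, but to the merged bucket afterwards, even though no message was retrained. Other merge strategies (averaging counts, or rescaling the merged bucket to the size of the others) would shift the balance in different ways, which is the disagreement described above.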

See the bottom half of this thread for the most recent in-depth discussion of this problem.

It's not recommended, but of the buckets you want to consolidate, you could try deleting the one with the fewest words and then training the messages that would have gone into the deleted bucket to the correct bucket. It will take some time to rebuild your training for those messages, though, and you may not be happy with the accuracy for a good while.

The best option when you are making major changes to your buckets is to reset all your buckets and start training over again. After using POPFile for a while, you will likely be more accurate in your training than you were when you first started. Some long-time users do this every once in a while just to clear out the junk that accumulates.

For more about why that is the best option, see FAQ/CorpusUnbalance.

 
faq/mergebuckets.txt · Last modified: 2008/02/08 19:49 by 127.0.0.1


The content of this wiki is protected by the GNU Free Documentation License