This applies to POPFile 0.20.x only. It may be helpful if you are having trouble upgrading from 0.20.x. The problems described no longer affect the current version of POPFile.
POPFile's corpus is stored in BerkeleyDB table files, one table.db file for each bucket in the corpus. Under certain conditions a table.db file can become corrupt, for example when the system crashes or POPFile is forcibly killed before critical table information has been written to disk. The table.db files contain:
We believe corpus corruption can happen in two different ways:
Possible symptoms of a corrupt corpus include:
POPFile Engine v0.20.1 running
Illegal division by zero at C:\Program Files\POPFile/Classifier/Bayes.pm line 37 4, <GEN3> line 642.
The dbverify utility (see "dbverify - checking for a corrupt corpus") can be used to check your corpus and identify any corrupt buckets. If the utility does not report corruption, your corpus is OK.
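Because each bucket has its own table.db, it can help to list the files a check would need to cover. The following is a minimal sketch, not part of POPFile: the function name `find_bucket_tables` is hypothetical, and the `corpus/<bucket>/table.db` layout under the POPFile directory is an assumption taken from the paths shown on this page.

```python
from pathlib import Path


def find_bucket_tables(popfile_root):
    """Return {bucket_name: Path to table.db} for each bucket that has one.

    Assumes the layout <popfile_root>/corpus/<bucket>/table.db.
    """
    corpus = Path(popfile_root) / "corpus"
    tables = {}
    if not corpus.is_dir():
        # No corpus directory at all: nothing to check.
        return tables
    for bucket_dir in sorted(corpus.iterdir()):
        table = bucket_dir / "table.db"
        # Skip stray files and buckets that have no table.db.
        if bucket_dir.is_dir() and table.is_file():
            tables[bucket_dir.name] = table
    return tables


if __name__ == "__main__":
    # Example: run from inside the POPFile installation directory.
    for bucket, table in find_bucket_tables(".").items():
        print(f"{bucket}: {table} ({table.stat().st_size} bytes)")
```

This only lists the files; it does not open or verify them, so it is safe to run while deciding which buckets to check.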
Once a table.db is corrupted, your choices to rectify the situation are:
cd "\program files\popfile"
perl cunload.pl
C:\program files\popfile>perl cunload.pl
Checking corpus/magnet/table.db
Checking corpus/normal/table.db
Checking corpus/spam/table.db
*ERROR** bucket corpus/spam has a corrupt corpus, db_verify returns: DB_VERIFY_BAD: Database verification failed
Bucket corpus/spam is likely corrupt, word count is 10882 versus 12687
Bucket corpus/spam is likely corrupt, unique count is 3148 versus 3912
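If you have many buckets, the output above can be summarised per bucket. The sketch below is an illustration only: the function name `parse_cunload_output` is hypothetical, and the two patterns are derived solely from the sample output shown on this page, so other versions of cunload.pl may print messages it does not recognise. The page does not say which of the two numbers in a count line is the stored value, so the pair is kept in the order reported.

```python
import re

# Patterns based only on the sample output shown above (an assumption).
ERROR_RE = re.compile(
    r"bucket (\S+) has a corrupt corpus, db_verify returns: (.+)"
)
MISMATCH_RE = re.compile(
    r"Bucket (\S+) is likely corrupt, (word|unique) count is (\d+) versus (\d+)"
)


def parse_cunload_output(text):
    """Return {bucket: details} for every bucket reported as corrupt."""
    report = {}
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            bucket, message = match.groups()
            report.setdefault(bucket, {})["db_verify"] = message.strip()
            continue
        match = MISMATCH_RE.search(line)
        if match:
            bucket, kind, first, second = match.groups()
            # Keep the two counts in the order they were printed.
            report.setdefault(bucket, {})[kind + "_count"] = (int(first), int(second))
    return report


if __name__ == "__main__":
    import sys

    # Example: perl cunload.pl | python this_script.py
    for bucket, details in parse_cunload_output(sys.stdin.read()).items():
        print(bucket, details)
```

Buckets that only produce a "Checking ..." line do not appear in the result, so an empty dictionary means nothing was flagged by these two patterns.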
del corpus\spam\table
del corpus\spam\table.db
del corpus\spam\table.db