dbverify - checking for a corrupt corpus

This applies to POPFile 0.20.x only. It may be helpful if you are having trouble upgrading from 0.20.x. The problems described no longer affect the current version of POPFile.

The unsupported dbverify utility checks your corpus for corruption. It performs the following checks:

  1. Uses the db_verify calls of BerkeleyDB to check each bucket's table.db file and reports any issues found.
  2. Checks each table.db file for internal consistency by reading the entire bucket, recalculating the unique word count and the total word count, and comparing them with the internally stored values. It reports any disagreement.
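
The second check can be illustrated with a minimal Python sketch. This is not the utility's own code (dbverify is a Perl script working on BerkeleyDB files). The function name check_bucket, the use of a plain dict in place of table.db, and the order of the two numbers in each message are all assumptions made for illustration.

```python
def check_bucket(words, stored_unique, stored_total):
    """Return a list of problem descriptions for one bucket.

    words is a hypothetical stand-in for the contents of table.db:
    a dict mapping each word to its count.
    """
    problems = []
    total = sum(words.values())   # recalculated sum of all word counts
    unique = len(words)           # recalculated number of distinct words
    # Assumption: recalculated value first, stored value second.
    if total != stored_total:
        problems.append(f"word count is {total} versus {stored_total}")
    if unique != stored_unique:
        problems.append(f"unique count is {unique} versus {stored_unique}")
    return problems


# A consistent bucket yields no problems; an inconsistent one yields both.
print(check_bucket({"viagra": 2, "offer": 3}, 2, 5))
print(check_bucket({"viagra": 2, "offer": 3}, 3, 6))
```

An empty list corresponds to the utility staying silent for that bucket.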

If the utility reports no errors, you can be reasonably assured that your table.db files are not corrupt.

Running dbverify

  • save it to your POPFile directory
  • open a DOS command box and switch to your POPFile directory (normally c:\program files\popfile), e.g.,
    cd "\program files\popfile"
  • run the utility, e.g.,
    perl dbverify.pl 

Note: If you have changed the default location of your corpus via the bayes_corpus parameter, you must pass the new corpus directory to the utility on the command line. Most users can ignore this note; only advanced users would have made this change.

Example Output for a Corrupt Corpus

In this example, the utility reports corruption in both the magnet and spam buckets of the corpus.

Checking corpus/magnet/table.db
    *ERROR** bucket corpus/magnet has a corrupt corpus,
db_verify returns: DB_VERIFY_BAD: Database verification failed
Bucket corpus/magnet is likely corrupt, word count is 6237 versus 5250
Bucket corpus/magnet is likely corrupt, unique count is 1767 versus 4308
Checking corpus/normal/table.db
Checking corpus/spam/table.db
    *ERROR** bucket corpus/spam has a corrupt corpus,
db_verify returns: DB_VERIFY_BAD: Database verification failed

Example Output for a Normal Corpus

In this example, the utility reports no corruption in any of the three buckets of the corpus.

Checking corpus/magnet/table.db
Checking corpus/normal/table.db
Checking corpus/spam/table.db
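
If you have many buckets, the output can be scanned automatically. The following Python sketch lists the buckets flagged in captured output. It assumes the message wording is exactly as in the examples above; the function name corrupt_buckets is hypothetical and not part of dbverify.

```python
import re

# Matches the two kinds of message shown in the example output.
FLAGGED = re.compile(
    r"[Bb]ucket (\S+) (?:has a corrupt corpus|is likely corrupt)"
)


def corrupt_buckets(output):
    """Return a sorted list of bucket paths flagged in dbverify output."""
    flagged = set()
    for line in output.splitlines():
        match = FLAGGED.search(line)
        if match:
            flagged.add(match.group(1))
    return sorted(flagged)


sample = """\
Checking corpus/magnet/table.db
    *ERROR** bucket corpus/magnet has a corrupt corpus,
db_verify returns: DB_VERIFY_BAD: Database verification failed
Bucket corpus/magnet is likely corrupt, word count is 6237 versus 5250
Checking corpus/normal/table.db
"""
print(corrupt_buckets(sample))
```

An empty list means no bucket was flagged, which matches the output for a normal corpus.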
 
dbverify.txt · Last modified: 2008/02/08 19:49 by 127.0.0.1
