Adding the MeCab parser to POPFile 1.0.0 or later

Japanese (Nihongo) text does not use spaces between words, so POPFile uses a 'Nihongo Parser' to split the text into words before analysing it. Release 1.0.0 (and later) offers a choice of three parsers (Kakasi, MeCab and internal). Earlier releases of POPFile only supported the Kakasi parser.

The Windows installer lets the user select the Nihongo parser to be installed. All of the files needed for the Kakasi and internal parsers are included in the installer. The MeCab package size is about 13 MB (i.e. too big to include in the installer) so it is downloaded from the POPFile web-server if the MeCab parser is selected when installing POPFile.

MeCab Files on the POPFile Server (getpopfile.org)

The /var/www/installer/nihongo/mecab directory contains, at the time of writing (10 March 2008), the following files which are used when adding MeCab support to the Windows version of POPFile:

MeCab.ppd
MeCab.tar.gz
mecab-ipadic.zip
mecab.md5
MD5SUMS

The MeCab.ppd, MeCab.tar.gz and mecab-ipadic.zip files are maintained by Naoki and contain the PPD (Perl Package Description) for the MeCab module and all of the files needed to add the MeCab parser to POPFile.

The mecab.md5 file contains the MD5 sum of every file contained in the mecab-ipadic.zip file. The zip file includes several subdirectories so mecab.md5 lists pathnames as well as filenames.
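
To illustrate a checksum list that records pathnames as well as filenames, the following sketch builds one from an unpacked copy of an archive. It assumes GNU find, sort and md5sum are available and that the list uses md5sum's usual "checksum, two spaces, path" layout; the actual layout of mecab.md5 is not described on this page, and the function name make_manifest is hypothetical.

```shell
# Hypothetical helper: print an md5sum-style manifest for every file
# below the directory given as $1, with paths relative to that directory.
make_manifest() {
    (
        cd "$1" || exit 1
        # -type f skips the directories themselves; md5sum prints
        # "<checksum>  ./relative/path" for each regular file.
        # Sorting on the path keeps the output order stable.
        find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2
    )
}
```

A call such as "make_manifest unpacked-dir > manifest.md5" (both names are placeholders) would write the list to a file; the output file should live outside the directory being scanned so that it does not end up listing itself.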

The MD5SUMS file contains the MD5 sums for the other files in the directory. It is used by the installer to check the integrity of the files it downloads. This file can be generated and checked quite easily at the web server's command-line:

$ rm MD5SUMS
$ md5sum * > MD5SUMS
$ md5sum -c MD5SUMS
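
The integrity check that the installer performs on its downloads is only outlined here. As a rough shell equivalent (not the installer's actual code), the sketch below checks one downloaded file against its entry in MD5SUMS. It assumes GNU md5sum and awk, an MD5SUMS file in md5sum's standard format, and filenames without spaces; the function name verify_download is hypothetical.

```shell
# Hypothetical helper: check one downloaded file ($2) against its entry
# in an MD5SUMS file ($1). Succeeds only if the file is listed and matches.
verify_download() {
    sums=$1
    file=$2
    name=$(basename "$file")
    # Find the checksum recorded for this filename. md5sum marks
    # binary-mode entries with a leading "*", which is stripped here.
    expected=$(awk -v n="$name" \
        '{ f = $2; sub(/^\*/, "", f) } f == n { print $1; exit }' "$sums")
    # A file that is not listed at all counts as a failure.
    [ -n "$expected" ] || return 1
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

For example, "verify_download MD5SUMS MeCab.tar.gz" would succeed only if the local copy of MeCab.tar.gz has the checksum recorded for it in MD5SUMS.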

mecab-ipadic.zip is a very large file (over 12 MB in size), so the installer avoids downloading it unnecessarily. If MeCab is selected as the Nihongo parser when installing (or upgrading) POPFile, or if it is selected as the Nihongo parser via POPFile's “Add/Remove Programs” entry, then the installer first checks whether an up-to-date version of the MeCab parser is already installed.

This check is performed by downloading the mecab.md5 file and comparing the MD5 checksum of every file listed in it with that of the corresponding installed file. If any differences are found then the MeCab.tar.gz and mecab-ipadic.zip files will be downloaded and unpacked.
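
The decision described above can be sketched in shell as follows. This is an illustration only: the real check is carried out by the Windows installer, and the sketch assumes GNU md5sum, a manifest in md5sum's standard format with paths relative to the installation directory, and a hypothetical function name mecab_needs_update.

```shell
# Hypothetical helper: succeed (exit status 0) if a download is needed,
# i.e. if any file listed in the manifest ($1) is missing from, or
# differs in, the installation directory ($2).
mecab_needs_update() {
    # Make the manifest path absolute because we change directory below.
    manifest=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    (
        # No installation directory at all: a download is needed.
        cd "$2" 2>/dev/null || exit 0
        # md5sum -c fails if any listed file is missing or has a
        # different checksum; --status suppresses all output.
        if md5sum --status -c "$manifest" 2>/dev/null; then
            exit 1    # everything matches: already up to date
        fi
        exit 0
    )
}

# Example use (paths are placeholders):
#   if mecab_needs_update mecab.md5 /path/to/mecab; then
#       echo "download MeCab.tar.gz and mecab-ipadic.zip"
#   fi
```

Treating a missing file the same as a changed one keeps the sketch simple; either case means the installed copy cannot be trusted to match the server's version.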

 
devel/mecabsupport.txt · Last modified: 2008/03/10 12:46 by 127.0.0.1

The content of this wiki is protected by the GNU Free Documentation License