INTRODUCTION

v0.20.0 is a major update to POPFile with the focus being on performance. In addition, POPFile makes another leap forward in support for non-English languages with many new UI translations (including our first right-to-left language) and full support for parsing of Japanese and Korean.

v0.20.0 is intended to end-of-life v0.18.x and v0.19.x. All future development work will be based on the v0.20.0 line of code and no bugs will be fixed against previous versions. A lot of work has gone into making v0.20.0 the version of POPFile to have.

To improve POPFile's performance the following changes have been made:

1. The corpus (where the word lists are stored) has been changed from a flat text file to a BerkeleyDB database. When you run POPFile v0.20.0 for the first time you will see your existing corpus get automatically upgraded to BerkeleyDB. (See the 'license' file for details of BerkeleyDB's license.) The use of the database both speeds up POPFile (especially the reclassification process, which had slowed down in v0.19.x) and reduces its memory requirements. The time to load the corpus has gone from seconds or even minutes to close to zero.

2. The history data is cached between sessions. If you regularly start and stop POPFile (e.g. you start and stop your PC every day) then you'll notice another load-time speedup: the history data is cached between POPFile sessions to make loading faster.

3. The history is progressively updated. As messages were downloaded from a server, POPFile used to store all the messages for insertion into the history the next time the history was viewed. Now messages are inserted into the history progressively as they are downloaded.

4. When viewing a colored message you'll notice a big speedup: there were previously two scans through the message (one for classification and one for coloring), and this is now reduced to one.

5. When downloading messages we previously saved the message to disk and then reloaded it for classification. Now the mail parser has been modified so that the text of the message is streamed into it as it is read from the mail server, and classification happens in line without the need to reread from disk.

6. On Windows the default configuration for the proxy is to no longer fork() the server. This means that downloading mail starts very quickly, but has the downside that only one email account can be checked at a time and the UI cannot be used during download. This new option is controlled by a configuration parameter (-pop3_force_fork) and through the UI. On non-Windows platforms POPFile will fork() each new connection.

To improve POPFile's stability:

There's been a huge effort to write a complete test suite for POPFile. Currently we have tests that cover 99% of POPFile's code (i.e. almost every line of code is exercised by a test) and the plan going forward is to try to keep it that way. The test suite exercises the UI as if it were a user clicking buttons and submitting forms, includes a complete POP3 server and client so that the proxy functionality can be tested, and contains hundreds of tests for mail parsing. Every module has an equivalent TestFOO.tst that tests it. If you are interested in running the test suite, get the tests/ directory from CVS and run 'gmake test'.

To improve POPFile's accuracy:

A number of bugs have been fixed that sometimes caused POPFile to get the right classification internally and then insert the wrong headers. The mail parser has been updated with the latest spammers' tricks and new pseudowords. We've also done an experiment with 'unsure' classifications and decided to ship with code that will mark a message as 'unclassified' if POPFile isn't 100 times more certain it's in bucket A than bucket B. This should reduce the false positive rate a little at the expense of POPFile saying it's not sure.
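
To make the 'unclassified' rule above concrete, here is a minimal sketch of how such a threshold could work. It is not POPFile's actual code (POPFile is written in Perl); the function name pick_bucket, the use of log-probabilities as scores, and the choice to treat a lead of exactly 100 times as sufficient are all assumptions made for illustration. Only the factor of 100 comes from the notes above.

```python
import math

# The factor of 100 is taken from the release notes; everything else in
# this sketch (names, score representation) is hypothetical.
UNCLASSIFIED_RATIO = 100.0


def pick_bucket(scores, ratio=UNCLASSIFIED_RATIO):
    """Return the winning bucket, or 'unclassified' if its lead is too small.

    `scores` maps a bucket name to the log-probability that the message
    belongs in that bucket (an assumed representation).
    """
    if not scores:
        return "unclassified"

    # Rank buckets from most to least likely.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]

    best_name, best_logp = ranked[0]
    _, second_logp = ranked[1]

    # In log space, "best is at least `ratio` times more likely than the
    # runner-up" becomes a difference of at least log(ratio).
    if best_logp - second_logp >= math.log(ratio):
        return best_name
    return "unclassified"


if __name__ == "__main__":
    clear = {"spam": math.log(0.99), "inbox": math.log(0.001)}
    close = {"spam": math.log(0.6), "inbox": math.log(0.4)}
    print(pick_bucket(clear))
    print(pick_bucket(close))
```

Only the gap between the top two buckets matters in this sketch: a message that is a near tie between two buckets is left 'unclassified' rather than risking a false positive, which is the trade-off described above.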
ESSENTIAL READING IF YOU ARE UPGRADING TO v0.20.0

1. BACK UP YOUR OLD INSTALLATION: POPFile makes this really easy; just copy the entire POPFile directory somewhere. You can then safely install POPFile v0.20.0 on top of your current installation; I just think a backup is a sensible precaution.

2. IF YOU ARE RUNNING WINDOWS: Please read the section below, I AM RUNNING WINDOWS AND NEED TO CHECK MULTIPLE EMAIL ACCOUNTS SIMULTANEOUSLY.

3. ON WINDOWS POPFILE IS NOW AN EXE. Windows users will now be able to see POPFile running in the Task Manager with an executable called popfileXX.exe, where the XX is one of f, if, b, ib depending on configuration. POPFile is started by running popfile.exe, which chooses the appropriate popfileXX.exe. This might cause your firewall to ask about giving popfileXX.exe permissions. In addition, if you had allowed Perl permissions in your firewall, they are NO LONGER needed.

4. The installer will cause POPFile to run in the foreground if the database upgrade is required so that the upgrade process is evident to the user. Once upgraded you can switch to running in the background by going to the Configuration tab and changing "Run POPFile in a console window?" to No.

I AM RUNNING WINDOWS AND NEED TO CHECK MULTIPLE EMAIL ACCOUNTS SIMULTANEOUSLY

Because starting a new process on Windows takes a long time under Perl, there is an optimization for Windows that is enabled by default: when a new connection is made between your email program and POPFile, POPFile handles it in the 'parent' process. This means that the connection happens fast and mail starts downloading very quickly, but it also means that you can only download messages from one server at a time (up to 6 other connections will be queued up and dealt with in the order they arrive) and the UI is unavailable while downloading email.
You can turn this behavior off (and get simultaneous UI/email access and as many email connections as you like) by going to the Configuration panel in the UI and making sure that "Allow concurrent POP3 connections:" is set to Yes, or by specifying -pop3_force_fork 1 on the command line.

I AM USING THE CROSS PLATFORM VERSION

POPFile requires a number of Perl modules that are available from CPAN. New in v0.20.0 is the need for the following:

  BerkeleyDB
  Text::Kakasi (if you want Japanese language support)
  Encode (if you want Japanese language support)

I LIKE TO LIVE DANGEROUSLY

In a future version POPFile will add official support for message classification through the SMTP and NNTP (Usenet news) protocols. There are currently proxy modules for these protocols that work with v0.20.0, but they have not been fully tested. If you are interested in trying them, get them here:

  http://cvs.sourceforge.net/viewcvs.py/*checkout*/popfile/engine/Proxy/SMTP.pm?rev=1.22
  http://cvs.sourceforge.net/viewcvs.py/*checkout*/popfile/engine/Proxy/NNTP.pm?rev=1.21

and place them in POPFile's Proxy/ directory.

DOWNLOADING

You can obtain the latest releases of POPFile by visiting

  http://sourceforge.net/project/showfiles.php?group_id=63137

UPGRADING

Just install POPFile on top of the currently installed version. But did you read the ESSENTIAL READING above first?

INTERNATIONALIZATION

POPFile's support for non-English languages has grown and the UI is now localized into 26 languages:

      Bulgarian
      Chinese (simplified)
      Chinese (traditional)
  NEW Czech
      Danish
      Dutch
      English
      English (UK)
      Finnish
      French
      German
  NEW Greek
  NEW Hebrew
      Hungarian
  NEW Italian
  NEW Japanese
      Korean
      Norwegian
  NEW Polish
  NEW Portuguese (Iberian)
      Portuguese (Brazilian)
      Russian
      Slovak
      Spanish
      Swedish
      Ukrainian

Support has also been added for parsing Japanese and Korean text and splitting it into words correctly.

DONATIONS

Thank you to everyone who has clicked the Donate! button and donated their hard-earned cash to me in support of POPFile.
Thank you also to the people who have contributed patches, feature requests, bug reports and translations.

  http://sourceforge.net/forum/forum.php?forum_id=213876

CONCLUSION

Keep the ideas and bug reports coming. If you are interested in knowing more about what's planned for future POPFile versions (or just learning about POPFile's history) visit the POPFile Roadmap:

  http://sourceforge.net/docman/display_doc.php?docid=17906&group_id=63137

John.