Slower reclassifications

Have been using PopFile since 2003. I cannot thank the developers and volunteers enough for this program. Life would not be the same for me without PopFile. Experience has taught me NOT to be an early adopter... Thus I waited 2 years to update from 0.22.5 to 1.1.0, just to make sure that I do not bring upon myself unnecessary grief. Today, I went to 1.1.0. My first impression: "reclassification" is MUCH slower now, nearly 1 minute (measured 47 seconds for a 1-line email), up from about 5-6 seconds in 0.22.5. That means that if I am unfortunate enough to need to classify/reclassify 10 emails in the morning, I will take a 10-minute hit of lost productivity. Can't afford that. Upgrade was done on same computer/OS (fast dual-core Xeon 2.8GHz, 1GB RAM, XP/SP3). popfile.db = 50,735 Kb. Does anyone know how I can go back to the same reclassification speed I had in 0.22.5 in this new version? Otherwise, how can I revert to 0.22.5 (I made a backup copy of the entire PopFile folder "before" update). Thanks.

  • Message #850

    update from 0.22.5 to 1.1.0 ... popfile.db = 50,735 Kb.

    Is that the size of popfile.db after the upgrade to 1.1.0?

    The 1.1.0 release uses SQLite 3.x instead of 2.x and as a result the new popfile.db should be a lot smaller than the old one. If that is the size after the upgrade, how big was the old database? (When the database was upgraded POPFile makes a backup copy of the old SQLite 2.x format one (in the same folder) and calls it popfile.db-sqlite2.)

    Did the 1.1.0 installer display a window showing the database upgrade process? How long did the upgrade take?

    Have you checked the new database by shutting down POPFile and running the "Check database status" shortcut from the Start Menu (under POPFile -- Support)?

    Does anyone know how I can go back to the same reclassification speed I had in 0.22.5 in this new version?

    I've not tested reclassification speed with such a large database but I'll try to do this and see if I can recreate the problem you're experiencing.

    I made a backup copy of the entire PopFile folder "before" update

    Normally the Windows installer creates separate folders for the POPFile program and your POPFile data. Did you backup both folders? When you upgraded the installer should have defaulted to updating both of these folders; you can check their locations using Start -- All Programs -- POPFile -- Support -- PFI Diagnostic utility (simple).

    Brian

    • Message #851

      Thanks so much for the quick reply.
      "Is that the size of popfile.db after the upgrade to 1.1.0?"
      Yes.
      Size before update was 102,807 Kb.
      "Did the 1.1.0 installer display a window showing the database upgrade process?"
      Yes, all worked correctly.
      "How long did the upgrade take?"
      I left the computer and did some reading. I guess approx 20 minutes.
      "Check database status"
      Have done that. Everything OK, no errors reported.
      "Did you backup both folders?"
      Before starting the upgrade I did what I used to do before all previous upgrades: copied the entire PopFile directory (as it was in 0.22.5) and gave it a different name (now called "PopFile Backup 3"). Just in case. That was done before starting the upgrade.
      The "PFI Diagnostic utility (simple). " shows the following:


      POPFile PFI Diagnostic Utility v0.1.12 (simple mode)


      String data report format (not used for numeric data)

      string not found : ><
      empty string found : < >
      string with 'xyz' value found : < xyz >


      Current UserName = Administrator (Admin)

      Program folder = < C:\PopFile >
      SFN equivalent = < C:\PopFile >

      User Data folder = < C:\PopFile >
      SFN equivalent = < C:\PopFile >

      popfile.pl file = found
      popfile.cfg file = found

      HKCU: *.exe count = 6 (this is OK)


      POPFile Environment Variables


      'POPFILE_ROOT' = < C:\PopFile >
      'POPFILE_USER' = < C:\PopFile >

      popfile.pl file = found
      popfile.cfg file = found

      ROOT: *.exe count = 6 (this is OK)

      'ITAIJIDICTPATH' = >< (this is OK)
      'KANWADICTPATH' = >< (this is OK)

      'MECABRC' = >< (this is OK)


      (report created 20-Jun-2009 @ 14:33:54)


      Thanks,

      Cris

      • Message #852

        Thanks for the information.

        My normal POPFile database is only 2,515 KB in size but I have some larger test databases available. I tried using one of the 1.1.0 release candidates (simply because it was already installed) with a large database and found that reclassifying messages took longer than usual but nowhere near as long as you are reporting.

        My tests used a SQLite 3.x database size of 65,499 KB containing 6 buckets each with over 265,000 distinct words in it (it is a somewhat artificial database). I found that reclassifying a single message from the HISTORY page or from SINGLE MESSAGE VIEW took about 19 seconds on my AMD dual core 2.5 GHz, 4GB RAM, Vista SP2 32-bit system. This time did not seem to vary according to the message size (I reclassified 4 messages, one at a time: 939 bytes, 1.1 KB, 93.2 KB and 99.9 KB).

        We're about to release 1.1.1 which includes an upgrade to a more recent SQLite 3.x engine which is supposed to be a little faster. I'll try repeating my tests with 1.1.1 RC2 and also with the 0.22.5 release to see what differences show up.

        In the meantime, you can always revert to 0.22.5 to get back to a more usable system. Since you have stored everything (i.e. the program and the user data) in a single folder this should be easy to do.

        First check that you have not disabled SFN support:
        Start -- All Programs -- POPFile -- Support -- PFI Diagnostic utility (full)

        If round about line 70 the report says:

        ------------------------------------------------------------
        POPFile Registry Data
        ------------------------------------------------------------
        
        NTFS SFN Disabled = < 0 >
        

        then SFN support is still enabled and all you need to do is run the adduser.exe program in the "POPFile Backup 3" folder and at the "Folder to be used to store the POPFile data for '<username>'" prompt set the path to "C:\POPFile Backup 3".

        When you click "Next" you should get a message asking if it is OK to upgrade the configuration data found at "C:\POPFile Backup 3". Click "Yes" and then "Upgrade". This will reset POPFile's Start Menu shortcuts and registry settings to use the version stored in "C:\POPFile Backup 3".

        To be on the safe side DO NOT let adduser.exe start POPFile - select the "No" radio button and then once the wizard has finished run the "PFI Diagnostic utility (simple)" shortcut and check that the "Program folder", "User Data folder", "POPFILE_ROOT" and "POPFILE_USER" entries all point to "C:\POPFile Backup 3". If so then things should be back the way they were before you tried the 1.1.0 release (apart from the name of the POPFile 0.22.5 folder).

        Of course if you want to be really safe you could make another copy of your backup folder (C:\POPFile-previous, say) and use that or copy the POPFile backup folder to another disk or burn a CD before you try following my instructions, just in case :)

        Brian

        • Message #853

          We're about to release 1.1.1 which includes an upgrade to a more recent SQLite 3.x engine which is supposed to be a little faster. I'll try repeating my tests with 1.1.1 RC2 and also with the 0.22.5 release to see what differences show up.

          I'm seeing a similar slowdown from 0.22.5 to the current 1.1.1 SVN code (what took 9 seconds now takes 30 seconds) so I have raised ticket:108 about this problem.

          Sadly I know nothing about the SQLite side of things so we'll have to wait for Naoki or Manni to investigate further.

          Brian

          • Message #854

            Thanks Brian so much for your reply. I am aware that my database is large, I have many buckets, etc. Even so, performance in 0.22.5 is very reasonable (accuracy is stellar) and I'll probably revert to that.

            NOTE FOR THE DEVELOPERS
            If it will be useful for the benefit of this project, I am willing to somehow upload my database or whatever files will be useful so developers can test performance of a large database, with many buckets, etc. As I said before, I am totally satisfied with accuracy and performance (in 0.22.5), everything is working absolutely perfectly for me (in 0.22.5). Let me know what I can do if you would like to use my installation for testing, etc.

            Cris

          • Message #855

            Hi

            I've tested with my test database (a 29,040 KB SQLite 3.x database containing 469,238 distinct words) on Win XP SP3 and found the same problem.
            It took about 48 seconds to reclassify a 19.5 KB message.

            It seems that there are two reasons why reclassifying is very slow:

            1) The default value of the synchronous mode has been changed in SQLite 3.x

            I've googled and found these messages:

            http://www.mail-archive.com/sqlite-users@sqlite.org/msg08061.html
            http://www.mail-archive.com/sqlite-users@sqlite.org/msg08065.html

            2) Counting words may be very slow in SQLite 3.x

            I don't know the reason, but it took over 10 seconds on my database.
            When reclassifying, POPFile re-counts words in each bucket and caches them.
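For readers who want to see the effect of the first cause on their own machine, here is a minimal sketch. It is in Python rather than POPFile's Perl, it is not taken from patches [3535]-[3537], and the `words` table and the `time_commits` helper are made up for illustration. It only shows how SQLite's `PRAGMA synchronous` setting can be chosen per connection and how to time many small commits under each setting.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical helper, not POPFile code. SQLite reports the synchronous
# mode as an integer: OFF = 0, NORMAL = 1, FULL = 2.
ALLOWED_MODES = {"OFF": 0, "NORMAL": 1, "FULL": 2}


def time_commits(mode, rows=200):
    """Commit `rows` single-row inserts under the given synchronous mode.

    Returns (mode value reported by SQLite, elapsed seconds).
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown synchronous mode: {mode}")
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    try:
        # PRAGMA values cannot be bound as parameters, hence the whitelist.
        conn.execute(f"PRAGMA synchronous = {mode}")
        conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, times INTEGER)")
        conn.commit()
        start = time.perf_counter()
        for i in range(rows):
            conn.execute("INSERT INTO words VALUES (?, 1)", (f"word{i}",))
            conn.commit()  # every commit is its own transaction
        elapsed = time.perf_counter() - start
        reported = conn.execute("PRAGMA synchronous").fetchone()[0]
        return reported, elapsed
    finally:
        conn.close()
        os.remove(path)
```

Calling `time_commits("FULL")`, `time_commits("NORMAL")` and `time_commits("OFF")` in turn gives three timings to compare. How large the gap is depends on the disk and operating system, and the less strict modes trade some crash safety for speed.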

            I've committed three patches to ease the problem: [3535][3536][3537]

            Now it takes about 17 seconds to reclassify the above message...,
            faster than v1.1.1 RC1 but still slow.

            I'll try to find a more efficient solution.

            Naoki

            • Message #856

              It took about 48 seconds to reclassify a message ... Now it takes about 17 seconds

              Thanks for the patches Naoki. I get a similar improvement on my system with a large database.

              POPFile 0.22.5 takes about 7 seconds to reclassify a message
              POPFile 1.1.1 took about 26 seconds before your patches
              POPFile 1.1.1 with patches [3535], [3536] and [3537] now takes 9 seconds.

              So with your patches 1.1.1 is now almost as fast as 0.22.5.

              For these tests I used my full installation of ActivePerl 5.8.9 build 826 to run popfile.pl (instead of using the minimal Perl normally used by POPFile).

              Brian

              • Message #857

                So with your patches 1.1.1 is now almost as fast as 0.22.5.

                I've run some more tests, still using my full Perl installation. This time I restored the database and message history before doing any tests. The same pair of messages were reclassified (one at a time):

                0.22.5 - 99.98 KB message - 9 seconds
                0.22.5 - 939 byte message - 7 seconds

                1.1.1+patches - 99.98 KB message - 9 seconds
                1.1.1+patches - 939 byte message - 8 seconds

                These times came from the level 2 log so it seems there is not really much difference between 0.22.5 and the current SVN code.

                Brian

                • Message #858

                  Brian, thank you for testing new patches.
                  I'm glad to hear that my patches are effective.

                  Now it takes about 17 seconds to reclassify the above message...,
                  faster than v1.1.1 RC1 but still slow.

                  I've committed another patch to gain more performance: [3538]
                  POPFile will update only the necessary part of the database cache.
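The idea of updating only the necessary part of the cache can be sketched roughly as follows. This is an illustration in Python, not the code from [3538]: the `word_counts` table, the cache layout and the function names are all assumptions. The point is simply that a reclassification touches two buckets, so only those two need to be re-counted.

```python
import sqlite3


def count_bucket(conn, bucket):
    """Re-count one bucket: distinct words and total occurrences."""
    unique, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(times), 0) FROM word_counts WHERE bucket = ?",
        (bucket,),
    ).fetchone()
    return {"unique": unique, "total": total}


def refresh_cache(conn, cache, buckets=None):
    """Refresh the cache for the given buckets, or for all of them if None."""
    if buckets is None:
        rows = conn.execute("SELECT DISTINCT bucket FROM word_counts ORDER BY bucket")
        buckets = [row[0] for row in rows]
    for bucket in buckets:
        cache[bucket] = count_bucket(conn, bucket)
    return list(buckets)


def reclassify(conn, cache, words, old_bucket, new_bucket):
    """Move one occurrence of each word between buckets, then re-count
    only the two buckets that changed."""
    for word in words:
        conn.execute(
            "UPDATE word_counts SET times = times - 1 WHERE bucket = ? AND word = ?",
            (old_bucket, word),
        )
        cur = conn.execute(
            "UPDATE word_counts SET times = times + 1 WHERE bucket = ? AND word = ?",
            (new_bucket, word),
        )
        if cur.rowcount == 0:  # word not yet known in the new bucket
            conn.execute("INSERT INTO word_counts VALUES (?, ?, 1)", (new_bucket, word))
    conn.execute("DELETE FROM word_counts WHERE times <= 0")
    conn.commit()
    return refresh_cache(conn, cache, [old_bucket, new_bucket])


def demo_db():
    """Small in-memory database with three hypothetical buckets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_counts (bucket TEXT, word TEXT, times INTEGER, "
        "PRIMARY KEY (bucket, word))"
    )
    conn.executemany(
        "INSERT INTO word_counts VALUES (?, ?, ?)",
        [("spam", "viagra", 3), ("spam", "offer", 2),
         ("inbox", "meeting", 4), ("work", "report", 5)],
    )
    conn.commit()
    return conn
```

With many buckets and hundreds of thousands of words per bucket, skipping the untouched buckets avoids most of the counting work. Whether the real patch keys its cache this way is not something this sketch claims.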

                  Now, it takes 4 seconds to reclassify a 19.5 KB message.
                  And creating/deleting/renaming buckets will also become faster than v1.1.0.

                  I'm testing the patches on the minimal Perl which is included in the Windows version of RC1.

                  Is it time to release v1.1.1 RC2?

                  Naoki

                  • Message #859

                    Now, it takes 4 seconds to reclassify a 19.5 KB message.

                    I have rerun my tests and got these results:

                    Time to reclassify a 99.8 KB message (slot 2757):

                    0.22.5 = 9 seconds
                    1.1.1 = 3 seconds

                    Time to reclassify a 939 byte message (slot 2767):

                    0.22.5 = 7 seconds
                    1.1.1 = 1 second
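For anyone who wants the ratios, a small Python sketch using only the four timings just listed. The timings are whole seconds, so the ratios are rough.

```python
# Timings in seconds, copied from the results above.
timings = {
    "99.8 KB message": {"0.22.5": 9, "1.1.1": 3},
    "939 byte message": {"0.22.5": 7, "1.1.1": 1},
}


def speedup(old_seconds, new_seconds):
    """How many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


for label, t in timings.items():
    print(f"{label}: {speedup(t['0.22.5'], t['1.1.1']):.1f}x faster")
```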

                    Is it time to release v1.1.1 RC2?

                    Yes, I think it is (though perhaps it should now be 1.2.0 if it is going to be about 10 times faster with large databases?)

                    Brian

                    • Message #860

                      Hi Brian,

                      Thanks for the additional testing.

                      Is it time to release v1.1.1 RC2?


                      Yes, I think it is (though perhaps it should now be 1.2.0 if it is going to be about 10 times faster with large databases?)

                      I think most users won't see any difference. This may be just a minor improvement.
                      So I'd like to call the next version v1.1.1.

                      I've updated the release notes for RC2:

                      http://getpopfile.org/docs/releasenotes:1.1.1

                      And the Japanese version is here:

                      http://idisk.mac.com/amatubu/Public/POPFile/v1.1.1/v1.1.1.change.nihongo

                      Naoki

                      • Message #861

                        So I'd like to call the next version v1.1.1.

                        OK, I'll download your revised release notes and send you an email once I've built RC2

                        Brian

                        • Message #863

                          The new release candidate (1.1.1 RC2) is now available in cross-platform, Windows and Mac OS X versions as announced on our homepage

                          Brian

                          • Message #866

                            ABSOLUTELY BRILLIANT!!!

                            Thanks to all for the fix!
                            I am the one who started this thread. Just tried 1.1.1 RC2 (Windows). Reclassification speed is now vastly improved (even faster than 0.22.5). Before patch: 47 seconds. Now: instant (maybe 1 second or less).

                            Amazing! Thanks!

                            Cris

                            • Message #870

                              D@mn, it's fast now!! Thanks guys.

                              • Message #871

                                Hi, Cris and Adrian

                                Thanks for your reports. Glad to hear that!

                                Naoki

                                • Message #873

                                  No problem. Thanks for providing this in the first place!