#108 closed defect (fixed)
Reclassify is much slower in 1.1.0 than it is in 0.22.5
Reported by: | Brian Smith | Owned by: | Brian Smith |
---|---|---|---|
Priority: | high | Milestone: | 1.1.1 |
Component: | Database | Version: | 1.1.0 |
Severity: | normal | Keywords: | reclassify database |
Cc: |
Description (last modified by )
The Slower reclassifications topic in the HELP OPEN DISCUSSION forum complains that reclassify operations which used to take around 6 seconds with 0.22.5 now take around 47 seconds after upgrading to 1.1.0.
The SQLite database involved is large (0.22.5 size was 102,807 Kb, after the upgrade to 1.1.0 the size is 50,735 Kb)
I've confirmed this behaviour using some test data:
0.22.5 with 150 MB database performs reclassifications in around 9 seconds per message (tested using message sizes up to 99.9 KB)
After upgrading to 1.1.0 RC5 the reclassify times were around 19 seconds (similar size messages used)
After upgrading to current SVN (getting ready for 1.1.1 RC2) the time increased to over 26 seconds or even over 30 seconds. Changing from "delete" to "memory" for the SQLite journal mode reduced the times by only a few seconds (e.g. 24 instead of 26 seconds).
I have just noticed that the ADVANCED page in 1.1.1's UI still shows bayes_sqlite_tweaks even though the Option Reference page no longer lists this parameter. If I shutdown POPFile and delete this line from popfile.cfg it gets reinstated the next time I start POPFile. I thought these tweaks were all to do with tweaking SQLite 2.x - are any of them still relevant now that we are using SQLite 3.x?
[Edited to correct the name of the forum]
Change History (6)
comment:1 by , 14 years ago
Description: | modified (diff) |
---|
comment:2 by , 14 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:3 by , 14 years ago
Here are the results from a more controlled test; the database and message history were restored and then the same two messages were reclassified:
0.22.5 log extract:
2009/6/21 15:53:50 2620: bayes: 472: Reclassification from unclassified to regular 2009/6/21 15:53:51 2620: history: 404: Change slot classification of 2757 to regular 2009/6/21 15:54:00 2620: history: 1003: Base query is select XXX from ... ... 2009/6/21 15:54:29 2620: bayes: 472: Reclassification from unclassified to spam 2009/6/21 15:54:29 2620: history: 404: Change slot classification of 2767 to spam 2009/6/21 15:54:36 2620: history: 1003: Base query is select XXX from ...
1.1.1 with [3535], [3536] and [3537] patches:
2009/6/21 16:01:25 7992: bayes: 539: Reclassification from unclassified to regular 2009/6/21 16:01:25 7992: history: 411: Change slot classification of 2757 to regular 2009/6/21 16:01:34 7992: history: 1051: Base query is select XXX from ... ... 2009/6/21 16:02:11 7992: bayes: 539: Reclassification from unclassified to spam 2009/6/21 16:02:11 7992: history: 411: Change slot classification of 2767 to spam 2009/6/21 16:02:19 7992: history: 1051: Base query is select XXX from ...
Time to reclassify a 99.8 KB message (slot 2757):
0.22.5 = 9 seconds
1.1.1 = 9 seconds
Time to reclassify a 939 byte message (slot 2767):
0.22.5 = 7 seconds
1.1.1 = 8 seconds
So there is not much difference between 0.22.5 and the current 1.1.1 SVN code now.
comment:4 by , 14 years ago
Naoki's latest patches make 1.1.1 much faster than 0.22.5 now:
0.22.5 log extract:
2009/6/21 15:53:50 2620: bayes: 472: Reclassification from unclassified to regular 2009/6/21 15:53:51 2620: history: 404: Change slot classification of 2757 to regular 2009/6/21 15:54:00 2620: history: 1003: Base query is select XXX from ... ... 2009/6/21 15:54:29 2620: bayes: 472: Reclassification from unclassified to spam 2009/6/21 15:54:29 2620: history: 404: Change slot classification of 2767 to spam 2009/6/21 15:54:36 2620: history: 1003: Base query is select XXX from ...
1.1.1 with the [3538] & [3539] patches:
2009/6/21 17:46:18 2320: bayes: 540: Reclassification from unclassified to regular 2009/6/21 17:46:18 2320: history: 411: Change slot classification of 2757 to regular 2009/6/21 17:46:21 2320: history: 1051: Base query is select XXX from ... ... 2009/6/21 17:46:33 2320: bayes: 540: Reclassification from unclassified to spam 2009/6/21 17:46:33 2320: history: 411: Change slot classification of 2767 to spam 2009/6/21 17:46:34 2320: history: 1051: Base query is select XXX from ...
Time to reclassify a 99.8 KB message (slot 2757):
0.22.5 = 9 seconds
1.1.1 = 3 seconds
Time to reclassify a 939 byte message (slot 2767):
0.22.5 = 7 seconds
1.1.1 = 1 second
comment:5 by , 14 years ago
comment:6 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I've run some tests using ActivePerl 5.8.9 build 826 to run popfile.pl (instead of using the normal minimal Perl). Here are some extracts form my log with logger_level set to 2:
POPFile 0.22.5 code:
POPFile 1.1.1 code before Naoki's patches:
POPFile 1.1.1 code with patches [3535], [3536] and [3537] from Naoki:
So with these patches 1.1.1 is now almost as fast as 0.22.5.