Ticket #182 (new defect)

Opened 4 years ago

Last modified 2 years ago

POPfile failed start due to failure to copy freed scalar in MQ.pm

Reported by: ishikawa@yk.rim.or.jp Assigned to:
Priority: normal Milestone:
Component: Database Version: 1.1.3
Severity: normal Keywords:
Cc:

Description

System: POPfile v1.1.3 under Debian GNU/Linux (32bit) [I installed my own copy of POPfile.]

Problem: A transient(?) error seems to have failed the launch of a new instance of POPfile.

I show the session log below.

The script to run POPfile, bin/run-popfile.sh contains the following environnment variables, etc.


:
#
PD=/extra/ishikawa/download/POPFILE-DIR
#PD=/new-hd1/extra/ishikawa/download/POPFILE-DIR
#PD=/tmp/usb-old-extra/ishikawa/download/POPFILE-DIR

POPFILE_USER=$PD/
POPFILE_ROOT=$PD/
export POPFILE_USER POPFILE_ROOT
perl $PD/popfile.pl \
     --set html_language=Nihongo \
     --set html_local=0 \
     --set html_port=9090 \
     --set pop3_port=1110

--- end quote ---

--- Start Session Log: with a comment line

debian-vbox-ci:/home/ishikawa# bin/run-popfile.sh


POPFile Engine loading

    Loading... 
         {core: config mq logger history} 
         {classifier: bayes wordmangle} 
         {interface: xmlrpc html} 
         {proxy: pop3 nntp smtp} 
         {services: imap} 

POPFile Engine v1.1.3 starting

    Initializing... 
         {core: config history logger mq} 
         {classifier: bayes wordmangle} 
         {interface: html xmlrpc} 
         {proxy: nntp pop3 smtp} 
         {services: imap} 

    Starting...     
         {core:

A copy of POPFile appears to be running.
 Attempting to signal the previous copy.
 Waiting 10 seconds for a reply.

The other POPFile (4393) failed to signal back, starting new copy (32396)
 config history logger mq} 
         {classifier: bayes wordmangle} 
         {interface: html} 
         {proxy: pop3} 
         {services:} 

POPFile Engine v1.1.3 running
panic: attempt to copy freed scalar 9c71e0c to 982c1d0 at /extra/ishikawa/download/POPFILE-DIR//POPFile/MQ.pm line 330.

     **** The above is the error line ??? *****

POPFile Engine v1.1.3 stopping

    Stopping... 
         {classifier: bayes wordmangle} 
         {core: config history logger mq} 
         {interface: html} 
         {proxy: pop3} 
         {services:} 

POPFile Engine v1.1.3 terminated
debian-vbox-ci:/home/ishikawa# bin/run-popfile.sh

POPFile Engine loading

Note the error on line 330 of MQ.pm /extra/ishikawa/download/POPFILE-DIR//POPFile/MQ.pm line 330.

Failure to run popfile when the previous copy is deemed non-existent doesn't happen very often. (I think I see this particular bug for the first time.)

It is in read_pipe. My guess is that there was some transitory error or something. But I am not sure.

Just thought to report this bug for someone in the know to take pre-cautionary measure (like the pipe cache? is being removed before use, the child process disappearing before having a chance to wait for it, etc.???).

265 #----------------------------------------------------------------------------
266 #
267 # read_pipe_
268 #
269 # reads a single message from a pipe in a cross-platform way.
270 # returns undef if the pipe has no message
271 #
272 # $handle   The handle of the pipe to read
273 #
274 #----------------------------------------------------------------------------
275 sub read_pipe_
276 {
277     my ( $self, $handle ) = @_;
278 
279     if ( $^O eq "MSWin32" ) {
280 
281         # bypasses bug in -s $pipe under ActivePerl
282 
283         my $message;         # PROFILE PLATFORM START MSWin32
284 
285         if ( &{ $self->{pipeready_} }($handle) ) {
286 
287             # add data to the pipe cache whenever the pipe is ready
288 
289             sysread($handle, my $string, -s $handle);
290 
291             # push messages onto the end of our cache
292 
293             $self->{pipe_cache__} .= $string;
294         }
295 
296         # pop the oldest message;
297 
298         $message = $1 if (defined($self->{pipe_cache__}) &&           # PROFILE BLOCK START
299                           ( $self->{pipe_cache__} =~ s/(.*?\n)// ) ); # PROFILE BLOCK STOP
300 
301         return $message;        # PROFILE PLATFORM STOP
302     } else {
303 
304         # do things normally
305 
306         if ( &{ $self->{pipeready_} }($handle) ) {
307             return <$handle>;
308         }
309     }
310 
311     return undef;
312 }
313 

TIA

Change History

02/28/13 13:36:12 changed by amatubu

Hi,

Thanks for your report.

The line 330 of MQ.pm is in 'flush_child_data_' function:

http://getpopfile.org/browser/tags/v1_1_3/engine/POPFile/MQ.pm#L330

    my ( $self, $handle ) = @_;

This code just copies values of the parameter and I have no idea why the error occurs.

panic: attempt to copy freed scalar 9c71e0c to 982c1d0

This seems to be an internal error of Perl. I think this might be a bug of Perl itself or some Perl module which POPFile depends on.

What version of Perl are you using?

Naoki

03/01/13 10:18:02 changed by ishikawa@yk.rim.or.jp

Hi,

Thank you for your time to look at the report, and am sorry I missed the correct uploading of the problematic lines. (I have no idea why I ended up uploading only the lines preceding the 330th-line and not include it. Oh well.)

Anyway, I found that I am using perl 5.14.2 under Debian GNU/Linux:

perl --version

This is perl 5, version 14, subversion 2 (v5.14.2) built for i486-linux-gnu-thread-multi-64int
(with 75 registered patches, see perl -V for more detail)

Copyright 1987-2011, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

ishikawa@debian-vbox-ci:~/bin$ 

My observation about "(like the pipe cache? is being removed before use, the child process disappearing before having a chance to wait for it, etc.???). " because the parameters passed to flush_child_data seems to be some kind of data related to the child process or the data in pipe from cursory reading of the code.

grep -n  -e flush_child /extra/ishikawa/download/POPFILE-DIR/POPFile/*.pm /dev/null
/extra/ishikawa/download/POPFILE-DIR/POPFile/MQ.pm:118:        $self->flush_child_data_( $self->{children__}{$kid} );
/extra/ishikawa/download/POPFILE-DIR/POPFile/MQ.pm:178:        $self->flush_child_data_( $pipe )
/extra/ishikawa/download/POPFILE-DIR/POPFile/MQ.pm:255:                $self->flush_child_data_( $self->{children__}{$kid} );
/extra/ishikawa/download/POPFILE-DIR/POPFile/MQ.pm:316:# flush_child_data_
/extra/ishikawa/download/POPFILE-DIR/POPFile/MQ.pm:328:sub flush_child_data_

So maybe by the time flush_child_data tries to access the parameter, the underlying data silently disappeared (?). I am not sure how perl handles such system related data that changes its status dynamically. Maybe this is like referencing an object through a pointer AFTER the referenced object is freed immediately before the usage?

Thank you again for your time.

03/22/13 21:11:48 changed by craigwest

I am having exactly the same problem.

POPFile running on my Ubuntu 12.04 LTS server

Error in POPFile log:

2013/3/22 11:03:52 1059: Perl fatal error : panic: attempt to copy freed scalar 205ffe0 to 185b4b8 at /usr/share/popfile/POPFile/MQ.pm line 330.[0a]

perl version information:

This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 55 registered patches, see perl -V for more detail)

03/23/13 09:15:17 changed by ishikawa@yk.rim.or.jp

Hi,

So I think this problem is caused by some race condition between the main POPfile process, whatever children process / pipes that are created by it, and the interaction. The problem caused by this is very rare, but if you have 100,000 of users or more, it will eventually shows up on someone's computer, I suppose.

Sorry, I am not able to figure out the issue for now, but now it seems it can be repeatably reproduced under carefully managed test environment by knowledgeable someone.

TIA

10/26/13 21:22:00 changed by amatubu

Hi,

Sorry for late reply.

To find the cause, I've written a patch to record more details (caller functions) when POPFile crashes with a fatal error.

This change will be included the next version (perhaps v1.1.4). I hope this patch help us to find out the cause.

Naoki

06/30/15 21:35:16 changed by andy

I'm getting the same error on Ubuntu 14.04. It's rare but regular (once a month). I do not see v1.1.4 in SVN, but I could try to patch my installation if you can provide a patch (the one with more diagnostics).

Thanks, Andy

07/05/15 13:01:06 changed by brian

I do not see v1.1.4 in SVN.

An SVN tag is only created when a new version is released. The "Browse Source" link at the top of the page can be used to display the available tags: http://getpopfile.org/browser/tags

The current state of POPFile 1.1.4 development can be found in the "b0_22_2" branch (the "trunk" is being used for POPFile 2.x development): http://getpopfile.org/browser/branches/b0_22_2

I could try to patch my installation if you can provide a patch (the one with more diagnostics).

Naoki referred to a patch to record more details (caller functions) when POPFile crashes with a fatal error on 26 October 2013. I think the relevant commit was changeset:3850 which only modified one file (Loader.pm).

Since then the only change to Loader.pm has been to change the copyright date.

The on-line manual has some information about obtaining the source code: http://getpopfile.org/docs/faq:getsourcecode

If you don't want to use an SVN client you can use the download link in the source code browser (as described in the on-line manual) to get the most recent version of Loader.pm: http://getpopfile.org/browser/branches/b0_22_2/engine/POPFile/Loader.pm

Shut down POPFile, rename the existing file (e.g. Loader.pm.old), copy the new Loader.pm file to the relevant folder and restart POPFile.

07/13/15 16:04:00 changed by andy

I patched Loader.pm, here is the log from the first crash that I saw today:

2015/7/13 00:39:43 32035: mq: 259: Done with 16172 (0 to go)
2015/7/13 00:39:49 16176: bayes: 811: Attempting to connect to dbi:SQLite:dbname=/var/lib/popfile/popfile.db (1)
2015/7/13 00:39:49 16176: bayes: 912: Using SQLite library version 3.8.2
2015/7/13 00:39:49 16176: bayes: 512: Performing tweak 1 to 1
2015/7/13 00:39:49 16176: pop3: 311: +OK POP3 POPFile (v1.1.3) server ready[0d][0a]
2015/7/13 00:39:49 16176: pop3: 311: -ERR No secure server specified[0d][0a]
2015/7/13 00:39:49 16176: pop3: 682: POP3 proxy done
2015/7/13 00:39:49 32035: mq: 259: Done with 16176 (0 to go)
2015/7/13 00:39:49 32035: Perl fatal error : panic: attempt to copy freed scalar 2f4c7b0 to 24f6f78 at /usr/share/popfile/POPFile/MQ.pm line 330.[0a]
2015/7/13 00:39:49 32035:   called from POPFile::Loader /usr/share/popfile/POPFile/Loader.pm 135 POPFile::Loader::CORE_die 1    0
2015/7/13 00:39:49 32035:   called from POPFile::MQ /usr/share/popfile/POPFile/MQ.pm 330 POPFile::Loader::__ANON__ 1    1766 UUUUUUUUUUUUUUU[05]
2015/7/13 00:39:49 32035:   called from POPFile::MQ /usr/share/popfile/POPFile/MQ.pm 118 POPFile::MQ::flush_child_data_ 1    1766 UUUUUUUUUUUUUUU[05]
2015/7/13 00:39:49 32035:   called from POPFile::Loader /usr/share/popfile/POPFile/Loader.pm 796 POPFile::MQ::service 1    256
2015/7/13 00:39:49 32035:   called from main /usr/share/popfile/popfile.pl 118 POPFile::Loader::CORE_service 1    2022
2015/7/13 00:39:49 32035: POPFile stopped

07/16/15 01:59:37 changed by andy

One more crash today , essentially the same:

2015/7/15 15:26:13 29134: mq: 259: Done with 19527 (0 to go)
2015/7/15 15:28:12 19645: bayes: 811: Attempting to connect to dbi:SQLite:dbname=/var/lib/popfile/popfile.db (1)
2015/7/15 15:28:12 19645: bayes: 912: Using SQLite library version 3.8.2
2015/7/15 15:28:12 19645: bayes: 512: Performing tweak 1 to 1
2015/7/15 15:28:12 19645: pop3: 311: +OK POP3 POPFile (v1.1.3) server ready[0d][0a]
2015/7/15 15:28:12 19645: pop3: 311: -ERR No secure server specified[0d][0a]
2015/7/15 15:28:12 19645: pop3: 682: POP3 proxy done
2015/7/15 15:28:12 29134: mq: 259: Done with 19645 (0 to go)
2015/7/15 15:28:12 29134: Perl fatal error : panic: attempt to copy freed scalar 3483bf0 to 2896058 at /usr/share/popfile/POPFile/MQ.pm line 330.[0a]
2015/7/15 15:28:12 29134:   called from POPFile::Loader /usr/share/popfile/POPFile/Loader.pm 135 POPFile::Loader::CORE_die 1    0
2015/7/15 15:28:12 29134:   called from POPFile::MQ /usr/share/popfile/POPFile/MQ.pm 330 POPFile::Loader::__ANON__ 1    1766 UUUUUUUUUUUUUUU[05]
2015/7/15 15:28:12 29134:   called from POPFile::MQ /usr/share/popfile/POPFile/MQ.pm 118 POPFile::MQ::flush_child_data_ 1    1766 UUUUUUUUUUUUUUU[05]
2015/7/15 15:28:12 29134:   called from POPFile::Loader /usr/share/popfile/POPFile/Loader.pm 796 POPFile::MQ::service 1    256
2015/7/15 15:28:12 29134:   called from main /usr/share/popfile/popfile.pl 118 POPFile::Loader::CORE_service 1    2022
2015/7/15 15:28:12 29134: POPFile stopped