Bug in HTML::Parser (API version 2)?

Ian Miller (imiller@csd.abdn.ac.uk)
Tue, 27 Jun 2000 13:19:11 +0100


--ReaqsoxgOBHFXBhH
Content-Type: text/plain; charset=us-ascii


Hi, 

I've noticed some strange behaviour whilst using HTML::Parser 3.08
with API version 2. I've whittled it down to the two attached example
programs: bug.pl segfaults under API v2 (only when unbroken_text is
set), and nobug.pl, which I believe is the equivalent API v3 program,
works as expected. I also attach the HTML file that causes the segfault
(problem.html) and the output I get from the segfaulting program
(output.txt).

Is it a bug in HTML::Parser? I don't see how it can be... but I can't
think of anything else... or am I doing something very, very stupid?

I'm using perl, version 5.005_03 built for sun4-solaris.

cheers,
ian

-- 
+-------------------------------------+---------------------------------------+
| ian miller; research student;       | imiller@csd.abdn.ac.uk                |
| university of aberdeen, scotland.   | http://www.csd.abdn.ac.uk/~imiller/   |
+-------------------------------------+---------------------------------------+

--ReaqsoxgOBHFXBhH
Content-Type: application/x-perl
Content-Disposition: attachment; filename="bug.pl"

#!/usr/local/bin/perl -w
#
# bug.pl -- demonstrates segfault with API version 2 and unbroken_text.
#

##
## Main
##

use strict;
use HTML::Parser ();

my $p = Buggy->new( api_version   => 2,
		    unbroken_text => 1, ## must be 1 
		    );                  ## to reproduce bug

$p->parse_file("problem.html");

##
## Package Buggy
##

package Buggy;

use strict;
use base qw(HTML::Parser);

sub text {
    my ($self, $text) = @_;
    
    print STDERR "$text\n----- END CHUNK -----\n";
    @_ = split(/\W/, $text); ## this causes the segfault
}




--ReaqsoxgOBHFXBhH
Content-Type: application/x-perl
Content-Disposition: attachment; filename="nobug.pl"

#!/usr/local/bin/perl -w
#
# nobug.pl -- demonstrates absence of bug with API version 3
#

##
## Main
##

use strict;
use HTML::Parser ();

my $p = HTML::Parser->new( api_version   => 3,
			   unbroken_text => 1, 
			   text_h        => [ 'bug', "self, text" ]);

$p->parse_file("problem.html");

##
## HTML::Parser
##

package HTML::Parser;

sub bug {
    my ($self, $text) = @_;
    
    print STDERR "$text\n----- END CHUNK -----\n";
    @_ = split(/\W/, $text); ## no problem!
}


--ReaqsoxgOBHFXBhH
Content-Type: text/html
Content-Disposition: attachment; filename="problem.html"

World-Wide Web Access Statistics for cobweb.cord.edu

contents deleted by archive maintainer

--ReaqsoxgOBHFXBhH
Content-Type: text/plain
Content-Disposition: attachment; filename="output.txt"

contents deleted by archive maintainer

--ReaqsoxgOBHFXBhH--