Bug in HTML::Parser (API version 2) ?
Ian Miller (imiller@csd.abdn.ac.uk)
Tue, 27 Jun 2000 13:19:11 +0100
--ReaqsoxgOBHFXBhH
Content-Type: text/plain; charset=us-ascii
Hi,
I've noticed some strange behaviour whilst using HTML::Parser 3.08
with API version 2. I've whittled it down to the two attached example
programs -- one demonstrates the segfault that I get with API v2 and
the other demonstrates the equivalent (?) API v3 program that works as
expected. I also attach the HTML file that causes the segfault, and
the output that I get from the segfaulting program.
Is it a bug in HTML::Parser? I don't see how it can be... but I can't
think of anything else... or am I doing something very, very stupid?
I'm using perl, version 5.005_03 built for sun4-solaris.
cheers,
ian
--
+-------------------------------------+---------------------------------------+
| ian miller; research student; | imiller@csd.abdn.ac.uk |
| university of aberdeen, scotland. | http://www.csd.abdn.ac.uk/~imiller/ |
+-------------------------------------+---------------------------------------+
--ReaqsoxgOBHFXBhH
Content-Type: application/x-perl
Content-Disposition: attachment; filename="bug.pl"
#!/usr/local/bin/perl -w
#
# bug.pl -- demonstrates segfault with API version 2 and unbroken_text.
#
##
## Main
##
use strict;
use HTML::Parser ();
my $p = Buggy->new( api_version   => 2,
                    unbroken_text => 1,   ## must be 1
                  );                      ## to reproduce bug
$p->parse_file("problem.html");
##
## Package Buggy
##
package Buggy;
use strict;
use base qw(HTML::Parser);
sub text {
    my ($self, $text) = @_;
    print STDERR "$text\n----- END CHUNK -----\n";
    @_ = split(/\W/, $text);   ## this causes the segfault
}
--ReaqsoxgOBHFXBhH
Content-Type: application/x-perl
Content-Disposition: attachment; filename="nobug.pl"
#!/usr/local/bin/perl -w
#
# nobug.pl -- demonstrates absence of bug with API version 3
#
##
## Main
##
use strict;
use HTML::Parser ();
my $p = HTML::Parser->new( api_version   => 3,
                           unbroken_text => 1,
                           text_h        => [ 'bug', "self, text" ]);
$p->parse_file("problem.html");
##
## HTML::Parser
##
package HTML::Parser;
sub bug {
    my ($self, $text) = @_;
    print STDERR "$text\n----- END CHUNK -----\n";
    @_ = split(/\W/, $text);   ## no problem!
}
--ReaqsoxgOBHFXBhH
Content-Type: text/html
Content-Disposition: attachment; filename="problem.html"
World-Wide Web Access Statistics for cobweb.cord.edu
contents deleted by archive maintainer
--ReaqsoxgOBHFXBhH
Content-Type: text/plain
Content-Disposition: attachment; filename="output.txt"
contents deleted by archive maintainer
--ReaqsoxgOBHFXBhH--