Bugreport on comments
hellmuth@ira.uka.de
Wed, 3 Dec 1997 21:43:24 +0100 (MET)
Hi,
I stumbled on the following bug in Parser.pm (at least up to libwww-5.16,
which is current AFAIK). Comments containing -- inside the comment text are
not parsed correctly, because any -- (even one not followed by the closing
'>') is treated as end-of-comment.
The following code demonstrates this. It should print ' ------- '; instead
it prints ' '.
I've also appended a possible bug fix as a context diff. It fixes the bug
by requiring the closing > after the -- and leaving that > in the buffer
rather than deleting it.
Holger.
-------------------------------------------------
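#!/usr/local/bin/perl
# Demo script; Fing.pm (listed below) must be findable via @INC.
require Fing;
my $chunk = 'text bla <!-- ------- --> more text bla';
my $p = Fing->new();
$p->parse($chunk);   # calls Fing::comment for each comment found
$p->eof;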
--------------------------------------------------
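package Fing;
require HTML::Parser;
@ISA = qw( HTML::Parser );
sub new {
    my $type = shift;
    my $self = HTML::Parser->new();
    $self->netscape_buggy_comment(1);
    return bless $self, $type;   # rebless the parser into the subclass
}
# Callback invoked by the parser for each comment: print its text.
sub comment { my $self = shift; my $c = shift; print "$c\n"; }
1;   # so that 'require Fing' sees a true value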
------------------------------------------------------
*** Parser.pm~ Fri Feb 21 10:32:14 1997
--- Parser.pm Wed Dec 3 21:18:34 1997
***************
*** 202,208 ****
$eaten .= $1;
$text .= $2;
# Look for end of comment
! if ($$buf =~ s|^((.*?)--)||s) {
$eaten .= $1;
push(@com, $2);
} else {
--- 202,208 ----
$eaten .= $1;
$text .= $2;
# Look for end of comment
! if ($$buf =~ s|^((.*?)--)>|>|s) {
$eaten .= $1;
push(@com, $2);
} else {
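
The effect of the changed pattern in the diff above can be checked in isolation, without HTML::Parser installed. The following is only a minimal sketch: it assumes the buffer holds the text that follows the opening '<!--' of the test comment, and the variable names ($input, $old, $new and so on) are made up for this illustration and do not appear in Parser.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed buffer contents right after the opening '<!--' has been consumed.
my $input = ' ------- --> more text bla';

# Original pattern: stops at the first '--', whether or not '>' follows.
my $old = $input;
my $old_comment = '';
if ($old =~ s|^((.*?)--)||s) {
    $old_comment = $2;
}

# Patched pattern: only a '--' directly followed by '>' ends the comment.
# The '>' is written back so that later code can still finish the tag.
my $new = $input;
my $new_comment = '';
if ($new =~ s|^((.*?)--)>|>|s) {
    $new_comment = $2;
}

print "old: '$old_comment'\n";   # old: ' '
print "new: '$new_comment'\n";   # new: ' ------- '
print "rest: '$new'\n";          # rest: '> more text bla'
```

With the patched pattern the whole comment text is captured, and the buffer still starts with '>' afterwards.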