Bug report on comments

hellmuth@ira.uka.de
Wed, 3 Dec 1997 21:43:24 +0100 (MET)


Hi,
I stumbled on the following bug in Parser.pm (present at least up to
libwww-5.16, which is current AFAIK). Comments containing -- in the
comment text are not parsed correctly, because a -- without the
following '>' is already treated as end-of-comment.
The test script and the Fing module below demonstrate this. The script
should print ' ------- ', but instead it prints ' '.
I've also appended a possible bug fix as a context diff. It fixes the bug
by accepting -- as end-of-comment only when it is immediately followed by
the closing '>'; that '>' is put back into the buffer, so the code that
finishes the declaration still finds it.

						Holger.

-------------------------------------------------
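#!/usr/local/bin/perl
use strict;
use warnings;
use lib '.';    # assumes Fing.pm (below) lives in the current directory

require Fing;

# The comment text ' ------- ' contains "--" long before the closing "-->".
my $chunk = 'text bla  <!-- ------- --> more text bla';
my $p = Fing->new();
$p->parse($chunk);
$p->eof;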

--------------------------------------------------

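# Fing.pm
package Fing;

require HTML::Parser;
@ISA = qw(HTML::Parser);

sub new {
    my $class = shift;
    my $self = HTML::Parser->new();
    $self->netscape_buggy_comment(1);   # ask for Netscape-style comment parsing
    return bless $self, $class;         # rebless the parser into Fing
}

# Callback invoked by HTML::Parser for each comment: print its text.
sub comment {
    my ($self, $c) = @_;
    print "$c\n";
}

1;    # a module loaded with require must return a true value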
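
To see the mechanism without HTML::Parser, here is a minimal standalone
sketch (core Perl only) that applies the unpatched end-of-comment regex
from the diff below to the text that is left once '<!--' has been
consumed. The helper name old_comment_end is hypothetical and only
mirrors that single line of Parser.pm, not the whole parse loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the unpatched end-of-comment regex from
# Parser.pm to a buffer that starts just after the opening "<!--".
# Returns the captured comment text, or nothing if no "--" was found.
sub old_comment_end {
    my ($buf) = @_;
    # Non-greedy match up to the FIRST "--", whether or not ">" follows.
    if ($buf =~ s|^((.*?)--)||s) {
        return $2;
    }
    return;
}

my $rest    = ' ------- --> more text bla';
my $comment = old_comment_end($rest);
print "[$comment]\n";    # prints "[ ]": the comment stops at the first "--"
```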

------------------------------------------------------

*** Parser.pm~  Fri Feb 21 10:32:14 1997
--- Parser.pm   Wed Dec  3 21:18:34 1997
***************
*** 202,208 ****
                $eaten .= $1;
                $text .= $2;
                # Look for end of comment
!               if ($$buf =~ s|^((.*?)--)||s) {
                    $eaten .= $1;
                    push(@com, $2);
                } else {
--- 202,208 ----
                $eaten .= $1;
                $text .= $2;
                # Look for end of comment
!               if ($$buf =~ s|^((.*?)--)>|>|s) {
                    $eaten .= $1;
                    push(@com, $2);
                } else {
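
A similar standalone sketch of the patched regex, applied to the same
leftover text. It also shows an alternative that is NOT part of the
patch: a zero-width lookahead (?=>) tests for the '>' without consuming
it, so nothing has to be put back. Both helper names are hypothetical,
and the sketch assumes core Perl only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched line: only a "--" that is
# immediately followed by ">" ends the comment; the ">" is consumed and
# then written back into the buffer by the replacement part.
sub patched_comment_end {
    my ($buf) = @_;
    if ($buf =~ s|^((.*?)--)>|>|s) {
        return ($2, $buf);    # comment text, remaining buffer
    }
    return;                   # no "-->" yet: the parser would wait for more data
}

# Alternative (not in the patch): the lookahead (?=>) requires ">" to
# follow but leaves it in the buffer, so no put-back is needed.
sub lookahead_comment_end {
    my ($buf) = @_;
    if ($buf =~ s|^((.*?)--)(?=>)||s) {
        return ($2, $buf);
    }
    return;
}

my $rest = ' ------- --> more text bla';
for my $f (\&patched_comment_end, \&lookahead_comment_end) {
    my ($comment, $left) = $f->($rest);
    print "[$comment] [$left]\n";    # both print: [ ------- ] [> more text bla]
}
```

Both forms leave the same buffer behind; the lookahead simply avoids
deleting the '>' and re-inserting it.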