Re: HTML::Parser 3.13 Bug ?
Gisle Aas (gisle@activestate.com)
08 Jan 2001 11:52:50 -0800
Christian Recktenwald <chris@citecs.de> writes:
> I just noticed that "<p><b> Text </b></p>" does not seem
> to be parsed correctly, while "<p> <b> Text </b> </p>" is.
>
> More precisely, the start event for <p> and the end event for </b>
> are not generated.
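It is. It's just that you only print from inside a "text" handler, and
every start or end event installs a new text handler that replaces the
previous one. In "<p><b> Text </b></p>" the start event for <b> replaces
the handler set up for <p> before any text arrives, so <p> is never
printed; the same happens to </b> because </p> follows it directly.
That makes your code fail unless there is text after each tag.
Whitespace is text.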
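A minimal sketch of one way around this, assuming the goal is just an indented dump of tags and text: print from the start and end handlers themselves and give text its own fixed handler, so adjacent tags can no longer overwrite each other's pending output. The handler names, the `@out` buffer and the two-space indent are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
# Sketch: pretty printer that emits output directly from the start/end
# handlers instead of deferring it to the next text event.
use strict;
use warnings;
use HTML::Parser;

my $indent    = 0;
my $indentstr = "  ";
my @out;    # collected output lines, printed at the end

sub start_handler {
    my ($tag) = @_;
    push @out, ($indentstr x $indent) . "<$tag>";
    $indent++;
}

sub end_handler {
    my ($tag) = @_;
    $indent-- if $indent > 0;
    push @out, ($indentstr x $indent) . "</$tag>";
}

sub text_handler {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
    $text =~ s/\s+/ /g;         # collapse internal whitespace and newlines
    push @out, ($indentstr x $indent) . $text if length $text;
}

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start_handler, 'tagname' ],
    end_h       => [ \&end_handler,   'tagname' ],
    text_h      => [ \&text_handler,  'dtext' ],
);
$p->xml_mode(1);

$p->parse('<p><b> text1 </b></p>');
$p->eof;

print "$_\n" for @out;
```

Because the text handler is registered once and never replaced, "<p><b> text1 </b></p>" and "<p> <b> text1 </b> </p>" produce the same indented output.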
Regards,
Gisle
> -- check.pl ------
> #!/usr/bin/perl
> # some html pretty printer
>
> use HTML::Parser;
>
> $indent = -1;
> $indentstr = " ";
>
> $p = HTML::Parser->new(api_version => 3);
> $p->xml_mode(1);
>
> $p->handler(start => \&start_handler, 'tagname,self' );
> $p->handler(end => \&end_handler, 'tagname,self' );
>
> sub start_handler {
> my $tag = shift;
> my $self = shift;
Try printing "$tag" here to verify that the start event really
fires.
> $self->handler(text => sub { $indent ++;
> my $text = shift;
> my $ind = $indentstr x $indent;
> chomp($text);
> $text =~ s/^\s*//gs;
> $text =~ s/\r?\n/ /gs;
> print $ind, "<$tag>","\n",
> ($text ne "" )?($ind." $text"."\n"):(""); },
> "dtext");
> }
>
> sub end_handler {
> my $tag = shift;
> my $self = shift;
> $self->handler(text => sub { print $indentstr x $indent , "</$tag>\n";
> $indent --;
> });
> }
>
> $p->parse_file("test.html");
>
> ------------------
> -- test.html -----
> <HTML>
> <HEAD>
> <TITLE>
> text
> </TITLE>
> </HEAD>
> <BODY BGCOLOR=#ffffff>
> <H1> ue1 </H1>
> text
> <H2> ue2 </H2>
> text
> <H2> ue3 </H2>
> text
> <p><b> text1 </b></p>
> <p> <b> text2 </b> </p>
>
> </BODY>
> </HTML>
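Following the suggestion to print "$tag" in the handler, a small trace sketch (not from the original script) logs every event HTML::Parser reports for the first problem line. The start event for <p> and the end event for </b> both appear in the log.

```perl
#!/usr/bin/perl
# Sketch: log every start/end/text event for the line that seemed broken.
use strict;
use warnings;
use HTML::Parser;

my @events;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @events, "start $_[0]" },  'tagname' ],
    end_h   => [ sub { push @events, "end $_[0]" },    'tagname' ],
    text_h  => [ sub { push @events, "text '$_[0]'" }, 'dtext' ],
);
$p->xml_mode(1);
$p->parse('<p><b> text1 </b></p>');
$p->eof;

print "$_\n" for @events;
```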