Bug in HTML::Parser (<p> inside table cell)

Marcin Kasperski (Marcin.Kasperski@softax.com.pl)
Thu, 29 Apr 1999 12:22:33 +0200


It seems that HTML::Parser does not handle a <p> placed inside a table
cell: the resulting parse tree is wrong.

Example HTML file:
--------------------------------------------------------
TEST
La la la

Le le le

--------------------------------------------------------

Dump of the tree created from this file:
--------------------------------------------------------
" " " " "TEST" " " " " <BODY> " " <P> <TABLE> " " <TR> " " <TD> " La la la " <P> " Le le le " <TABLE> <TD> " " " " " " -------------------------------------------------------- as you can see "Le le le" went to top level and then next table was started. The program I used to dump it: -------------------------------------------------------- #!/usr/bin/perl use HTML::TreeBuilder; foreach (@ARGV) { open FILE, "< $_"; my @contents = <FILE>; my $t = new HTML::TreeBuilder; $t->parse(join("", @contents)); $t->dump($_); } -------------------------------------------------------- I noticed that behaviour in packages I found in Debian hamm distribution, so I got the newest one from CPAN - they behave incorrectly too. The newest versions I used are HTML-Parser-2.22 and HTML-Tree-0.51. PS. I'm not member of libwww-perl mailing list. Please cc answers to me - I'm interested whether it is really a bug or I'm wrong... PS2. I tried to send this report to the package author. But his email seems to be not working (I got "no such user"). -- Marcin Kasperski Marcin.Kasperski<at>softax.com.pl -- marckasp<at>friko6.onet.pl -- Moje pogldy s moimi pogldami, nikogo poza mn nie reprezentuj. -- (My opinions are just my opinions.) </pre> <!-- body="end" --> <hr> <p> <ul> <!-- next="start" --> <li> <b>Next message:</b> <a href="0309.html">Henri Periat: "Re: [PATCH] Y2K Problem with HTTP::Date (libwww-5.42) (fwd)"</a> <li> <b>Previous message:</b> <a href="0307.html">Matthew Langlands: "Help please"</a> <!-- nextthread="start" --> <!-- reply="end" --> </ul>