Re: libwww-perl-5b11 still lacks my parse patches
Andrew McRae (mcrae@internet.com)
Tue, 7 May 96 11:52:15 -0400
[ Nick Ing-Simmons said: ]
>It would seem that the SGML "DTD" for HTML provides a formal
>parsable _definition_ of which tags are allowed where.
Yes, it does. A DTD is a formal grammar, among other things.
> Would it therefore
>make sense for HTML::Parse to be auto-generated from the DTD - or perhaps
>even be an instance of an SGML::Parse?
A major problem with this is that anything built on a conforming SGML
parser could handle only a fraction of what you call '"real" documents':
>(A "real" document is one which other browsers handle "sensibly"...).
See, for example:
http://www.acl.lanl.gov/HTML_WG/html-wg-95q2.messages/0143.html
http://www.acl.lanl.gov/HTML_WG/html-wg-95q2.messages/0157.html
http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1224.html
et al.
The problems might be less severe if you're just trying to build a parse
tree, and not trying to render the document. And now that the most
popular Web browser makers have fixed one or two of the more egregious
errors in their parsing engines, the general quality of HTML out on the
Web may have improved.
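
To make the difference concrete, here is a rough sketch in Python (not
Perl, and not how HTML::Parse works). It assumes the standard library's
strict XML parser is a fair stand-in for a validating SGML parser, which
is only approximately true since XML is stricter about omitted end tags.
SAMPLE, strict_parse, TolerantTreeBuilder and tolerant_parse are all
made-up names for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical "real" document: overlapping <b>/<i> and a stray </font>.
SAMPLE = "<p>Some <b><i>overlapping</b> markup</i> and a stray </font> tag</p>"

# Elements that never take an end tag (a small, incomplete set).
VOID = {"br", "hr", "img", "input", "meta", "link"}


def strict_parse(text):
    """Return True if the strict (XML) parser accepts the text."""
    try:
        ET.fromstring("<root>" + text + "</root>")
        return True
    except ET.ParseError:
        return False


class TolerantTreeBuilder(HTMLParser):
    """Build a (tag, children) tree and never reject the input."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element.
        # An end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def tolerant_parse(text):
    builder = TolerantTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


if __name__ == "__main__":
    print("strict parser accepts it:", strict_parse(SAMPLE))
    print("tolerant tree:", tolerant_parse(SAMPLE))
```

The tolerant builder always produces some tree, but the recovery rules
(close back to the nearest match, drop stray end tags) are arbitrary
choices of mine, and a renderer would have to care about them far more
than a tree builder does.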
Cheers,
Andrew.
--
Andrew McRae <mcrae@internet.com>
The Internet Company