Re: SGML Parser in Perl?
Jared_Rhine@hmc.edu
Tue, 13 Dec 1994 19:38:38 -0500
MK == Martijn Koster <m.koster@nexor.co.uk>
MK> I have been considering "htmls": an HTML parser that will check and
MK> rewrite HTML into different forms (like indented; smallest; fully
MK> expanded etc). This would have the HTML DTD built-in; and not actually
MK> parse it every time. This could mean a single C program is all you
MK> need to verify and manage HTML (unlike the entire sgmls setup).
I don't think it should really have the DTD built-in because that would
prevent using different DTDs to describe HTML. We will assuredly have
multiple DTDs in common use soon(ish). It might make sense to be able to
pre-compile a DTD for efficiency, though.
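To make the pre-compilation idea concrete, here is a minimal sketch in Python (used purely for illustration). It assumes a toy DTD fragment that is NOT the real HTML DTD, a deliberately simplified declaration syntax, and hypothetical helper names (`compile_dtd`, `allowed`). It does not handle real SGML features such as tag omission, entities, or nested content-model groups. The idea is only that the DTD is parsed once into a lookup table, and the table is what gets saved and reloaded.

```python
import json
import re

# Toy DTD fragment for illustration only (an assumption, not the HTML DTD).
TOY_DTD = """
<!ELEMENT html (head, body)>
<!ELEMENT head (title)>
<!ELEMENT body (p | ul)*>
<!ELEMENT ul (li)+>
"""

# Matches one flat declaration: <!ELEMENT name (a, b | c)> with an
# optional occurrence indicator after the group.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)[*+?]?\s*>")


def compile_dtd(text):
    """Parse ELEMENT declarations into {element: sorted list of child names}."""
    table = {}
    for name, model in ELEMENT_RE.findall(text):
        children = [c for c in re.split(r"[\s,|]+", model) if c]
        table[name.lower()] = sorted(c.lower() for c in children)
    return table


def allowed(table, parent, child):
    """Return True if `child` may appear directly inside `parent`."""
    return child in table.get(parent, [])


# "Compile" once; the JSON string stands in for a cached file on disk.
cached = json.dumps(compile_dtd(TOY_DTD))

# A checker would load the cached table instead of re-parsing the DTD.
table = json.loads(cached)
print(allowed(table, "body", "p"))
print(allowed(table, "head", "p"))
```

Swapping in a different DTD then means regenerating the cached table rather than rebuilding the program, which is the flexibility argued for above.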
MK> Of course I'd like to prototype it in Perl5.
I'd be happy with a production version in Perl5, too. I rarely do
validation on the fly, so I don't need the extra 30-50% speed that C might give.
Hmm, as a thought, I might really appreciate a Perl version, because I could
see hooking into such a parser to provide arbitrary functionality. As long
as I'm parsing my HTML documents, I might as well update the databases of
LINK tags, run a link validator, and so forth. I'd rather write the code
for these add-ons in Perl. Heh, as long as you're using Perl 5, why not
just code up only the critical parts in C using Perl 5's extensibility
mechanisms?
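As a rough illustration of hooking add-ons into a parser, here is a minimal sketch. It is written in Python rather than Perl and uses the standard library's `html.parser` module as a stand-in for the proposed parser. The `HookedParser` class, its `on` method, and the sample document are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class HookedParser(HTMLParser):
    """Parser that calls user-registered callbacks for each start tag."""

    def __init__(self):
        super().__init__()
        self.hooks = {}  # tag name -> list of callbacks

    def on(self, tag, callback):
        """Register callback(tag, attrs_dict) for a given tag name."""
        self.hooks.setdefault(tag.lower(), []).append(callback)

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        attr_map = dict(attrs)
        for callback in self.hooks.get(tag, []):
            callback(tag, attr_map)


link_db = []   # stand-in for a database of LINK tags
to_check = []  # URLs a link validator would visit later


def record_link(tag, attrs):
    link_db.append((attrs.get("rel"), attrs.get("href")))


def queue_anchor(tag, attrs):
    if "href" in attrs:
        to_check.append(attrs["href"])


parser = HookedParser()
parser.on("link", record_link)
parser.on("a", queue_anchor)

# Hypothetical sample document.
parser.feed(
    '<html><head><link rel="made" href="mailto:someone@example.org"></head>'
    '<body><a href="http://example.org/">home</a><a name="top">x</a></body></html>'
)
parser.close()

print(link_db)
print(to_check)
```

The parsing loop stays in one place, and each add-on is a small callback, so the hot path could later be moved to C without touching the add-ons.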
--
Jared_Rhine@hmc.edu | Harvey Mudd College | http://www.hmc.edu/~jared/home.html
"To hear many religious people talk, one would think God created the
torso, head, legs and arms, but the devil slapped on the genitals."
-- Don Schrader