Re: SGML Parser in Perl?

Jared_Rhine@hmc.edu
Tue, 13 Dec 1994 19:38:38 -0500


MK == Martijn Koster <m.koster@nexor.co.uk>

  MK> I have been considering "htmls": an HTML parser that will check and
  MK> rewrite HTML into different forms (e.g. indented, smallest, fully
  MK> expanded). This would have the HTML DTD built in, rather than
  MK> parsing it every time. This could mean a single C program is all you
  MK> need to verify and manage HTML (unlike the entire sgmls setup).

I don't think the DTD should be built in, because that would prevent
using different DTDs to describe HTML.  We will assuredly have multiple
DTDs in common use soon(ish).  It might make sense to be able to
pre-compile a DTD for efficiency, though.

  MK> Of course I'd like to prototype it in Perl5.

I'd be happy with a production version in Perl5, too.  I rarely do
validation on the fly, so I don't need the extra 30-50% speed that C might
give.

Hmm, come to think of it, I'd really appreciate a Perl version, because I
could hook into such a parser to add arbitrary functionality.  As long as
I'm parsing my HTML documents anyway, I might as well update the databases
of LINK tags, run a link validator, and so forth.  I'd rather write the
code for these add-ons in Perl.  Heh, as long as you're using Perl 5, why
not code only the critical parts in C using Perl 5's extension
mechanisms?
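As a rough sketch of the hook idea (written in Python rather than Perl, purely for illustration), the parser below lets callers register per-tag callbacks, so one pass over a document can both collect LINK tags and gather link targets for a validator.  `HookedParser`, `add_hook`, `link_db` and `hrefs` are hypothetical names, not part of any existing tool.  Python's `html.parser` is a forgiving tokenizer, not a validating SGML parser, so this shows only the hook mechanism, not DTD checking.

```python
from html.parser import HTMLParser
from collections import defaultdict


class HookedParser(HTMLParser):
    """Dispatches each start tag to user-registered callbacks (hypothetical design)."""

    def __init__(self):
        super().__init__()
        self.hooks = defaultdict(list)  # tag name -> list of callables

    def add_hook(self, tag, func):
        # html.parser reports tag names in lowercase, so store hooks lowercased.
        self.hooks[tag.lower()].append(func)

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names.
        attr_map = dict(attrs)
        for func in self.hooks.get(tag, []):
            func(attr_map)


link_db = []  # stands in for a database of LINK tags
hrefs = []    # link targets to hand to a link validator later

parser = HookedParser()
parser.add_hook("LINK", lambda a: link_db.append((a.get("rel"), a.get("href"))))
parser.add_hook("A", lambda a: hrefs.append(a["href"]) if "href" in a else None)

parser.feed('<HTML><HEAD><LINK REL="made" HREF="mailto:webmaster@example.org">'
            '</HEAD><BODY><A HREF="/home.html">home</A></BODY></HTML>')
parser.close()
```

Keeping the add-ons as plain callbacks means the parser core stays fixed (and could be the part moved to C), while site-specific tasks stay in the scripting language.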

-- 
Jared_Rhine@hmc.edu | Harvey Mudd College | http://www.hmc.edu/~jared/home.html

"To hear many religious people talk, one would think God created the
 torso, head, legs and arms, but the devil slapped on the genitals."
        -- Don Schrader