Re: HTML::Parse and comment removal

Chris Fedde (cfedde@loupe.ezsrc.mrg.uswest.com)
Thu, 28 Sep 1995 14:35:00 -0600


Gisle,

In message <199509281214.NAA06533@bergen.oslonett.no> you write:
>> Why does the HTML::Parse module throw away comments?
>
>Because they are comments!  Comments should not have any semantic meaning.
>

My intention in using HTML::Parse was to rewrite an HTML document so that it
includes standard header/footer markup.  Historically we have used server side
includes to encapsulate these headers/footers in a file that
can be changed without having to touch all the content files.

I agree with you that the server side include mechanism is a rude violation
of what comments mean in HTML.  Still, I am left with the problem:
how do I validate a large document base?  My intention was to use
the callback from HTML::Element::traverse to perform my validation.
Of course, that assumed that comments were preserved in the parse tree.

As an alternative to the current model, where HTML::Parse::parse_html
returns a full tree, perhaps a method named HTML::Parse::parse
could take the raw HTML text as input and invoke a callback
for each HTML "event".  In this way HTML::Parse::parse
would act as an interpreter of the HTML text, and a variety of
different callback functions could be used to translate the HTML
into many different end products.  HTML::Parse::parse_html would then
become a wrapper around a call to HTML::Parse::parse with a callback
that uses HTML::Element::new and HTML::Element::pushContents to build
an HTML::Element tree.
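
To make the proposal concrete, here is a rough sketch of the shape I have
in mind.  It is written in Python on top of its standard html.parser module
purely as an illustration, since HTML::Parse::parse does not exist.  The
names parse, parse_html and EventParser, the event names, and the dict-based
tree are all assumptions of the sketch, not anything in the HTML:: classes.
It also ignores void elements and implied end tags.

```python
from html.parser import HTMLParser


class EventParser(HTMLParser):
    """Forwards every parse event to one user-supplied callback."""

    def __init__(self, callback):
        super().__init__(convert_charrefs=True)
        self.callback = callback

    def handle_starttag(self, tag, attrs):
        self.callback("start", tag, dict(attrs))

    def handle_endtag(self, tag):
        self.callback("end", tag, None)

    def handle_data(self, data):
        self.callback("text", data, None)

    def handle_comment(self, data):
        # Comments are reported like any other event, not thrown away.
        self.callback("comment", data, None)


def parse(html, callback):
    """Hypothetical analogue of the proposed HTML::Parse::parse."""
    parser = EventParser(callback)
    parser.feed(html)
    parser.close()


def parse_html(html):
    """Tree builder written as just one particular callback over parse()."""
    root = {"tag": "#root", "attrs": {}, "content": []}
    stack = [root]  # innermost open element is stack[-1]

    def build(kind, value, attrs):
        if kind == "start":
            node = {"tag": value, "attrs": attrs, "content": []}
            stack[-1]["content"].append(node)
            stack.append(node)
        elif kind == "end":
            # Close only when the tag matches the innermost open element.
            if len(stack) > 1 and stack[-1]["tag"] == value:
                stack.pop()
        elif kind == "comment":
            stack[-1]["content"].append({"comment": value})
        else:
            stack[-1]["content"].append(value)

    parse(html, build)
    return root
```

The point of the split is that parse_html needs no special access to the
parser: any other consumer can pass its own callback to parse and never
build a tree at all.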

Other callbacks could be written that watch for specific HTML
"events" and perform translations in their own way.  Such tasks
might include: direct translation to alternative markup languages,
augmentation of existing HTML by injecting ID or ALT attributes
into existing tags, rewriting URIs, translating to graphic visualization
languages such as graph-vis or DaVinci, and, my favorite, processing
HTML comments that contain meta-info for server side parsers and
such.
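
As an illustration of the last item, the sketch below watches only comment
events and picks out server side include directives, which is roughly the
validation I am after.  Again it is Python on top of the standard
html.parser module, used only as a stand-in.  It assumes the common
<!--#include virtual="..." --> and file="..." forms; the names IncludeFinder,
find_includes and has_standard_markup, and the rule itself, are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed directive form: #include virtual="..." or #include file="..."
SSI_INCLUDE = re.compile(r'^#include\s+(virtual|file)="([^"]*)"\s*$')


class IncludeFinder(HTMLParser):
    """Collects include directives found in comment events."""

    def __init__(self):
        super().__init__()
        self.includes = []

    def handle_comment(self, data):
        # data is the comment text between "<!--" and "-->".
        match = SSI_INCLUDE.match(data.strip())
        if match:
            self.includes.append((match.group(1), match.group(2)))


def find_includes(html):
    """Return (kind, target) pairs for every include directive, in order."""
    finder = IncludeFinder()
    finder.feed(html)
    finder.close()
    return finder.includes


def has_standard_markup(html, header, footer):
    """Hypothetical validation rule: both standard includes are present."""
    targets = [target for _, target in find_includes(html)]
    return header in targets and footer in targets
```

Run over every file in the document base, a check like has_standard_markup
would flag the pages that are missing the standard header or footer.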

I'm not really sure that these thoughts are even appropriate to your vision
for the HTML:: classes.  Please take them in the spirit that they are
offered.  I have nothing but the highest respect for these tools and have
found them much more cohesive than any other set that I have yet used.

Have a Good Day
chris