Re: Parsing of comments in LWP

Chet Murthy (chet@watson.ibm.com)
Tue, 14 May 1996 17:18:15 -0400


Gisle Aas writes:
::In message <9605132159.AA18514@lusitania.watson.ibm.com>, Chet Murthy writes:
::> I decided to modify the HTML parser to preserve comment text.  I
::> thought I'd write down my strategy, in the hopes of getting whatever
::> feedback might be forthcoming.
::> 
::> The idea I have is to create a new node, with tag "!", with a single
::> attribute, "TEXT".  Then, the "starttag" function would emit the text
::> directly for this node.
::> 
::> Any function which needed to ignore comments could just skip this
::> node.
::> 
::> How does that sound?  To me, it's a bit baroque, but I can't think of
::> any other way to preserve comments, while treating them as
::> non-textual.
::
::How would you treat this:
::
::  <img  src="xxx" <!-- a nice picture -->
::        alt="xxx" <!-- and an alternative text -->
::  >
::
::or is this really valid HTML?
::
::  <a href="xxx" <!-- comment inside the start tag -->>
::     <!-- standard comment --> foo
::  </ <!-- comment inside the end tag-->  a>

I tested your examples with Dan Connolly's SGML lexer, and it rejected
them.  The output for the first example is included below; I leave out
the output for the second.  It is not clear to me whether or not
these examples are valid SGML.

I think I like the idea of adding a new element class for comments,
and will do this.
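
A minimal sketch of the idea, written in Python with the standard
html.parser module rather than against the LWP parser itself.  The
names Node, TreeBuilder, parse, as_html and as_text are hypothetical
and are not part of LWP.  A comment becomes a node with tag "!" and a
single attribute "TEXT"; the serializer emits that text back out, and
a text extractor simply skips the node.

```python
from html.parser import HTMLParser


class Node:
    """A tree node; children are Node objects or plain strings."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tree and keeps comments as "!" nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Only close the innermost element; no error recovery here.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)

    def handle_comment(self, data):
        # The comment node: tag "!", one attribute "TEXT".
        self.stack[-1].children.append(Node("!", {"TEXT": data}))


def parse(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def as_html(node):
    """Serialize the tree, emitting comment text verbatim."""
    if isinstance(node, str):
        return node
    if node.tag == "!":
        return "<!--" + node.attrs["TEXT"] + "-->"
    inner = "".join(as_html(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attrs = "".join(' %s="%s"' % (k, v) for k, v in node.attrs.items())
    return "<%s%s>%s</%s>" % (node.tag, attrs, inner, node.tag)


def as_text(node):
    """Extract text only; comment nodes are skipped."""
    if isinstance(node, str):
        return node
    if node.tag == "!":
        return ""
    return "".join(as_text(child) for child in node.children)


source = '<a href="xxx"><!-- standard comment --> foo</a>'
tree = parse(source)
print(as_html(tree))
print(as_text(tree))
```

The sketch does not handle empty elements such as <img>, valueless
attributes, or badly nested tags; it only shows that a "!" node is
enough to carry a comment through the tree and back out again.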

By the way, my reason for wanting to preserve comments is so I can use
LWP as a library for HTML manipulation.  I want to be able to
manipulate a document, and then let users re-edit it, ad libitum.
Obviously, the output must be as close as possible to the original.

Cheers,
--chet--
================================================================

line 1: [Tag/Data]
Data: `  '

line 1: [Err/Lim]
!!Limitation!!: `Unclosed tags not supported'
  Data: `<'

line 1: [Tag/Data]
Start Tag: `<img  '
  Attr Name: `src='
  Literal: `"xxx" '

line 2: [Tag/Data]
Data: `!-- a nice picture -->
        alt="xxx" '

line 2: [Aux Markup]
Markup Decl: `<!'
  Comment: `-- and an alternative text --'
  Tag Close: `>'

line 4: [Tag/Data]
Data: `
  >
'
================================================================