Re: Parsing of comments in LWP
Chet Murthy (chet@watson.ibm.com)
Tue, 14 May 1996 17:18:15 -0400
Gisle Aas writes:
::In message <9605132159.AA18514@lusitania.watson.ibm.com>, Chet Murthy writes:
::> I decided to modify the HTML parser to preserve comment text. I
::> thought I'd write down my strategy, in the hopes of getting whatever
::> feedback might be forthcoming.
::>
::> The idea I have is to create a new node, with tag "!", with a single
::> attribute, "TEXT". Then, the "starttag" function would emit the text
::> directly for this node.
::>
::> Any function which needed to ignore comments could just skip this
::> node.
::>
::> How does that sound? To me, it's a bit baroque, but I can't think of
::> any other way to preserve comments, while treating them as
::> non-textual.
::
::How would you treat this:
::
:: <img src="xxx" <!-- a nice picture -->
:: alt="xxx" <!-- and an alternative text -->
:: >
::
::or is this really valid HTML?
::
:: <a href="xxx" <!-- comment inside the start tag -->>
:: <!-- standard comment --> foo
:: </ <!-- comment inside the end tag--> a>
I tested your examples with Dan Connolly's SGML lexer, and it rejected
them; the output for the first is included below (I leave out the
output for the second example). It is not clear to me whether or not
these examples are valid SGML.
I think I like the idea of adding a new element class for comments,
and will do this.
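To make the comment-node idea concrete, here is a minimal sketch. It is
written in Python on top of the standard html.parser module, not in Perl,
and it is not LWP's parser. The names Node, CommentPreservingBuilder,
parse, serialize and text_only are hypothetical, and the short VOID list
is an assumption made to keep the example small. Comments become nodes
with tag "!" and a single TEXT attribute; the serializer emits them
verbatim, and a text extractor simply skips them.

```python
from html.parser import HTMLParser

COMMENT_TAG = "!"  # tag name for comment nodes, as proposed in the thread
VOID = {"br", "hr", "img", "input", "link", "meta"}  # assumed, not exhaustive


class Node:
    """One tree element: a tag, its attributes, and its children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []  # Node instances or plain strings (text)


class CommentPreservingBuilder(HTMLParser):
    """Builds a tree in which comments are kept as "!" nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)

    def handle_comment(self, data):
        # The comment text is stored in the single TEXT attribute.
        self.stack[-1].children.append(Node(COMMENT_TAG, {"TEXT": data}))


def parse(source):
    builder = CommentPreservingBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def serialize(node):
    """Write the tree back out, emitting comment text directly."""
    if isinstance(node, str):
        return node
    if node.tag == COMMENT_TAG:
        return "<!--" + node.attrs["TEXT"] + "-->"
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attrs = "".join(
        f" {name}" if value is None else f' {name}="{value}"'
        for name, value in node.attrs.items()
    )
    if node.tag in VOID:
        return f"<{node.tag}{attrs}>"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


def text_only(node):
    """Example of a consumer that ignores comments by skipping "!" nodes."""
    if isinstance(node, str):
        return node
    if node.tag == COMMENT_TAG:
        return ""
    return "".join(text_only(child) for child in node.children)


root = parse("<p>Hello <!-- a note -->world</p>")
print(serialize(root))
print(text_only(root))
```

The sketch only covers comments that sit between tags. It says nothing
about comments placed inside a start or end tag, as in the examples
quoted above, and it does not re-escape entities or preserve the original
attribute quoting, so it is not yet a faithful round trip.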
By the way, my reason for wanting to preserve comments is so I can use
LWP as a library for HTML manipulation. I want to be able to
manipulate it, and then let users re-edit it, ad libitum. Obviously,
the output must be as close as possible to the original.
Cheers,
--chet--
================================================================
line 1: [Tag/Data]
Data: ` '
line 1: [Err/Lim]
!!Limitation!!: `Unclosed tags not supported'
Data: `<'
line 1: [Tag/Data]
Start Tag: `<img '
Attr Name: `src='
Literal: `"xxx" '
line 2: [Tag/Data]
Data: `!-- a nice picture -->
alt="xxx" '
line 2: [Aux Markup]
Markup Decl: `<!'
Comment: `-- and an alternative text --'
Tag Close: `>'
line 4: [Tag/Data]
Data: `
>
'
================================================================