Re: possible bug in HTML::Parser comment handler

Dave (dave.olszewski@andover.net)
Fri, 12 Jan 2001 13:42:26 -0500 (EST)


	
I have solved the problem I was having thanks to the info here.  All I had
to do was add is_cdata to the handler's argspec and only print the text
when it was false.  Thanks very much.  
	dave


On Fri, 12 Jan 2001, Sean M. Burke wrote:

> At 11:21 PM 2001-01-11 +0100, Bjoern Hoehrmann wrote:
> >At 15:28 11.01.01 -0500, you wrote:
> >>It seems that the parser is not properly detecting multi-line HTML
> >>comments.  I was trying to print out the dtext of an HTML document and
> >>noticed that comments kept showing up in the output.  Upon further
> >>examination, the single line comments were being ignored but ones like
> >>this:
> >>
> >><!--
> >>td {font-family: Arial,Geneva,Helvetica,sans-serif; color: #000000;}
> >>-->
> >
> >Well, the content model of the style element is CDATA, so your "comments"
> >may look like comments, but they are not comments in HTML and SGML
> >terms. That's not a bug.
> 
> I don't see what's wrong with that comment.
> 
> ISO 8879 Section 10.3 defines a "comment declaration" (yes, horrible
> term for it) as:
> 
>  comment declaration =
>   "<!",
>   (comment
>     (s | comment)*
>   )?
>   ">"
> 
>  comment =
>   "--",
>   SGML_character*
>   "--"
> 
> And in section 6.2.1, there's the explanation of "s":
> 
>  s = SPACE | RE | RS | SEPCHAR
>   and in the concrete syntax, that means [\x20\cm\cj\t]
> 
> And as to "SGML_character", section 9.2 basically says that aside from any
> characters that you go and reserve as being impermissible, anything is an
> SGML_character.  (I'm getting this from the /SGML Handbook/, which contains
> the full text of ISO 8879, plus annotation, etc.)
> 
> 
> So I don't see a problem with 
>   <!--
>   td {font-family: Arial,Geneva,Helvetica,sans-serif; color: #000000;}
>   -->
> 
> 
> 
> BTW, the XML spec's definition is even clearer, er, sort of:
> 
>    Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
> 
> To this they add:  "Note that the grammar does not allow a comment ending
> in --->. The following example is not well-formed: '<!-- B+, B, or B--->'".
> I'm a bit unclear on whether this really falls out of the grammar, but
> anyway.
> 
> 
> --
> Sean M. Burke  sburke@cpan.org  http://www.spinn.net/~sburke/
> 
>
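
On whether the '--->' restriction really falls out of the XML grammar
quoted above: here is a minimal sketch, in Python rather than Perl, that
transcribes the Comment production into a regex and tries it on both the
multi-line comment from this thread and the spec's counter-example.  The
name is_xml_comment is made up for illustration, and Char is approximated
as "any character", which is a little looser than the spec's Char range.

```python
import re

# Regex transcription of the XML production quoted above:
#   Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Assumption: Char is approximated as "any character", which is slightly
# looser than the Char range in the XML spec.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_comment(text: str) -> bool:
    """Return True if the whole string matches the Comment production."""
    return XML_COMMENT.fullmatch(text) is not None


# The multi-line comment from the original report.
multi_line = (
    "<!--\n"
    "td {font-family: Arial,Geneva,Helvetica,sans-serif; color: #000000;}\n"
    "-->"
)

print(is_xml_comment(multi_line))              # True
print(is_xml_comment("<!-- B+, B, or B--->"))  # False
```

Under that reading, the restriction does follow from the production: every
'-' inside the comment body must be followed by a character other than '-',
so the body can never end in '-', and the three dashes in '--->' have
nothing to match against.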