Re: Bug in Parser (about comments)
Gisle Aas (aas@bergen.sn.no)
26 Jun 1997 14:36:12 +0000
Calle Aasman <md4calle@mdstud.chalmers.se> writes:
> On Thu, 26 Jun 1997, Gisle Aas wrote:
>
> > In message <Pine.GSO.3.95.970626155712.518D-100000@scooter.mdstud.chalmers.se>,
> > Calle Aasman writes:
> >
> > > another thing: it refuses to treat a tag like this:
> > > <font face="Arial", "Helvetica">
> > >
> > >
> > > as a font tag (or any tag at all), it just treats it as common text :/
> > > all of cnet is full of these tags, so when I run the parser I get normal
> > > code all mixed up with font tags :/
> >
> > Does this syntax really work in Netscape and IE? Is it common outside cnet?
> It doesn't change the font in NN3.x for Solaris...but the document looks
> all right otherwise.
Looking at how Netscape colors the HTML source (in the View source
window) I conclude that it is parsing ',' and '"Helvetica"' as two
separate key-only attributes (unknown and therefore ignored).
> > The HTML ought to look like this:
> >
> > <font face="Arial, Helvetica">
> yup.
>
>
> > > any idea what it can be?
> >
> > As soon as the parser finds the "," outside the quotes it concludes
> > that this can't possibly be a start tag, and it will treat it as normal
> > text.
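To make that concrete, here is a simplified, hypothetical model of a
strict start-tag recognizer in Perl. The $attr/$start_tag regexes and
is_start_tag() are made up for illustration; this is not the code
HTML::Parser actually uses. An attribute has to be a name, optionally
followed by '=' and a value, so the stray ',' makes the whole tag fail
to match:

```perl
use strict;
use warnings;

# Hypothetical, simplified model of a strict start-tag recognizer
# (NOT the actual HTML::Parser regex).  An attribute is a name,
# optionally followed by '=' and a quoted or unquoted value.
my $attr      = qr/[A-Za-z][-.A-Za-z0-9]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?/;
my $start_tag = qr/^<[A-Za-z][-.A-Za-z0-9]*(?:\s+$attr)*\s*>$/;

# Returns 1 if the string looks like a start tag, 0 if it would be
# treated as plain text.
sub is_start_tag {
    my ($s) = @_;
    return $s =~ $start_tag ? 1 : 0;
}

for my $html ('<font face="Arial, Helvetica">',
              '<font face="Arial", "Helvetica">') {
    printf "%s => %s\n", $html, is_start_tag($html) ? 'start tag' : 'plain text';
}
```

Under this model the first line prints "start tag" and the second
"plain text".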
> is there some way to make it treat everything inside the < > as a tag?
> I'm not interested in any info inside the font-tag, I just want to get rid
> of the tags.
There is no easy way. You could update the parser to treat ',' and
'"' as legal attribute name chars, but that is kind of silly.
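Roughly, that change would mean something like the following
hypothetical lenient scanner (lenient_attrs() is made up and is not part
of HTML::Parser). It takes any run of characters other than whitespace,
'=' and '>' as an attribute name, so ',' and '"Helvetica"' end up as
key-only attributes, which is what the Netscape source coloring
suggests Netscape does:

```perl
use strict;
use warnings;

# Hypothetical lenient scanner, NOT part of HTML::Parser: any run of
# characters other than whitespace, '=' and '>' counts as an attribute
# name, so junk like ',' becomes a key-only attribute.
sub lenient_attrs {
    my ($tag) = @_;
    my ($name, $rest) = $tag =~ /^<\s*([^\s>]+)(.*?)\s*>$/s or return;
    my @attrs;
    while ($rest =~ /\G\s*([^\s=>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?/gc) {
        my ($key, $val) = ($1, $2);
        # Strip surrounding quotes from a value, if there is one.
        $val =~ s/^(["'])(.*)\1$/$2/s if defined $val;
        push @attrs, [$key, $val];
    }
    return (lc $name, @attrs);
}

my ($tag, @attrs) = lenient_attrs('<font face="Arial", "Helvetica">');
print "$tag\n";
printf "  %s%s\n", $_->[0], defined $_->[1] ? "=$_->[1]" : "" for @attrs;
```

This prints "font" followed by the attributes face=Arial, ',' and
'"Helvetica"', the last two with no value.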
You could also try to just filter out <...> sequences in the text
with s/<[^>]+>//g before you decode entities, or you could do a
s/"Arial",\s*"Helvetica"/"Arial, Helvetica"/g before you start parsing.
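A minimal sketch of those two workarounds, assuming the whole page is
already in a string (fix_cnet_fonts() and strip_tags() are made-up
names). The tag-stripping substitution needs /g to remove more than the
first tag, and it is crude: a '>' inside an attribute value or a
comment ends a match early.

```perl
use strict;
use warnings;

# Workaround 1: rewrite cnet's broken face attribute before the HTML
# reaches the parser, so the tag becomes a valid start tag again.
sub fix_cnet_fonts {
    my ($html) = @_;
    $html =~ s/"Arial",\s*"Helvetica"/"Arial, Helvetica"/g;
    return $html;
}

# Workaround 2: drop every <...> sequence (the /g removes all of them).
# Do this BEFORE decoding entities (e.g. with
# HTML::Entities::decode_entities), otherwise an escaped &lt;b&gt; in
# the text would become <b> and get stripped as well.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]+>//g;
    return $html;
}

my $page = qq{<p><font face="Arial", "Helvetica">Hello</font> &amp; bye</p>\n};
print fix_cnet_fonts($page);   # font tag now has face="Arial, Helvetica"
print strip_tags($page);       # Hello &amp; bye
```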
Or perhaps we could just send a mail to cnet telling them to fix their
HTML?
Regards,
Gisle