Re: HTML::Element::extract_links() vs. LinkExtor

Boris Statnikov (boris@blaze.cs.jhu.edu)
Sun, 12 Apr 1998 19:27:30 -0400 (EDT)


I asked earlier about parsing out link text. As it turns out, the example
that Gisle quotes does not parse this real-world (and, I believe, perfectly
valid) page, presumably for the reasons given below:

--------------

Elli Angelopoulou's home page <body> <p>This web page uses frames, but your browser doesn't support them.</p> </body>

--------------

LinkExtor does extract the links if I specify the tag "frame" rather than
"a", but it does not return any text (I suppose it throws the text away).
Any suggestions?
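Because <frame> is an EMPTY element, there is no link text for LinkExtor to return; the only text on a page like this is the <noframes> fallback. A minimal sketch, assuming the later HTML::Parser version 3 handler API (which postdates this thread) and a made-up sample page, collects the frame URLs and the fallback text in one pass. The frame file names (menu.html, main.html) are hypothetical, and the sample puts plain text directly inside <noframes> for simplicity:

```perl
#!/usr/bin/perl
# Sketch: collect <frame> links and the <noframes> fallback text in one pass.
use strict;
use warnings;
use HTML::Parser 3 ();

# Hypothetical sample page; the frame file names are invented for
# illustration and are not taken from the real page.
my $html = <<'HTML';
<html><head><title>Elli Angelopoulou's home page</title></head>
<frameset cols="25%,75%">
  <frame src="menu.html" name="menu">
  <frame src="main.html" name="main">
  <noframes>This web page uses frames, but your browser doesn't support them.</noframes>
</frameset>
</html>
HTML

my @frames;              # [src, name] pairs, in document order
my $noframes_text = '';  # text found inside <noframes>
my $in_noframes   = 0;   # true while the parser is inside <noframes>

my $p = HTML::Parser->new(
    api_version => 3,
    # Start tags: <frame> is EMPTY, so all we can report is its attributes.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        if ($tag eq 'frame' && defined $attr->{src}) {
            push @frames,
                [ $attr->{src}, defined $attr->{name} ? $attr->{name} : '' ];
        }
        $in_noframes = 1 if $tag eq 'noframes';
    }, 'tagname, attr' ],
    # Text may arrive in several chunks; keep only what is inside <noframes>.
    text_h => [ sub {
        my ($text) = @_;
        $noframes_text .= $text if $in_noframes;
    }, 'dtext' ],
    end_h => [ sub {
        my ($tag) = @_;
        $in_noframes = 0 if $tag eq 'noframes';
    }, 'tagname' ],
);
$p->parse($html);
$p->eof;

print "frame: $_->[0] ($_->[1])\n" for @frames;
print "noframes text: $noframes_text\n";
```

For this sample it prints the two frame URLs with their name attributes, followed by the fallback sentence. The name attribute is used here only as a stand-in for "link text", since a frame has none of its own.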

Boris




On 12 Apr 1998, Gisle Aas wrote:

> Karl Hakimian <hakimian@aha.com> writes:
> 
> >   %isBodyElement = map { $_ => 1 } qw(h1 h2 h3 h4 h5 h6
> >   				    p div pre address blockquote
> >   				    xmp listing
> > ! 				    a img br hr frameset frame
> >   				    ol ul dir menu li
> >   				    dl dt dd
> >   				    cite code em kbd samp strong var dfn strike
> 
> I have problems with this because <frameset> and <frame> are
> definitely not something that should be inside a <body>.  The
> version of HTML4.dtd that I have says:
> 
> <!ELEMENT FRAMESET - - ((FRAMESET|FRAME)+ & NOFRAMES?)>
> <!ELEMENT FRAME - O EMPTY>
> <!ELEMENT NOFRAMES - -
>  (#PCDATA,((BODY,#PCDATA)|
>            (((%blocklevel)|%font|%phrase|%special|%formctrl),%block)))>
> <!ENTITY % html.content "HEAD, (FRAMESET|BODY)">
> <!ELEMENT HTML O O (%html.content)>
> 
> i.e. a document contains a <HEAD> and then either a <FRAMESET> or a
> <BODY>.  The <FRAMESET> can contain a <NOFRAMES> element, which can
> in turn contain a <BODY>.
> 
> -- 
> Gisle Aas
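As a rough illustration of Gisle's reading of the DTD, the two content models he quotes can be written as regular expressions over a parent element's sequence of child tag names. This is a simplified, hypothetical sketch (`children_ok` and `%content_model` are invented names, and #PCDATA and the NOFRAMES model are ignored); it is not how libwww-perl handles these elements:

```perl
use strict;
use warnings;

# Content models from the DTD excerpt, written as regexes over a
# space-terminated list of child tag names (simplified; ignores #PCDATA).
my %content_model = (
    # <!ENTITY % html.content "HEAD, (FRAMESET|BODY)">
    html     => qr/^head (?:frameset|body) $/,
    # (FRAMESET|FRAME)+ & NOFRAMES?  ; '&' means both parts, in either order
    frameset => qr/^(?:(?:(?:frameset|frame) )+(?:noframes )?|noframes (?:(?:frameset|frame) )+)$/,
);

# Returns 1 if the children are allowed under $parent, 0 otherwise.
sub children_ok {
    my ($parent, @children) = @_;
    my $model = $content_model{lc $parent}
        or die "no content model for <$parent> in this sketch\n";
    my $seq = join '', map { lc($_) . ' ' } @children;
    return $seq =~ $model ? 1 : 0;
}
```

Under these models `children_ok('html', qw(head frameset))` returns 1 while `children_ok('html', qw(body frameset))` returns 0, and a <frameset> holding only a <noframes> is rejected because at least one FRAMESET or FRAME is required.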