Re: HTML::Element::extract_links() vs. LinkExtor
Boris Statnikov (boris@blaze.cs.jhu.edu)
Sun, 12 Apr 1998 19:27:30 -0400 (EDT)
I asked earlier about parsing out link text. As it happens, the example
that Gisle quotes does not parse this (real-world and, I believe,
perfectly valid) page, presumably for the reasons he gives below:
--------------
Elli Angelopoulou's home page
--------------
HTML::LinkExtor does find the links if the tag I ask for is "frame"
rather than "a", but it cannot extract the link text (I suppose it just
throws the text away).
Any suggestions?
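HTML::LinkExtor reports only tag names and link attributes, never the text between <a> and </a>. One workaround is to drive HTML::Parser directly and accumulate text while an anchor is open. The sketch below assumes the version-3 HTML::Parser event API (start_h/text_h/end_h handlers with argspecs), which is newer than what was current in April 1998; links_with_text is a hypothetical helper name, and nested or unclosed anchors are not handled.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns a list of [tag, url, text] entries for
# <a href> and <frame src>. Frames have no content, so their text is ''.
sub links_with_text {
    my ($html) = @_;
    my @links;
    my $current;    # entry for the currently open <a>, if any

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;    # tag and attribute names are lowercased
            if ($tag eq 'a' && defined $attr->{href}) {
                $current = [ 'a', $attr->{href}, '' ];
                push @links, $current;
            }
            elsif ($tag eq 'frame' && defined $attr->{src}) {
                push @links, [ 'frame', $attr->{src}, '' ];
            }
        }, 'tagname, attr' ],
        # 'dtext' is the text with entities decoded; it may arrive in chunks,
        # so append rather than assign.
        text_h => [ sub { $current->[2] .= shift if $current }, 'dtext' ],
        # Stop collecting text when the anchor closes.
        end_h  => [ sub { undef $current if shift eq 'a' }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;

    # Normalize whitespace in the collected link text.
    for my $l (@links) {
        $l->[2] =~ s/\s+/ /g;
        $l->[2] =~ s/^ | $//g;
    }
    return @links;
}
```

Text inside markup such as <b> within the anchor is still collected, because text events keep firing until the matching </a> end tag is seen.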
Boris
Too many cooks spoil the brouhaha.
Harvard Lampoon, "Bored of The Rings"
On 12 Apr 1998, Gisle Aas wrote:
> Karl Hakimian <hakimian@aha.com> writes:
>
> > %isBodyElement = map { $_ => 1 } qw(h1 h2 h3 h4 h5 h6
> > p div pre address blockquote
> > xmp listing
> > ! a img br hr frameset frame
> > ol ul dir menu li
> > dl dt dd
> > cite code em kbd samp strong var dfn strike
>
> I have problems with this because <frameset> and <frame> are
> definitely not something that should be inside a <body>. The
> version of the HTML4.dtd that I have says:
>
> <!ELEMENT FRAMESET - - ((FRAMESET|FRAME)+ & NOFRAMES?)>
> <!ELEMENT FRAME - O EMPTY>
> <!ELEMENT NOFRAMES - -
> (#PCDATA,((BODY,#PCDATA)|
> (((%blocklevel)|%font|%phrase|%special|%formctrl),%block)))>
> <!ENTITY % html.content "HEAD, (FRAMESET|BODY)">
> <!ELEMENT HTML O O (%html.content)>
>
> i.e. a document contains a <HEAD> and then either a <FRAMESET> or a
> <BODY>. The <FRAMESET> can contain a <NOFRAMES> element that can
> contain a <BODY>.
>
> --
> Gisle Aas
>
>
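Gisle's reading of the DTD is that the top level of a document is either a FRAMESET or a BODY, so <frame> never belongs in the body element list. As a rough sketch (again assuming the version-3 HTML::Parser API; document_kind is a hypothetical name), the kind of page can be decided by whichever of the two start tags appears first. Since the BODY start tag may be omitted in HTML 4, a document with neither tag is treated as a body document, and a <BODY> nested inside <NOFRAMES> does not change the answer because the <FRAMESET> is seen first.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns 'frameset' or 'body' depending on which
# top-level content model (HEAD, (FRAMESET|BODY)) the document uses.
sub document_kind {
    my ($html) = @_;
    my $kind;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my $tag = shift;
            # Remember only the first FRAMESET or BODY start tag.
            $kind = $tag
                if !defined $kind && ($tag eq 'frameset' || $tag eq 'body');
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    # No explicit tag: the BODY start tag was omitted, so it is implied.
    return defined $kind ? $kind : 'body';
}
```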