Re: How to expand all links.
Gisle Aas (aas@sn.no)
Fri, 01 Dec 1995 16:10:59 +0100
> I have just installed libwww-perl-5b6, and am looking for a way to print
> out an html document with all the links expanded to the full URL (i.e.
> blah.gif becomes http://www.site.com/~someone/blah.gif). Is libwww the
> package to use for doing this sort of thing?
Sure.
> Has anyone tackled this
> problem before? Any pointers would be appreciated.
Check what "request -o links http://yourplace/" spits out...
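If you would rather get the same kind of list from inside your own script, here is a minimal sketch. It assumes the HTML::LinkExtor module is installed; the list_links name is made up for this example, and the output is not meant to match what "request" prints exactly.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# list_links is a hypothetical helper name, not part of libwww-perl.
# It returns every link found in $html, made absolute relative to $base.
sub list_links {
    my ($html, $base) = @_;

    # No callback is given, so the parser just collects the links.
    # Passing $base makes HTML::LinkExtor absolutize each URL.
    my $p = HTML::LinkExtor->new(undef, $base);
    $p->parse($html);
    $p->eof;

    my @urls;
    for my $link ($p->links) {
        # Each entry is [ tag, attr => url, attr => url, ... ]
        my ($tag, %attrs) = @$link;
        push @urls, map { "$_" } values %attrs;
    }
    return @urls;
}
```

Something like print "$_\n" for list_links($html, "http://yourplace/"); would then print one absolute URL per line.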
If you want the URLs expanded in the HTML document text then this might help
you:
------------------------------
#!/local/bin/perl -w

# Elements that carry a link, and the attribute that holds the URL
%linkElements = (
    'a'    => 'href',
    'img'  => 'src',
    'form' => 'action',
    'link' => 'href',
);

use HTML::Parse;
use URI::URL;

$BASE = "http://somewhere/root/";
$FILE = "yourfile.html";

$h = parse_htmlfile($FILE);
$h->traverse(\&expand_urls, 1);
print $h->asHTML;

# Callback for traverse().  It must return a true value, otherwise the
# children of the current element are not visited.
sub expand_urls
{
    my($e, $start) = @_;
    return 1 unless $start;
    my $attr = $linkElements{$e->tag};
    return 1 unless defined $attr;
    my $url = $e->attr($attr);
    return 1 unless defined $url;
    $e->attr($attr, (new URI::URL $url, $BASE)->abs->as_string);
    return 1;   # do not rely on the return value of attr()
}
-------------------------------
This will reformat your HTML. If this is not desirable then you might want
to try to match the relative links in the HTML with a regular expression and
then expand them to absolute URLs with the:
(new URI::URL $url, $BASE)->abs->as_string
construct. A correct implementation of the above would also look for the
<BASE href="URL"> element in the HTML document and use the specified URL as
$BASE.
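
A rough sketch of that regular expression approach, including the <BASE> lookup, could look like the following. The find_base and expand_links names are made up for this example. It only handles quoted attribute values on the four elements listed earlier, and a regular expression can be fooled by comments or unusual markup, so treat it as a starting point rather than a complete solution.

```perl
use strict;
use warnings;
use URI::URL;

# Hypothetical helper: return the href of the first <BASE> element,
# or $default if the document has none.
sub find_base {
    my ($html, $default) = @_;
    return $2 if $html =~ /<base\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/is;
    return $default;
}

# Hypothetical helper: rewrite quoted link attributes to absolute URLs,
# leaving the rest of the text (layout, case, comments) untouched.
sub expand_links {
    my ($html, $default_base) = @_;
    my $base = find_base($html, $default_base);

    $html =~ s{
        (   <(?:a|link)\b [^>]*? \bhref   \s*=\s*   # <a href=, <link href=
          | <img\b        [^>]*? \bsrc    \s*=\s*   # <img src=
          | <form\b       [^>]*? \baction \s*=\s*   # <form action=
        )
        (["'])   # opening quote
        (.*?)    # the URL itself
        \2       # matching closing quote
    }{
        # Copy the captures before calling into other code.
        my ($prefix, $quote, $url) = ($1, $2, $3);
        $prefix . $quote . URI::URL->new($url, $base)->abs->as_string . $quote;
    }gisex;

    return $html;
}
```

Calling expand_links($text, "http://somewhere/root/") returns the text with the links expanded; the second argument is only used when the document has no <BASE> element.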
--
Gisle Aas <aas@oslonett.no>
Schibsted Nett AS http://www.oslonett.no/home/aas/