Re: How to expand all links.

Gisle Aas (aas@sn.no)
Fri, 01 Dec 1995 16:10:59 +0100


> I have just installed libwww-perl-5b6, and am looking for a way to print 
> out an html document with all the links expanded to the full URL (i.e. 
> blah.gif becomes http://www.site.com/~someone/blah.gif).  Is libwww the 
> package to use for doing this sort of thing?

Sure.

>                                               Has anyone tackled this 
> problem before?  Any pointers would be appreciated.

Check what "request -o links http://yourplace/" spits out...

If you want the URLs expanded in the HTML document text then this might help 
you:

------------------------------
#!/local/bin/perl -w

# Elements that carry a link, and the attribute that holds the URL.
%linkElements = (
 'a'    => 'href',
 'img'  => 'src',
 'form' => 'action',
 'link' => 'href',
);

use HTML::Parse;
use URI::URL;

$BASE = "http://somewhere/root/";
$FILE = "yourfile.html";

$h = parse_htmlfile($FILE);
$h->traverse(\&expand_urls, 1);
print $h->asHTML;

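# Callback for traverse(): called for each element with a flag that is
# true on the start tag.  Returning 1 lets the traversal continue.
sub expand_urls
{
   my($e, $start) = @_;
   return 1 unless $start;            # nothing to do on the end-tag visit
   my $attr = $linkElements{$e->tag};
   return 1 unless defined $attr;     # not a link-carrying element
   my $url = $e->attr($attr);
   return 1 unless defined $url;      # attribute not present
   $e->attr($attr, (new URI::URL $url, $BASE)->abs->as_string);
   return 1;   # explicit, so the result of attr() cannot stop the traversal
}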
-------------------------------

This will reformat your HTML.  If this is not desirable then you might want 
to try to match the relative links in the HTML with a regular expression and 
then expand them to absolute URLs with the:

  (new URI::URL $url, $BASE)->abs->as_string

construct.  A correct implementation of the above would also look for the 
<BASE href="URL"> element in the HTML document and use the specified URL as 
$BASE.
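
A rough sketch of that regular expression approach, for illustration only.  
It assumes the CPAN URI module (URI->new_abs) in place of the URI::URL 
constructor shown above, and the name expand_links_in_text is made up for 
this example.  It only handles quoted attribute values, and it will also 
rewrite anything in the body text that happens to look like href="...", so 
treat it as a starting point rather than a robust solution:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;   # assumption: the CPAN URI module, not the URI::URL of the post

# Hypothetical helper: rewrite quoted href/src/action values in $html to
# absolute URLs and leave the rest of the text untouched.
sub expand_links_in_text {
    my ($html, $base) = @_;

    # Prefer the document's own <BASE href="URL"> if one is present.
    if ($html =~ /<base\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i) {
        $base = $1;
    }

    # Match attr="value" or attr='value'; unquoted values are not handled.
    $html =~ s{(?<![\w-])((?:href|src|action)\s*=\s*)(["'])(.*?)\2}{
        my ($pre, $quote, $url) = ($1, $2, $3);
        $pre . $quote . URI->new_abs($url, $base)->as_string . $quote;
    }gise;

    return $html;
}

my $page = '<a href="a/b.html"><img src="../blah.gif"></a>';
print expand_links_in_text($page, "http://somewhere/root/"), "\n";
```

URLs that are already absolute pass through new_abs unchanged, so running 
this over a document that mixes relative and absolute links should be safe.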

-- 
Gisle Aas                               <aas@oslonett.no>
Schibsted Nett AS                       http://www.oslonett.no/home/aas/