Re: [Q] Code problem using Link Extractor
Gisle Aas (gisle@aas.no)
04 Dec 1997 15:54:27 +0100
Mike Grommet <mgrommet@insolwwb.net> writes:
> I am attempting to use Link Extractor to pull links from a url,
> then absoluteize them to convert from relative links, and then keep
> the links which are at the same base as the url I am checking...
>
> Suppose I am checking http://www.insolwwb.net
> so basically I want all pages that look like http://www.insolwwb.net/*
> but not http://www.microsoft.com
>
> Here is my call back function for link extractor so far:
>
> ---------- SNIP ---------------
> sub picklinks
> {
>     my($tag, %attr) = @_;
>     return if $tag ne 'a';         #return if not anchor tag
>     @value = values(%attr);        #get the url info
>     $val = $value[0];              #@value is 1 element array... get the value
>     $val = url($val,$base)->abs;   #absolutize the url
>     push(@links,$val);             #push it on the @links array
> }
>
> --------- SNIP -------------
>
> This code seems to work fine for absolutizing the links before
> it pushes them on the links array...
>
> $base is a global value of the base of the url we are checking.
>
>
> Now how do I make it so I chunk out the urls that are not under the same
> base?
>
> BTW is there a better way to do this? I am very open to criticism
> on this code (like I said, I am very very new to libwww)
I would do something like this:
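sub picklinks
{
    my($tag, %attr) = @_;
    return if $tag ne 'a';                  #return if not anchor tag
    for my $url (values(%attr)) {
        $url = url($url,$base)->abs;        #absolutize the url
        # \Q quotes regex metacharacters in $base; /o compiles the
        # pattern only once, so $base must not change afterwards
        next unless $url =~ /^\Q$base/o;    #skip urls outside $base
        push(@links, $url);
    }
}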
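
The prefix test can also be tried on its own, without libwww. What follows is only a minimal sketch: the helper name same_base is hypothetical, and it assumes the URLs passed in are already absolute (for example after url($url,$base)->abs). It also assumes you want a trailing slash on the base, so that a base of http://www.insolwwb.net does not also accept a host such as www.insolwwb.net.example.org. No /o is used here, because $base can differ from call to call.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the URLs that live under $base.
# Assumes every URL in @urls is already absolute.
sub same_base {
    my ($base, @urls) = @_;

    # Make sure the base ends in "/" so that "http://host" cannot
    # match the longer host name "http://host.example.org/".
    $base .= '/' unless $base =~ m{/$};

    # \Q...\E quotes regex metacharacters (the dots in the host name).
    # A "/" is appended to each candidate so the bare base URL itself,
    # written without a trailing slash, is still accepted.
    return grep { "$_/" =~ /^\Q$base\E/ } @urls;
}

my @kept = same_base(
    'http://www.insolwwb.net',
    'http://www.insolwwb.net/a.html',
    'http://www.microsoft.com/',
    'http://www.insolwwb.net.example.org/x',
    'http://www.insolwwb.net',
);
print "$_\n" for @kept;
```

With these inputs only the two www.insolwwb.net URLs are kept. The same grep could replace the "next unless" line in the callback if you would rather collect everything first and filter afterwards.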
--
Gisle Aas