[Q] Code problem using Link Extractor
Mike Grommet (mgrommet@insolwwb.net)
Wed, 3 Dec 1997 20:14:52 -0600
Hi guys... I am new to the list (and new to libwww also).
I am attempting to use Link Extractor to pull links from a url,
then absolutize them to convert from relative links, and then keep only
the links which are under the same base as the url I am checking.
Suppose I am checking http://www.insolwwb.net.
Basically, I want all pages that look like http://www.insolwwb.net/*
but not pages on other hosts, such as http://www.microsoft.com.
Here is my callback function for link extractor so far:
---------- SNIP ---------------
use URI::URL;                         # provides url()

sub picklinks
{
    my($tag, %attr) = @_;
    return if $tag ne 'a';            # return if not anchor tag
    my $val = $attr{href};            # get the url info by attribute name
    return unless defined $val;       # skip anchors that have no href
    $val = url($val, $base)->abs;     # absolutize the url against $base
    push(@links, $val);               # push it on the @links array
}
--------- SNIP -------------
This code seems to work fine for absolutizing the links before
it pushes them on the links array...
$base is a global value of the base of the url we are checking.
Now how do I filter out the urls that are not under the same
base?
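
To make the question concrete, here is a rough sketch of the kind of filtering I mean. It is only an illustration, not a known-good answer: same_base is a hypothetical helper name (not part of libwww), the sample urls other than the two mentioned above are made up, and it assumes the links have already been absolutized. It does a plain string-prefix comparison, so it would miss hosts written in a different case or with an explicit port.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $url is the base itself or lies below it.
# Plain string comparison only; assumes both urls are already absolute.
sub same_base {
    my ($url, $base) = @_;
    $base =~ s{/+\z}{};                      # drop trailing slashes from the base
    return $url eq $base                     # the base url itself
        || index($url, "$base/") == 0;       # or anything underneath it
}

my $base  = 'http://www.insolwwb.net';
my @links = (                                # sample data, made up for illustration
    'http://www.insolwwb.net/',
    'http://www.insolwwb.net/products/index.html',
    'http://www.microsoft.com/',
    'http://www.insolwwb.net.example.com/',
);

# Keep only the links that are under the same base.
my @kept = grep { same_base($_, $base) } @links;
print "$_\n" for @kept;
```

Comparing against "$base/" rather than just $base is deliberate, so that a host which merely starts with the same characters is not kept by mistake.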
BTW, is there a better way to do this? I am very open to criticism
on this code (like I said, I am very new to libwww).