[Q] Code problem using Link Extractor

Mike Grommet (mgrommet@insolwwb.net)
Wed, 3 Dec 1997 20:14:52 -0600


Hi guys... I am new to the list (and new to libwww also).

I am attempting to use HTML::LinkExtor (the Link Extractor) to pull links
from a URL, then absolutize them to convert from relative links, and then
keep only the links which are under the same base as the URL I am checking.

Suppose I am checking http://www.insolwwb.net,
so basically I want all pages that look like http://www.insolwwb.net/*
but not pages elsewhere, such as http://www.microsoft.com.

Here is my callback function for Link Extractor so far:

---------- SNIP ---------------
use URI::URL;                         # provides url()

sub picklinks
{
  my($tag, %attr) = @_;
  return if $tag ne 'a';              # return if not an anchor tag
  my $val = $attr{href};              # only link attributes are passed in
  return unless defined $val;         # skip anchors that have no href
  $val = url($val,$base)->abs;        # absolutize the url against $base
  push(@links,$val);                  # push it on the global @links array
}

--------- SNIP -------------

This code seems to work fine for absolutizing the links before
it pushes them on the links array...

$base is a global variable holding the base URL of the site we are checking.


Now how do I make it so I chuck out the URLs that are not under the same
base?
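
To make the question concrete, here is a rough sketch of the kind of
filtering I mean. It is only an illustration under some assumptions: the
helper name same_base is made up (it is not part of libwww), it works on
the plain string form of each URL, and it does a simple case-sensitive
prefix comparison, so it will not notice things like differences in host
name case or an explicit :80 port.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of libwww): true if $url lies under $base.
sub same_base {
    my ($url, $base) = map { "$_" } @_;   # force plain strings
    $base =~ s{/+\z}{};                   # drop trailing slashes from the base
    return 1 if $url eq $base;            # the base page itself
    # Require a "/" right after the base, so that a host such as
    # www.insolwwb.net.example.com is not mistaken for a match.
    return index($url, "$base/") == 0;
}

my $base  = 'http://www.insolwwb.net';
my @links = (
    'http://www.insolwwb.net/index.html',
    'http://www.insolwwb.net/products/',
    'http://www.microsoft.com',
);

# Keep only the links that fall under $base.
my @kept = grep { same_base($_, $base) } @links;
print "$_\n" for @kept;
```

With the sample list above this prints the two insolwwb.net URLs and drops
the microsoft.com one. The same test could presumably go straight into the
callback, just before the push, instead of filtering @links afterwards.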

BTW is there a better way to do this?  I am very open to criticism
on this code (like I said, I am very very new to libwww).