Re: anyone know why extract_links() isn't working?

Nathaniel Good (good@cs.umn.edu)
Thu, 30 Oct 1997 09:13:07 -0600 (CST)


On Wed, 29 Oct 1997, Matt Silvia wrote:

> Hi there...
> 
> I'm having trouble getting the extract_links() method of
> HTML::TreeBuilder to work.
> 
> More specifically, I use a UserAgent to execute a request and return a
> response object, and then use HTML::Parse::parse_html to create a tree
> object.
> 
> I try to run the extract links method of this tree, but it seems as if
> it's not extracting anything from the tree.
> 
> Does anyone know what I'm doing wrong?
> 
> Thanks,
> 
>     Matt
> 
> ----
> example code, comments and declarations removed:
> 
> 
> $ua = new LWP::UserAgent;
> $ua->agent("AgentName/0.1 " . $ua->agent);
> 
> $URL = 'http://www.somethin.com/';
> 
> my $req = new HTTP::Request POST => $URL;
> $req->content_type('application/x-www-form-urlencoded');
> $req->content('');
> 
> my $response = $ua->request($req);
> $html = $response->content();
> $tree = HTML::Parse::parse_html($html);
> 
>     for (@{ $tree->extract_links( qw(a) ) }) {
>         $link = $_->[0];
>         print "$link\n";
>    }

This is what I use, and it seems to work. $ARGV[0] is the URL given on
the command line, but you could replace it with a static URL and it
should work just as well. Hope this helps.

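#!/usr/local/bin/perl
use LWP::Simple;
use HTML::Parse;
use HTML::Element;

# Fetch the page named on the command line; get() returns undef on failure.
$html = get $ARGV[0];
die "Could not fetch $ARGV[0]\n" unless defined $html;

# Build the parse tree (an HTML::Element object).
$parsed_html = HTML::Parse::parse_html($html);

# With no arguments, extract_links() returns links from every element type.
# Each entry is an array ref whose first item is the link itself.
for (@{ $parsed_html->extract_links() }) {
    $link = $_->[0];
    print "$link\n";
}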
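
If it is unclear whether the fetch or the parse is at fault, it may help
to test extract_links() on a fixed string first. The following is only a
sketch under assumptions: the sample HTML and its URLs are made up, and it
uses HTML::TreeBuilder directly (parse, then eof) instead of
HTML::Parse::parse_html. Passing 'a' restricts the result to links found
in <a> tags, as in your call.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page; it stands in for $response->content.
my $html = '<html><body>'
         . '<a href="http://www.example.com/a">A</a> '
         . '<img src="pic.gif"> '
         . '<a href="/b">B</a>'
         . '</body></html>';

# Build the tree by hand: feed the text, then signal end of input.
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# Keep only the link text (first item of each entry) from <a> elements.
my @links = map { $_->[0] } @{ $tree->extract_links('a') };
print "$_\n" for @links;

# Release the tree explicitly when done.
$tree->delete;
```

If this prints the two <a> links but your script prints nothing, the
content coming back from the request is the more likely suspect, so it is
worth printing $response->content before parsing it.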