Re: HTML::LinkExtor and HTML::Tagset::linkElements

Gisle Aas (gisle@activestate.com)
09 Apr 2001 12:47:17 -0700


Bill Moseley <moseley@hank.org> writes:

> Any good tricks for specifying which tags to extract (push onto
> $self->{'links'}), other than filtering after the links that are returned
> with $p->links?
> 
> Might be nice to be able to pass in something to LinkExtor like an array of
> a subset of HTML::Tagset::linkElements tags, or my own linkElements hash
> reference.
> 
> Or does everyone use a callback instead of letting LinkExtor collect the links?

I don't think I would bother extending LinkExtor too much.  If you
know exactly what you want just use HTML::Parser directly to extract
your links:

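  #!/usr/bin/perl -w
  use strict;

  use HTML::Parser 3.20;

  my @links;

  # Only <a> and <img> start tags reach the handler.  The '@{...}'
  # wrapper flattens each event, so @links ends up as a flat list:
  # (tagname, \%attr, tagname, \%attr, ...)
  my $p = HTML::Parser->new(start_h     => [\@links, '@{tagname,attr}'],
                            report_tags => [qw(a img)],
                           );
  $p->parse_file("index.html") or die "Can't open index.html: $!\n";

  # XXX do something with collected @links...
  use Data::Dump;
  Data::Dump::dump(@links);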
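The same approach also covers the other idea from the question, passing in your own linkElements-style table. What follows is a minimal sketch, assuming HTML::Parser 3.x. %link_attr and extract_links are hypothetical names made up for illustration; %link_attr only mimics the "tag name => [attribute names]" shape of %HTML::Tagset::linkElements and is not that hash itself.

```perl
use strict;
use warnings;
use HTML::Parser 3.20;

# Hypothetical table in the same shape as %HTML::Tagset::linkElements:
# tag name => reference to a list of attribute names that may hold links.
my %link_attr = (
    a   => ['href'],
    img => ['src'],
);

# extract_links is a hypothetical helper: it returns the attribute values
# found in $html, restricted to the tags and attributes listed in %$spec.
sub extract_links {
    my ($html, $spec) = @_;
    my @found;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                # Look up which attributes of this tag may carry a link.
                for my $name (@{ $spec->{$tagname} || [] }) {
                    push @found, $attr->{$name} if defined $attr->{$name};
                }
            },
            'tagname, attr',
        ],
        # Only start tags named in the table are reported at all.
        report_tags => [ keys %$spec ],
    );
    $p->parse($html);
    $p->eof;    # flush any buffered text
    return @found;
}

my @links = extract_links(
    '<p><a href="a.html">A</a> <img src="b.png"> <link href="c.css"></p>',
    \%link_attr,
);
print "$_\n" for @links;    # prints a.html, then b.png
```

Because the table drives both report_tags and the attribute lookup, restricting the extraction to a subset of tags is just a matter of passing a smaller hash.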

Regards,
Gisle