Re: HTML::LinkEx(trac)tor

Gisle Aas (aas@bergen.sn.no)
Tue, 09 Jul 1996 11:58:22 +0200


In message <199607090745.JAA00892@frigg.ub2.lu.se>, Hakan Ardo writes:
> What about the anchor text? Shouldn't that one be extracted as well?

It's not as easy as just extracting things from the start tags.  The
anchor text can have its own (complex) structure.  If you need it you
should probably build a syntax tree, using HTML::TreeBuilder.
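
As a rough sketch of that approach, assuming a reasonably current
HTML-Tree distribution (the look_down() and as_text() methods of
HTML::Element), something like the following collects each link together
with its flattened anchor text.  The sample markup and the @links array
are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input, invented for this example.
my $html = '<p>See <a href="/docs/">the <b>docs</b></a> and '
         . '<a href="/faq.html">FAQ</a>.</p>';

# Build the syntax tree from the document text.
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# Collect [href, anchor text] pairs for every <a> element with an href.
my @links;
for my $anchor ($tree->look_down(_tag => 'a')) {
    my $href = $anchor->attr('href');
    next unless defined $href;
    # as_text() concatenates the text of all descendants, so nested
    # markup inside the anchor is flattened to plain text.
    push @links, [$href, $anchor->as_text];
}

# Trees contain circular references; free them explicitly.
$tree->delete;

printf "%s => %s\n", @$_ for @links;
```

This costs more than scanning start tags, since the whole document is
parsed into a tree, but nested markup inside the anchor is handled for you.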

> What also might be nice, but that should probably be a separate object,
> would be a more flexible extractor implemented in the same way, that
> allowed you to specify a list of regexps connected to procedures, where
> each regexp specifies which tags its procedure should be called with.
> That would allow you to extract any information you are interested in
> using this fast method. What do you say?

I say that anyone who needs that should probably make their own
HTML::Parser subclass.  Subclassing is a natural way to use an OO library.

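  package MyParser;
  require HTML::Parser;
  @ISA = qw(HTML::Parser);

  # Called by HTML::Parser as a method for each start tag.
  sub start {
      my($self, $tag, $attr) = @_;
      if ($tag =~ /^h\d+$/) {
          $self->header($tag, $attr);   # header() is a method you supply
      } elsif ($tag =~ /.../) {
          ...
      }
  }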
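
If you really want the list of regexps connected to procedures, a
subclass can carry that table itself.  The following is only a sketch:
the DispatchParser package, its add_rule() method and the
_dispatch_rules hash key are names invented here, not part of
HTML::Parser, and it assumes the handler interface of HTML::Parser
version 3 (the start_h option with an argspec string).

```perl
package DispatchParser;   # hypothetical name
use strict;
use warnings;
use parent 'HTML::Parser';

sub new {
    my ($class) = @_;
    # Ask the parser to call our start() method for every start tag,
    # passing the parser object, the lowercased tag name and the
    # attribute hash.
    my $self = $class->SUPER::new(
        api_version => 3,
        start_h     => ['start', 'self,tagname,attr'],
    );
    $self->{_dispatch_rules} = [];
    return $self;
}

# Register a regexp and the procedure to call for matching tag names.
sub add_rule {
    my ($self, $re, $code) = @_;
    push @{ $self->{_dispatch_rules} }, [$re, $code];
    return $self;
}

# Try every rule in registration order and call each one that matches.
sub start {
    my ($self, $tag, $attr) = @_;
    for my $rule (@{ $self->{_dispatch_rules} }) {
        my ($re, $code) = @$rule;
        $code->($tag, $attr) if $tag =~ $re;
    }
}

package main;

my (@headers, @links);
my $p = DispatchParser->new;
$p->add_rule(qr/^h\d+$/, sub { push @headers, $_[0] });
$p->add_rule(qr/^a$/,    sub { push @links, $_[1]{href} if defined $_[1]{href} });

$p->parse('<h1>Title</h1><p><a href="http://example.com/">x</a></p><h2>Sub</h2>');
$p->eof;

print "headers: @headers\n";   # headers: h1 h2
print "links: @links\n";       # links: http://example.com/
```

Every start tag pays for a walk through the rule list, so for a fixed
set of tags the hand-written if/elsif chain is still the simpler choice.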

Regards,
Gisle.