HTML::Parser (HTML::LinkExtor) Broken?

Matthew Keller (keller57@potsdam.edu)
Sat, 06 Mar 1999 11:32:35 -0500


	
	HTML::Parser (or maybe HTML::LinkExtor) is *NOT* returning all of the
link elements it finds inside <MAP> tags. The code below is adapted from
the link-extraction example in the HTML::LinkExtor POD, with the <img>
filter commented out so that every link element is collected. When run,
it fetches a page I wrote that contains a client-side image map.
	It returns all of the AREA elements marked with 'SHAPE="circle"', but
none of the 'rect' areas. The page has a total of *27* link elements,
but only *7* are returned; the 20 that are missing are all AREA elements
with 'SHAPE="rect"'.
	I can reproduce these results on ANY client-side image map (I've tested
on over 30 different sites). Of the two AREA elements below, only the
first one is treated as a link element.
	*ANY* assistance would be most appreciated.

-- Begin HTML Snippet --

<AREA SHAPE="circle" COORDS="582,149,51"
HREF="http://mattwork.potsdam.edu/friends.htm" ALT="My Friends">

<AREA SHAPE="rect" COORDS="3,401,198,440",
HREF="http://mattwork.potsdam.edu/Me/" ALT="Me Stuff">


-- End HTML Snippet --
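
A smaller test case that does not need the network may make this easier to
reproduce. The sketch below feeds the two AREA elements from the snippet
above, copied verbatim (including the comma after the COORDS value of the
second one), straight into HTML::LinkExtor and prints each tag the callback
receives. The names $html and @found are only illustrative, and no expected
output is claimed here, since the result may depend on the installed
HTML::Parser version.

```perl
#!/usr/local/bin/perl -w
use strict;
use HTML::LinkExtor;

# The two AREA elements from the snippet, copied verbatim
# (including the comma that follows the COORDS value of the second).
my $html = <<'EOT';
<AREA SHAPE="circle" COORDS="582,149,51"
HREF="http://mattwork.potsdam.edu/friends.htm" ALT="My Friends">

<AREA SHAPE="rect" COORDS="3,401,198,440",
HREF="http://mattwork.potsdam.edu/Me/" ALT="Me Stuff">
EOT

# Record the tag name together with its link attributes, so it is
# obvious which elements the extractor reported.
my @found;
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @found, { tag => $tag, attr => { %attr } };
});

$p->parse($html);
$p->eof;    # flush anything still buffered in the parser

for my $link (@found) {
    for my $name (sort keys %{ $link->{attr} }) {
        print "$link->{tag} $name=$link->{attr}{$name}\n";
    }
}
print scalar(@found), " link element(s) reported\n";
```

If only one link element is reported, the problem can be reproduced without
LWP or the live page at all.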

-- Begin Perl Code --

#!/usr/local/bin/perl -w

  use LWP::UserAgent;
  use HTML::LinkExtor;
  use URI::URL;

  $url = "http://mattwork.potsdam.edu/zippy.htm";  # for instance
  $ua = LWP::UserAgent->new;

  # Set up a callback that collects link URLs (the <img> filter is disabled)
  my @imgs = ();
  sub callback {
     my($tag, %attr) = @_;
     #return if $tag ne 'img';  # we only look closer at <img ...>
     push(@imgs, values %attr);
  }

  # Make the parser.  Unfortunately, we don't know the base yet
  # (it might be different from $url)
  $p = HTML::LinkExtor->new(\&callback);

  # Request document and parse it as it arrives
  $res = $ua->request(HTTP::Request->new(GET => $url),
                      sub {$p->parse($_[0])});

  # Expand all collected URLs to absolute ones
  my $base = $res->base;
  @imgs = map { url($_, $base)->abs } @imgs;

  # Print them out
  print join("\n", @imgs), "\n";

-- End Perl Code --
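
To tell whether HTML::Parser or HTML::LinkExtor is the one dropping the
elements, it may help to look at what the parser itself reports for each
AREA start tag, before any link extraction happens. The following is a
rough sketch using the subclassing interface of HTML::Parser, where the
start() method receives the tag name, an attribute hash, and the attribute
order. The package name AreaDumper and the @seen array are made up for this
example, and the HTML is the same verbatim snippet as above.

```perl
#!/usr/local/bin/perl -w
use strict;

package AreaDumper;
use HTML::Parser;
use vars qw(@ISA @seen);
@ISA = ('HTML::Parser');

# Called by HTML::Parser for every start tag it recognizes.
sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    return unless $tag eq 'area';
    # Copy the data; keep the attribute order as the parser saw it.
    push @seen, { attr => { %$attr }, order => [ @$attrseq ] };
}

package main;

# Same two AREA elements as in the HTML snippet, copied verbatim.
my $html = <<'EOT';
<AREA SHAPE="circle" COORDS="582,149,51"
HREF="http://mattwork.potsdam.edu/friends.htm" ALT="My Friends">

<AREA SHAPE="rect" COORDS="3,401,198,440",
HREF="http://mattwork.potsdam.edu/Me/" ALT="Me Stuff">
EOT

my $p = AreaDumper->new;
$p->parse($html);
$p->eof;

# Print every attribute of every AREA tag the parser reported.
for my $area (@AreaDumper::seen) {
    print "<area>\n";
    for my $name (@{ $area->{order} }) {
        print "  $name = $area->{attr}{$name}\n";
    }
}
```

If the 'rect' element shows up here with an href attribute, the parser is
seeing it and the loss is happening in HTML::LinkExtor; if it is missing or
its attributes look wrong, the problem is at the HTML::Parser level.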
	
-- 

             -> Matthew Keller <-
            Distributed Computing
             Windows/UNIX Support
              and Host Services
                 Kellas Hall
   State University of New York at Potsdam	
         http://mattwork.potsdam.edu/
-
     They wouldn't give you the time of day.
     They said you weren't a player.
     They wouldn't accept your calls.
     They are holding on line three.
-
 PGP Keys -
    http://mattwork.potsdam.edu/crypto/