Re: HEAD and GET fail on http://www.suck.com/suckreviews/nph-suckblast.cgi

Martijn Koster (m.koster@webcrawler.com)
Tue, 23 Apr 1996 10:31:05 -0700


At 2:47 AM 4/23/96, Nelson Minar wrote:

>For Perl 5.002, libwww-perl-5b11, I get this error:
>
>$ HEAD http://www.suck.com/suckreviews/nph-suckblast.cgi
>Unable to determine scheme for '1' at
>/usr/local/lib/perl5/site_perl/auto/URI/URL/_generic/abs.al line 12

Yeah, I saw that recently, on a different URL. Unfortunately I cannot recall
if this came up on the list before, so I had a look into it myself.

The problem is in HTTP::Response::base(): for some bizarre reason
$1 ends up being 1 when the regexp match fails (huh?). This is on
IRIX 5.3 using perl5.002.

I worked around it by doing the patch below, but suspect Gisle will
beat me to the list server with a better fix :-) In the meantime it
works for me...
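
A minimal sketch of the likely mechanism, assuming the stray 1 is left
over from an earlier successful match somewhere else: Perl does not reset
$1 when a match fails, so an unguarded read picks up the old capture. The
status-line match and the sample content are made up for illustration;
only the <BASE> regexp is taken from the patch.

```perl
use strict;
use warnings;

# The <BASE> regexp from Response.pm, as quoted in the patch.
my $pattern = qr/<\s*base\s+href=([^\s>]+)/i;

# Hypothetical earlier match that succeeds and leaves $1 set to "1".
my $status_line = "HTTP/1.0 200 OK";
$status_line =~ m{^HTTP/(\d+)};

# Sample document without any <BASE> tag (illustrative only).
my $content = "<html><head><title>no base here</title></head></html>";

# Unguarded: the match fails, but $1 still holds the old capture.
$content =~ $pattern;
my $unguarded = $1;

# Guarded: only read $1 when this particular match succeeded.
my $guarded;
if ($content =~ $pattern) {
    $guarded = $1;
}

printf "unguarded: %s\n", defined $unguarded ? $unguarded : 'undef';
printf "guarded:   %s\n", defined $guarded   ? $guarded   : 'undef';
```

The guarded form is the same idea as the patch: test the result of the
match before trusting $1.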

>Is there a way to only load the first, say, 4k of a document? I want
>to try to make a quick guess as to what the content of a page is and
>bail out after just a little bit of read. Is there a callback trick?

You should be able to do that...

>Dumb perl question - is there an easy way to remove duplicates in a list?

Don't store them into a list in the first place, but use a hash?
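
A short sketch of both approaches, assuming the items are plain strings
such as URLs (the sample values and variable names are made up): either
use hash keys as the collection from the start, or filter an existing
list through a "seen" hash, which keeps the first-seen order.

```perl
use strict;
use warnings;

# Illustrative input with duplicates.
my @found = qw(http://a.example/ http://b.example/ http://a.example/
               http://c.example/ http://b.example/);

# Option 1: collect into a hash in the first place; keys are unique
# by construction. Hash order is not defined, so sort when listing.
my %links;
$links{$_} = 1 for @found;
my @unique_sorted = sort keys %links;

# Option 2: filter a list you already have, keeping first-seen order.
# The post-increment returns 0 the first time a key is seen.
my %seen;
my @unique_in_order = grep { !$seen{$_}++ } @found;

print "$_\n" for @unique_in_order;
```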

Regards,

-- Martijn (yes, I'm still alive, just about :-)

*** Response.pm.orig    Tue Apr 23 10:19:21 1996
--- Response.pm Tue Apr 23 10:17:47 1996
***************
*** 116,126 ****
          # XXX: Should really use the HTML::Parse module to get this
          # right. The <BASE> tag could be commented out, which we are
          # not able to handle here.
!         $self->{'_content'} =~ /<\s*base\s+href=([^\s>]+)/i;
          $base = $1;
          if ($base) {
              $base =~ s/^(["'])(.*)\1$/$2/;  #" get rid of any quoting
              return $base;
          }
      }
      $base = $self->header('Base') unless $base;
--- 116,127 ----
        # XXX: Should really use the HTML::Parse module to get this
        # right. The <BASE> tag could be commented out, which we are
        # not able to handle here.
!       if ($self->{'_content'} =~ /<\s*base\s+href=([^\s>]+)/i) {
            $base = $1;
            if ($base) {
                $base =~ s/^(["'])(.*)\1$/$2/;  #" get rid of any quoting
                  return $base;
+           }
          }
      }
      $base = $self->header('Base') unless $base;

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html