Bus error in LWP/Perl5

Martijn Koster (mak@beach.webcrawler.com)
Thu, 19 Oct 1995 08:23:24 -0700


This also gives a bus error on my NeXT, with Perl 5.001m.
Any other takers?

Nik, can you elaborate on the problems with IP addresses in URLs?
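
To help narrow the IP-address question down, here is a rough sketch (not part of Nik's script) that labels each URL by whether its host is a dotted-quad address, so the two groups can be tried separately. The helper name host_is_ip is made up for illustration, and the pattern is a simplification: it does not check octet ranges or handle a user@host prefix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the host part of a URL looks like a
# dotted-quad IP address. Octet ranges are not validated.
sub host_is_ip {
    my ($url) = @_;
    # Capture the host: everything after "scheme://" up to ':', '/' or end.
    return 0 unless $url =~ m{^[a-z][a-z0-9+.-]*://([^/:]+)}i;
    my $host = $1;
    return $host =~ /^\d{1,3}(?:\.\d{1,3}){3}$/ ? 1 : 0;
}

# A few entries taken from the url-list in the forwarded message.
my @urls = (
    'HTTP://198.102.242.71/ShuttleCAM/ShuttleCAM.cgi',
    'http://140.109.40.248/~taob/',
    'http://TV1.com/',
    'http://artaids.dcs.qmw.ac.uk:8001/entrance/entrance.html',
);

# Label each URL so the "ip" and "name" groups can be fed to w3new separately.
for my $url (@urls) {
    printf "%-8s %s\n", (host_is_ip($url) ? 'ip' : 'name'), $url;
}
```

If the crash only shows up for the "ip" group, that would point at host lookup rather than at the response handling.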

------- Forwarded Message

Return-Path: <nik@blueberry.co.uk>
Received: from surfs-up.demon.co.uk by webcrawler.com (NX5.67f2/NX3.0M)
	id AA21966; Thu, 19 Oct 95 07:47:07 -0700
Received: (from nik@localhost) by elbereth.blueberry.co.uk (8.6.11/8.6.9) id PAA01274 for m.koster@webcrawler.com; Thu, 19 Oct 1995 15:47:32 GMT
From: Nik Clayton <nik@blueberry.co.uk>
Message-Id: <199510191547.PAA01274@elbereth.blueberry.co.uk>
Subject: Possible bug in LWP / Perl 5.001m
To: m.koster@webcrawler.com
Date: Thu, 19 Oct 1995 15:47:31 +0000 ()
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 2613      
X-Filter: mailagent [version 3.0 PL41] for mak@surfski.webcrawler.com

Hi,

I think I may have uncovered a bug in either the LWP library, Perl 5 or
FreeBSD.

The enclosed script dumps core when run on the enclosed file. Perhaps
you could try it on your system and see what happens? This is on a
Pentium with Perl 5.001m and FreeBSD 2.0.5.

I've also noticed that the library seems to have problems with URLs that
include IP addresses. This is using libwww-perl-5b5 BTW.

Cheers, N

==== w3new.pl ====
#!/usr/local/bin/perl
#
# Given a file containing a list of URLs, outputs a new '|' delimited list
# showing whether the URL in question has been altered in the past 7 days,
# and if it has, the modification time and the title.
#
# Run as
#
#     w3new filename [filename...]
#
# and optionally redirect the output.
#
# Copyright (c) 1995 Nik Clayton and Blueberry Design Ltd.

use LWP::UserAgent;			     # Perl WWW Library
use HTTP::Date;				     # Date parsing

$ua = new LWP::UserAgent;		     # New user agent

$week = (60 * 60 * 24 * 7);		     # Week in seconds

while(<>) {
    chomp;				     # Strip the newline only

    next if /^\#/;			     # Ignore comments
 
    print STDERR "Now examining: $_\n";	     # Show where we are

    # Do a HEAD request for the file in question. If it's new enough then
    # GET the file to extract the TITLE from the document.
    $request = new HTTP::Request('HEAD', $_); 
    $response = $ua->request($request);

    if($response->isSuccess) {
	$time = $response->header('Last-Modified');
	
	if(str2time($time) + $week > $^T) {
	    print "New|$time|";
	    $request = new HTTP::Request('GET', $_);
	    $response = $ua->request($request);
	    # Use $1 only if the match succeeded; /s lets a title span lines
	    $title = ($response->content =~ /<title>(.*?)<\/title>/is) ? $1 : '';
	    print "$title|$_\n";
	    
	} else {
	    print "Old|$_\n";
	}
    }
}
==== url-list ====
HTTP://198.102.242.71/ShuttleCAM/ShuttleCAM.cgi
http://140.109.40.248/~taob/
http://140.109.40.248/~taob/Bench/
http://144.174.145.14/LIVE/live.html
http://198.147.111.30/~werdna/
http://198.147.111.30/~werdna/fun.html
http://ESPNET.SportsZone.com
http://SLEEPY.USU.EDU/~slq9v/cslewis/index.html
http://TV1.com/
http://WWW.Stars.com/
http://Web.cgrg.ohio-state.edu/folkbook/
http://actor.cs.vt.edu/~wentz/index.html
http://ai.eecs.umich.edu/people/kennyp/sounds.html
http://akebono.stanford.edu/
http://alpha.mkn.co.uk/help/flower/info
http://anther.learning.cs.cmu.edu/priest.html
http://artaids.dcs.qmw.ac.uk:8001/entrance/entrance.html
====

- -- 
- --+=[ Blueberry Hill                   Blueberry Design                   ]=+--
- --+=[ http://www.blueberry.co.uk/      1/9 Chelsea Harbour Design Centre, ]=+--
- --+=[ WebMaster@blueberry.co.uk        London, England, SW10 0XE          ]=+--

------- End of Forwarded Message
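
For anyone else trying this: one way to find out which URL triggers the core dump is to run each check in a forked child and look at the wait status in the parent. The following is only a sketch under assumptions: run_isolated is a made-up name, it needs a Unix-like system with fork, and the crash is simulated with SIGKILL instead of a real LWP request so that the example stands alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

$| = 1;    # Unbuffered output, so nothing is printed twice after fork

# Hypothetical helper: run $code in a forked child so that a crash there
# cannot take the parent down.
# Returns (exit_code, signal_number, core_dumped).
sub run_isolated {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $code->();
        POSIX::_exit(0);    # Leave the child without running END blocks
    }
    waitpid($pid, 0);
    my $status = $?;
    # Standard layout of $?: high byte is the exit code, low 7 bits the
    # signal number, bit 128 the core-dump flag.
    return ($status >> 8, $status & 127, ($status & 128) ? 1 : 0);
}

# Stand-ins for real work. In practice each sub would do the HEAD request
# for one URL from the list.
my @cases = (
    [ 'clean exit'      => sub { 1 } ],
    [ 'simulated crash' => sub { kill 'KILL', $$; sleep 1 } ],
);

for my $case (@cases) {
    my ($label, $code) = @$case;
    my ($exit, $signal, $core) = run_isolated($code);
    print "$label: exit=$exit signal=$signal core=$core\n";
}
```

A non-zero signal for one URL and not the others would identify the entry that triggers the bus error.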