Bus error in LWP/Perl5
Martijn Koster (mak@beach.webcrawler.com)
Thu, 19 Oct 1995 08:23:24 -0700
This also causes a bus error on my NeXT with Perl 5.001m.
Any other takers?
Nik, can you elaborate on the problems with IP addresses in URLs?
------- Forwarded Message
From: Nik Clayton <nik@blueberry.co.uk>
Message-Id: <199510191547.PAA01274@elbereth.blueberry.co.uk>
Subject: Possible bug in LWP / Perl 5.001m
To: m.koster@webcrawler.com
Date: Thu, 19 Oct 1995 15:47:31 +0000 ()
Hi,
I think I may have uncovered a bug in the LWP library, Perl 5, or
FreeBSD.
The enclosed script dumps core when run on the enclosed file. Perhaps
you could try it on your system and see what happens? This is on a
Pentium with Perl 5.001m and FreeBSD 2.0.5.
I've also noticed that the library seems to have problems with URLs that
include IP addresses. This is using libwww-perl-5b5 BTW.
Cheers, N
==== w3new.pl ====
#!/usr/local/bin/perl
#
# Given a file containing a list of URLs, outputs a new '|' delimited list
# showing whether the URL in question has been altered in the past 7 days,
# and if it has, the modification time and the title.
#
# Run as
#
# w3new filename [filename...]
#
# and optionally redirect the output.
#
# Copyright (c) 1995 Nik Clayton and Blueberry Design Ltd.

use LWP::UserAgent;                     # Perl WWW Library
use HTTP::Date;                         # Date parsing

$ua = new LWP::UserAgent;               # New user agent
$week = (60 * 60 * 24 * 7);             # Week in seconds

while(<>) {
    chop;
    next if /^\#/;                      # Ignore comments
    print STDERR "Now examining: $_\n"; # Show where we are

    # Do a HEAD request for the file in question. If it's new enough then
    # GET the file to extract the TITLE from the document.
    $request = new HTTP::Request('HEAD', $_);
    $response = $ua->request($request);

    if($response->isSuccess) {
        $time = $response->header('Last-Modified');
        if(str2time($time) + $week > $^T) {
            print "New|$time|";
            $request = new HTTP::Request('GET', $_);
            $response = $ua->request($request);
            $response->content =~ /<title>(.*?)<\/title>/i;
            print "$1|$_\n";
        } else {
            print "Old|$_\n";
        }
    }
}
==== url-list ====
HTTP://198.102.242.71/ShuttleCAM/ShuttleCAM.cgi
http://140.109.40.248/~taob/
http://140.109.40.248/~taob/Bench/
http://144.174.145.14/LIVE/live.html
http://198.147.111.30/~werdna/
http://198.147.111.30/~werdna/fun.html
http://ESPNET.SportsZone.com
http://SLEEPY.USU.EDU/~slq9v/cslewis/index.html
http://TV1.com/
http://WWW.Stars.com/
http://Web.cgrg.ohio-state.edu/folkbook/
http://actor.cs.vt.edu/~wentz/index.html
http://ai.eecs.umich.edu/people/kennyp/sounds.html
http://akebono.stanford.edu/
http://alpha.mkn.co.uk/help/flower/info
http://anther.learning.cs.cmu.edu/priest.html
http://artaids.dcs.qmw.ac.uk:8001/entrance/entrance.html
====
- --
- --+=[ Blueberry Hill Blueberry Design ]=+--
- --+=[ http://www.blueberry.co.uk/ 1/9 Chelsea Harbour Design Centre, ]=+--
- --+=[ WebMaster@blueberry.co.uk London, England, SW10 0XE ]=+--
------- End of Forwarded Message