wwwbot.pl problem
Andrew Daviel (andrew@andrew.triumf.ca)
Wed, 22 Nov 1995 18:16:07 -0800 (PST)
(from mail to bcutter, after I'd read a bit further in the documentation :-) )
I'm trying to write a robot using libwww-perl-0.40, which I picked up a
while ago to run MOMspider. I just tried Archie: liege.ics.uci.edu
won't talk to me, and the copy I found on anubis.ac.hmc.edu is the same
version I already have. Is there an updated version?
I'm having trouble with wwwbot. What I think is happening is this: after I
try a site that disallows everything (http://www.riddler.com/), I get a 0
back from wwwbot'allowed for files on sites I visited before riddler,
even sites that don't have a robots.txt at all. So the cache seems to be
getting corrupted somehow.
I tried calling dont_cache in testbot, but I get the same results.
My robot goes round and round a list of URLs at different sites when it
has to wait, so I run into this all the time and end up killing a lot of
perfectly good links.
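For comparison, the behaviour I'd expect is that robots.txt rules are cached per site, so a disallow-all entry for riddler can never affect another host. Below is a minimal sketch of that idea. It is not wwwbot's Perl code. It uses Python's standard urllib.robotparser, and the RobotsCache class, the fetch callback, and the agent name are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical fetcher: given "scheme://host", return the robots.txt body,
# or None if the site has no robots.txt.
Fetcher = Callable[[str], Optional[str]]


class RobotsCache:
    """Per-site robots.txt cache (an illustrative sketch, not wwwbot)."""

    def __init__(self, fetch: Fetcher, agent: str) -> None:
        self.fetch = fetch
        self.agent = agent
        # One parser per site, keyed by scheme and host[:port], so one
        # site's rules cannot leak into another site's entry.
        self._parsers: Dict[str, RobotFileParser] = {}

    def _parser_for(self, site: str) -> RobotFileParser:
        parser = self._parsers.get(site)
        if parser is None:
            parser = RobotFileParser()
            body = self.fetch(site)
            # No robots.txt: parse an empty file, which allows everything.
            parser.parse(body.splitlines() if body else [])
            self._parsers[site] = parser
        return parser

    def allowed(self, url: str) -> bool:
        parts = urlsplit(url)
        site = f"{parts.scheme}://{parts.netloc}".lower()
        return self._parser_for(site).can_fetch(self.agent, url)


robots = {"http://www.riddler.com": "User-agent: *\nDisallow: /\n"}
cache = RobotsCache(lambda site: robots.get(site), "ExampleBot/0.1")
print(cache.allowed("http://andrew.triumf.ca/~andrew/"))   # True
print(cache.allowed("http://www.riddler.com/index.html"))  # False
print(cache.allowed("http://andrew.triumf.ca/~andrew/"))   # True
```

Keying the cache on the site rather than holding one shared rule set is the whole point: visiting riddler only adds a riddler entry, so earlier sites keep answering 1.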
Andrew Daviel email: advax@triumf.ca
TRIUMF voice: 604-222-7376
4004 Wesbrook Mall fax: 604-222-7307
Vancouver BC http://andrew.triumf.ca/~andrew
Canada V6T 2A3 49D14.7N 123D13.6W