wwwbot.pl problem

Andrew Daviel (andrew@andrew.triumf.ca)
Thu, 23 Nov 1995 12:42:51 -0800 (PST)


(I sent a request to libwww-perl-request just before my last message
to the list, so I might not be subscribed yet. Please Cc any replies to me.)

I was having trouble with wwwbot from the libwww-perl-0.40 library.
I continued to work on the problem after posting to the perl list.

It seems that botcache is not defined well enough: a site whose
robots.txt contains "User-agent: *" and "Disallow: /" would kill
subsequent GETs to a site that was previously in the cache. I have made
a patch which adds the address to the cache and fixes a couple of other
odd cases. One is where the address is not fully qualified, as when
working within a domain. The other is where there are host names such
as ypsun, ypsun2 etc., which would become confused with the path count.

See ftp://andrew.triumf.ca/pub/wwwbot.patch
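
For readers without access to the patch, the general idea of keying the
cache on the full address can be sketched as follows. This is not the
patch and not the wwwbot.pl code; it is a minimal illustration in Python
rather than Perl. The names cache_key, RobotsCache and DEFAULT_DOMAIN are
hypothetical, and the default domain is an assumption used only to
qualify short host names.

```python
from urllib.parse import urlsplit

# Assumption: the local domain appended to unqualified host names.
DEFAULT_DOMAIN = "triumf.ca"


def cache_key(url, default_domain=DEFAULT_DOMAIN):
    """Return a 'host:port' key for the cache (hypothetical helper).

    The host is lower-cased and, if it has no dot, qualified with the
    default domain, so 'ypsun' and 'ypsun.triumf.ca' share one entry
    while 'ypsun2' gets its own.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host and "." not in host:
        host = host + "." + default_domain
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return "%s:%d" % (host, port)


class RobotsCache:
    """Per-host store of Disallow prefixes, looked up by exact key."""

    def __init__(self):
        self._rules = {}

    def store(self, url, disallowed_prefixes):
        # Rules are filed under the full address, never under a prefix.
        self._rules[cache_key(url)] = list(disallowed_prefixes)

    def allowed(self, url):
        rules = self._rules.get(cache_key(url))
        if rules is None:
            # Nothing cached for this host; a real robot would fetch
            # robots.txt before deciding.
            return True
        path = urlsplit(url).path or "/"
        # An empty Disallow value means "allow everything", so skip it.
        return not any(p and path.startswith(p) for p in rules)
```

Because the lookup is an exact match on the qualified host and port, a
"Disallow: /" recorded for one host does not affect another host whose
name merely begins with the same characters.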

Andrew Daviel         email: advax@triumf.ca 
TRIUMF                voice: 604-222-7376 
4004 Wesbrook Mall    fax:   604-222-7307 
Vancouver BC          http://andrew.triumf.ca/~andrew 
Canada   V6T 2A3      49D14.7N 123D13.6W