Re: 2 RobotUA problems

Gisle Aas (aas@bergen.sn.no)
07 Dec 1996 09:47:08 +0100


Otis Gospodnetic <otisg@panther.middlebury.edu> writes:
> 1.
> 	I was GETting this URL:
> http://www.ibm.com/Links/http%3a%2f%2fwww.software.ibm.com/software/index_all.html
> 
> 	and I got this error:
> Path components contain '/' (you must call epath) at
> /home/otisg/lib/perl5/site_perl/LWP/RobotUA.pm line 188
> 
> 	OK, so the URL is probably bad, but is RobotUA supposed to break if it
> 	encounters a bad URL?  This is LWP 5.04.

True.  We need to fix it so that it calls $url->epath or
$url->path_components.  Thanks for the report.
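Why the unescaped path is the wrong thing to look at here can be sketched outside Perl.  The snippet below is a Python illustration using the standard urllib.parse module, not LWP code; the helper names path_components and naive_components are hypothetical.  It shows that splitting the still-escaped path (the role epath plays in LWP) keeps the encoded "%2f" inside one component, while unescaping first produces extra, bogus components:

```python
from urllib.parse import urlsplit, unquote

def path_components(url):
    """Split the still-escaped path on '/' first, then unescape each
    segment, so an encoded slash (%2f) stays inside its own component."""
    escaped_path = urlsplit(url).path  # escaped form, comparable to LWP's epath
    return [unquote(segment) for segment in escaped_path.split("/")]

def naive_components(url):
    """Unescape the whole path first, then split: %2f becomes a real '/'."""
    return unquote(urlsplit(url).path).split("/")

url = ("http://www.ibm.com/Links/"
       "http%3a%2f%2fwww.software.ibm.com/software/index_all.html")
print(path_components(url))
# ['', 'Links', 'http://www.software.ibm.com', 'software', 'index_all.html']
print(naive_components(url))
# ['', 'Links', 'http:', '', 'www.software.ibm.com', 'software', 'index_all.html']
```

The second result has seven components instead of five, which is the ambiguity the "Path components contain '/'" error is guarding against.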

> 2.
> 	I mailed this list about how, on one machine, a script that uses
> 	RobotUA keeps asking for robots.txt over and over, even though I use
> 	AnyDBM_File.  It turned out that the problem is not in the script,
> 	nor in the machine it was running on, nor the fault of the HTTP server
> 	the script was accessing.  Accessing other Web servers didn't show this
> 	problem, and accessing the same Web server from a different machine
> 	also didn't cause multiple fetches of robots.txt.
> 	So, I guess the problem happens only when I access that one particular
> 	server from that one particular machine.
> 	I figured out that when the script fetches robots.txt it assigns the
> 	rules a fresh_until date in the past (an earlier epoch time than the
> 	moment of fetching).
> 	eg:
> 		No visits: 2
> 		Last visit: 849908242
> 		Fresh until: 849908022
> 	So it keeps refetching robots.txt, and fresh_until is always about
> 	219 seconds too 'young'.
> 	I looked at RobotRules.pm but haven't figured out what's causing this
> 	or how it could be fixed.
> 	Maybe somebody else knows what's going on ?

Maybe the clock on this machine is not very accurate.  Perhaps it
thinks it is in another timezone?
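How a wrong local clock can produce an expiry that is already in the past can be sketched with an HTTP/1.1-style freshness calculation, where the document's age is estimated as (local response time - server Date header).  This is a Python sketch under the assumption that the robots.txt expiry is derived roughly this way; it is not the RobotRules.pm code, and the lifetime (3600 s) and clock skew (3820 s) are hypothetical values chosen only to reproduce the reported numbers.  Any skew that exceeds the lifetime by about 220 s gives the same result:

```python
def fresh_until(response_time, date_header, lifetime):
    """HTTP/1.1-style expiry: the age is estimated as local response
    time minus the server's Date (never negative), and the copy stays
    fresh for `lifetime` seconds minus that age."""
    apparent_age = max(0, response_time - date_header)
    return response_time + lifetime - apparent_age

def is_fresh(expiry, now):
    return now < expiry

now = 849908242   # "Last visit" from the report
lifetime = 3600   # hypothetical freshness lifetime

# Clocks agree: the server's Date equals the local time.
ok = fresh_until(now, now, lifetime)
# Hypothetical: the local clock runs 3820 s ahead of the server's clock.
skewed = fresh_until(now, now - 3820, lifetime)

print(ok, is_fresh(ok, now))          # 849911842 True
print(skewed, is_fresh(skewed, now))  # 849908022 False
```

With the skewed clock the computed expiry equals the reported "Fresh until: 849908022", so every visit sees stale rules and fetches robots.txt again.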

Regards,
Gisle