Re: 2 RobotUA problems
Gisle Aas (aas@bergen.sn.no)
07 Dec 1996 09:47:08 +0100
Otis Gospodnetic <otisg@panther.middlebury.edu> writes:
> 1.
> I was GETting this URL:
> http://www.ibm.com/Links/http%3a%2f%2fwww.software.ibm.com/software/index_all.html
>
> and I got this error:
> Path components contain '/' (you must call epath) at
> /home/otisg/lib/perl5/site_perl/LWP/RobotUA.pm line 188
>
> OK, so the URL is probably bad, but is RobotUA supposed to break if it
> encounters a bad URL? This is LWP 5.04.
True. We need to fix it so that it calls $url->epath or
$url->path_components. Thanks for the report.
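You can see why the check fires without involving LWP at all. The following is a minimal pure-Perl sketch, and the helpers `unescape` and `decoded_segments` are hypothetical, not part of LWP. It splits the escaped path on '/' first and only then decodes each segment. The `%2f` escapes become real '/' characters inside a single component, which is the situation the error message complains about.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes in one path segment.
sub unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: split the *escaped* path on '/' first, then
# decode each segment, so an escaped %2f stays inside its segment.
sub decoded_segments {
    my ($escaped_path) = @_;
    my @segs = split m{/}, $escaped_path, -1;
    shift @segs if @segs && $segs[0] eq '';   # drop the empty piece before the leading '/'
    return map { unescape($_) } @segs;
}

my $epath = '/Links/http%3a%2f%2fwww.software.ibm.com/software/index_all.html';
my @segs  = decoded_segments($epath);

for my $seg (@segs) {
    # A decoded segment containing '/' cannot be joined back into one
    # unescaped path string without changing the URL's structure.
    printf "%-40s %s\n", $seg, ($seg =~ m{/} ? "contains '/'" : 'ok');
}
```

If you decode the whole path before splitting, you get six segments (`Links`, `http:`, an empty one, `www.software.ibm.com`, `software`, `index_all.html`) instead of four. Working from the escaped path, as `epath` presumably does, avoids that ambiguity.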
> 2.
> I mailed this list about how, on one machine, a script that uses RobotUA
> keeps asking for robots.txt over and over, even though I use AnyDBM_File.
> It turned out that the problem is not in the script, nor in the machine
> the script was running on, nor the fault of the HTTP server the script
> was accessing. Accessing other Web servers didn't show this problem, and
> accessing the same Web server from a different machine didn't cause
> multiple fetches of robots.txt either.
> So, I guess the problem happens only when I access that one particular
> server from that one particular machine.
> I figured out that when the script fetches robots.txt, it assigns the
> rules a fresh_until time in the past (a smaller number of seconds than
> the time at the moment of fetching).
> E.g.:
> No visits: 2
> Last visit: 849908242
> Fresh until: 849908022
> So it keeps refetching robots.txt, and the rules are always about 219
> seconds too 'young'.
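The numbers above can be checked directly. This sketch assumes that cached rules count as fresh only while the current time is before fresh_until. The post does not show the RobotRules code, so treat that as an assumption. `is_fresh` is a hypothetical helper. The sample pair is 220 seconds apart, which matches the "about 219 seconds" reported.

```perl
use strict;
use warnings;

# Values copied from the stats in the report (Unix epoch seconds).
my $last_visit  = 849908242;
my $fresh_until = 849908022;

# Hypothetical freshness test (assumed, not taken from RobotRules.pm):
# rules are fresh only while the current time is before fresh_until.
sub is_fresh {
    my ($fresh_until, $now) = @_;
    return $now < $fresh_until;
}

my $age_past_expiry = $last_visit - $fresh_until;
printf "expired %d seconds before the visit\n", $age_past_expiry;
print is_fresh($fresh_until, $last_visit) ? "fresh\n" : "stale, refetch robots.txt\n";
```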
> I looked at RobotRules.pm, but I haven't figured out what's causing
> this or how it could be fixed.
> Maybe somebody else knows what's going on?
Maybe the clock on this machine is not very accurate. Perhaps it
thinks it is in another timezone?
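Here is a minimal sketch of the clock hypothesis. It assumes the expiry is computed from one clock and checked against another that runs ahead by some skew. The post does not say how fresh_until is actually derived. `stale_on_arrival` and the one-hour lifetime are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical model: the rules' expiry time is computed from one clock,
# but the freshness test reads another clock that is $skew seconds ahead
# (wrong time, or wrong timezone setting).
sub stale_on_arrival {
    my ($now, $skew, $lifetime) = @_;
    my $fresh_until = ($now - $skew) + $lifetime;  # expiry on the slow clock
    return $now >= $fresh_until;                   # test on the fast clock
}

my $now      = 849908242;  # "Last visit" value from the report
my $lifetime = 3600;       # assumed lifetime, for illustration only

for my $skew (0, 220, 3600, 3600 + 220) {
    printf "skew %5d s: %s\n", $skew,
        stale_on_arrival($now, $skew, $lifetime) ? 'stale at once' : 'fresh';
}
```

Under this model, the rules are already stale when they are stored whenever the skew is at least the freshness lifetime, so every request triggers another fetch of robots.txt. With a one-hour lifetime, a one-hour timezone error is enough to cause this.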
Regards,
Gisle