wwwbot bug - I don't get it
Fred Douglis (douglis@research.att.com)
Thu, 14 Sep 1995 16:42:30 -0400
This is a multipart MIME message.
--===_0_Thu_Sep_14_16:41:42_EDT_1995
Content-Type: text/plain; charset=us-ascii
I am using libwww-perl v0.40 and a modified version of Brooks's w3new
program. I found that hosts were inexplicably being flagged as
disallowing robots and eventually tracked this down, it seems, to a
problem where a single host that disallows robots causes future
checks on *other* hosts to fail.
I say "it seems" because I can't believe this is really the case --
it's too substantial a bug to slip through the cracks all this time,
if anyone is using wwwbot.pl at all. But then, I can't explain it any
other way.
So I changed it to key the cache on the hostname as well as the agent,
so that each host's disallowed URLs are stored separately. A patch follows.
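
To make the suspected failure mode concrete, here is a small standalone
sketch. It is not the wwwbot.pl code: the sub names (remember, allowed),
the two hashes, and the example.com host names are all made up for
illustration, and it assumes the cache is filled and read roughly the way
the patch context suggests.

```perl
#!/usr/bin/perl
# Hypothetical sketch, NOT the actual wwwbot.pl code. It only illustrates
# why a robots cache keyed on the agent alone leaks rules between hosts.
use strict;
use warnings;

my (%by_agent, %by_host_agent);

# Record the Disallow prefixes one host announces for one agent,
# once under the old key (agent, n) and once under the new key
# (host, agent, n).
sub remember {
    my ($host, $ua, @disallow) = @_;
    my $n = 0;
    for my $dis (@disallow) {
        ++$n;
        $by_agent{$ua, $n}             = $dis;   # old key: agent only
        $by_host_agent{$host, $ua, $n} = $dis;   # new key: host + agent
    }
}

# Return 1 if $path is allowed according to $cache, 0 otherwise.
# @key is the leading part of the cache key; entries are numbered from 1.
sub allowed {
    my ($cache, $path, @key) = @_;
    my $n = 0;
    while (defined(my $dis = $cache->{join($;, @key, ++$n)})) {
        return 0 if $dis eq '*' || $dis eq substr($path, 0, length($dis));
    }
    return 1;
}

remember('closed.example.com', '*', '/');   # disallows everything
remember('open.example.com',   '*');        # disallows nothing

print "agent-only key: ",
    allowed(\%by_agent, '/index.html', '*'), "\n";
print "host+agent key: ",
    allowed(\%by_host_agent, '/index.html', 'open.example.com', '*'), "\n";
```

With the agent-only key, the check for open.example.com prints 0 because
it finds the entry left behind by closed.example.com; with the host in
the key it prints 1.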
--===_0_Thu_Sep_14_16:41:42_EDT_1995
Content-Type: application/x-patch
Content-Description: wwwbot.patch
*** wwwbot.pl Thu Sep 14 13:04:53 1995
--- lib/perl/libwww-perl-0.40/wwwbot.pl Thu Sep 14 13:15:08 1995
***************
*** 214,223 ****
for ($ua,'*')
{
$n = 0;
! while ($botcache{$_,++$n})
{
! if (($botcache{$_,$n} eq '*') ||
! ($botcache{$_,$n} eq substr($path,0,length($botcache{$_,$n}))))
{ return(0); }
}
}
--- 214,223 ----
for ($ua,'*')
{
$n = 0;
! while ($botcache{$address, $_,++$n})
{
! if (($botcache{$address, $_,$n} eq '*') ||
! ($botcache{$address, $_,$n} eq substr($path,0,length($botcache{$address, $_,$n}))))
{ return(0); }
}
}
***************
*** 246,251 ****
--- 246,252 ----
{
local($host, $port, $user_agent) = @_;
local($headers, %headers, $content, $response, $url, $n, $ua, $dis);
+ local(@user_agent, @disallow);
local($timeout) = 30;
***************
*** 273,279 ****
$n = 0;
for $dis (@disallow)
{
! $botcache{$ua,++$n} = $dis;
}
}
}
--- 274,280 ----
$n = 0;
for $dis (@disallow)
{
! $botcache{$host, $ua,++$n} = $dis;
}
}
}
***************
*** 309,315 ****
$n = 0;
for $dis (@disallow)
{
! $botcache{$ua,++$n} = $dis;
}
}
}
--- 310,316 ----
$n = 0;
for $dis (@disallow)
{
! $botcache{$host, $ua,++$n} = $dis;
}
}
}
--===_0_Thu_Sep_14_16:41:42_EDT_1995
Content-Type: text/plain; charset=us-ascii
Fred Douglis MIME accepted douglis@research.att.com
AT&T Bell Laboratories 908 582-3633 (office)
600 Mountain Ave., Rm. 2B-105 908 582-3063 (fax)
Murray Hill, NJ 07974 http://www.research.att.com/orgs/ssr/people/douglis/
--===_0_Thu_Sep_14_16:41:42_EDT_1995--