wwwbot bug - I don't get it

Fred Douglis (douglis@research.att.com)
Thu, 14 Sep 1995 16:42:30 -0400


This is a multipart MIME message.

--===_0_Thu_Sep_14_16:41:42_EDT_1995
Content-Type: text/plain; charset=us-ascii

I am using libwww-perl v0.40 and a modified version of Brooks's w3new
program.  I found that hosts were inexplicably being flagged as
disallowing robots, and eventually tracked this down, it seems, to a
problem where a single host that disallows robots causes future
checks on *other* hosts to fail.

I say "it seems" because I can't believe this is really the case --
it's too substantial a bug to slip through the cracks all this time,
if anyone is using wwwbot.pl at all.  But then, I can't explain it any
other way.

So, I changed it to key the cache on the hostname as well as the
agent, so that each host's disallowed URLs are stored separately.
A patch follows.
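
To illustrate the collision before the patch itself, here is a minimal
sketch of the two keying schemes.  It is not the wwwbot.pl code: the
names %by_agent, %by_host, remember, allowed_old and allowed_new, and
the two example hostnames, are made up for illustration.  It assumes,
as the patch suggests, that the cache is a hash keyed on
(agent, index) before the change and (host, agent, index) after it.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the cache in wwwbot.pl.
my (%by_agent, %by_host);

# Record the Disallow prefixes one host's robots.txt gave for one agent.
sub remember {
    my ($host, $ua, @disallow) = @_;
    my $n = 0;
    for my $dis (@disallow) {
        ++$n;
        $by_agent{$ua, $n}       = $dis;   # old scheme: host is not in the key
        $by_host{$host, $ua, $n} = $dis;   # patched scheme: host is in the key
    }
}

# Old-style lookup: $host is accepted but never consulted, so entries
# recorded for one host are applied to every other host.
sub allowed_old {
    my ($host, $ua, $path) = @_;
    for my $agent ($ua, '*') {
        my $n = 0;
        while (defined(my $dis = $by_agent{$agent, ++$n})) {
            return 0 if $dis eq '*' || $dis eq substr($path, 0, length $dis);
        }
    }
    return 1;
}

# Patched-style lookup: only entries recorded for this host are checked.
sub allowed_new {
    my ($host, $ua, $path) = @_;
    for my $agent ($ua, '*') {
        my $n = 0;
        while (defined(my $dis = $by_host{$host, $agent, ++$n})) {
            return 0 if $dis eq '*' || $dis eq substr($path, 0, length $dis);
        }
    }
    return 1;
}

# One host bans all robots from everything under "/".
remember('closed.example', '*', '/');

printf "old, other host: %d\n", allowed_old('open.example',   'w3new', '/index.html');  # 0
printf "new, other host: %d\n", allowed_new('open.example',   'w3new', '/index.html');  # 1
printf "new, same host:  %d\n", allowed_new('closed.example', 'w3new', '/index.html');  # 0
```

With the host left out of the key, the entry recorded for the first
host is found again when a second, unrelated host is checked, which
matches the symptom described above.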


--===_0_Thu_Sep_14_16:41:42_EDT_1995
Content-Type: application/x-patch
Content-Description: wwwbot.patch

*** wwwbot.pl	Thu Sep 14 13:04:53 1995
--- lib/perl/libwww-perl-0.40/wwwbot.pl	Thu Sep 14 13:15:08 1995
***************
*** 214,223 ****
      for ($ua,'*')
      {
          $n = 0;
!         while ($botcache{$_,++$n})
          {
!             if (($botcache{$_,$n} eq '*') ||
!                 ($botcache{$_,$n} eq substr($path,0,length($botcache{$_,$n}))))
                  { return(0); }
          }
      }
--- 214,223 ----
      for ($ua,'*')
      {
          $n = 0;
!         while ($botcache{$address, $_,++$n})
          {
!             if (($botcache{$address, $_,$n} eq '*') ||
!                 ($botcache{$address, $_,$n} eq substr($path,0,length($botcache{$address, $_,$n}))))
                  { return(0); }
          }
      }
***************
*** 246,251 ****
--- 246,252 ----
  {
      local($host, $port, $user_agent) = @_;
      local($headers, %headers, $content, $response, $url, $n, $ua, $dis);
+     local(@user_agent, @disallow);
  
      local($timeout) = 30;
  
***************
*** 273,279 ****
                      $n = 0;
                      for $dis (@disallow)
                      {
!                         $botcache{$ua,++$n} = $dis;
                      }
                  }
              }
--- 274,280 ----
                      $n = 0;
                      for $dis (@disallow)
                      {
!                         $botcache{$host, $ua,++$n} = $dis;
                      }
                  }
              }
***************
*** 309,315 ****
              $n = 0;
              for $dis (@disallow)
              {
!                 $botcache{$ua,++$n} = $dis;
              }
          }
      }
--- 310,316 ----
              $n = 0;
              for $dis (@disallow)
              {
!                 $botcache{$host, $ua,++$n} = $dis;
              }
          }
      }

--===_0_Thu_Sep_14_16:41:42_EDT_1995
Content-Type: text/plain; charset=us-ascii


Fred Douglis 		    MIME accepted	douglis@research.att.com
AT&T Bell Laboratories				908 582-3633 (office)
600 Mountain Ave., Rm. 2B-105 			908 582-3063 (fax)
Murray Hill, NJ 07974     http://www.research.att.com/orgs/ssr/people/douglis/

--===_0_Thu_Sep_14_16:41:42_EDT_1995--