comments in robots.txt - bug in RobotRules.pm??

Andrew Daviel (andrew@andrew.triumf.ca)
Fri, 24 Jan 1997 01:34:38 -0800 (PST)


I'm (finally) starting to look at LWP5.

I've got RobotRules.pm,v 1.11 1996/09/30

There are some lines (99-102):

    s/\s*\#.*//;

    if (/^\s*$/) {          # blank line
        last if $is_me;     # That was our record.

which means that a line containing only a comment is stripped down to an
empty line, matches the blank-line test, and is then treated as the end
of the record.
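
To see the effect in isolation, here is a minimal sketch that applies the
same substitution and blank-line test to the commented-out line. The helper
name strip_comment() is hypothetical; only the two regexes come from
RobotRules.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution RobotRules.pm uses
# to remove a comment (and any whitespace before it) from a line.
sub strip_comment {
    my ($line) = @_;
    $line =~ s/\s*\#.*//;
    return $line;
}

my $stripped = strip_comment("# Disallow: /cgi-pub/");

# The comment-only line is now empty, so the blank-line test matches
# and the parser treats it as the end of the record.
if ($stripped =~ /^\s*$/) {
    print "comment-only line looks like a record boundary\n";
}
```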

The robots exclusion standard at
http://info.webcrawler.com/mak/projects/robots/norobots.html#format
says: "Lines containing only a comment are discarded completely, and
therefore do not indicate a record boundary."

My robots.txt looks like this:

    User-agent: *
    Disallow: /cgi-bin/
    # Disallow: /cgi-pub/
    Disallow: /try/

Here the "Disallow: /try/" line is never parsed, so the robot fetches /try/
anyway. It seems to me that this is a bug.
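
One possible fix is to discard comment-only lines before the blank-line
test, as norobots.html requires. The sketch below is a simplified,
hypothetical parse_robots() written for illustration, not the RobotRules.pm
code or a patch to it: it takes the first record whose User-agent matches
the robot (or is *), and does not try to prefer a record naming the robot
over the * record.

```perl
use strict;
use warnings;

# Hypothetical, simplified parser: returns the Disallow paths that apply
# to $agent. Comment-only lines are skipped entirely, so they can never
# be mistaken for the blank line that ends a record.
sub parse_robots {
    my ($text, $agent) = @_;
    my @disallow;
    my $is_me = 0;    # true while inside a record that applies to $agent

    for my $raw (split /\n/, $text) {
        my $line = $raw;

        # The fix: discard a line holding only a comment completely.
        next if $line =~ /^\s*\#/;

        $line =~ s/\s*\#.*//;            # strip trailing comments

        if ($line =~ /^\s*$/) {          # genuine blank line ends a record
            last if $is_me;              # that was our record
            next;
        }

        if ($line =~ /^\s*User-agent\s*:\s*(\S+)/i) {
            my $name = $1;
            $is_me = 1 if $name eq '*' || index(lc $agent, lc $name) >= 0;
        }
        elsif ($is_me && $line =~ /^\s*Disallow\s*:\s*(\S*)/i) {
            push @disallow, $1 if length $1;   # empty Disallow allows all
        }
    }
    return @disallow;
}

my $robots_txt = <<'END';
User-agent: *
Disallow: /cgi-bin/
# Disallow: /cgi-pub/
Disallow: /try/
END

print "$_\n" for parse_robots($robots_txt, 'MyRobot/1.0');
```

Run on the robots.txt above, it should print /cgi-bin/ and /try/, while
the commented-out /cgi-pub/ is ignored.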

Also, I think that in LWP4 fetching robots.txt was a "freebie" (it did not
count against the wait time between requests to a host), but in LWP5 it
is not. Since $ua->host_wait returns zero for an unvisited site, the robot
sees no delay, but fetching robots.txt then makes the real request wait.
This is somewhat irritating, so I've changed my robot: if host_wait > 0,
it goes somewhere else first.
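
Here is a rough sketch of that workaround. Everything in it is
hypothetical: netloc_of() and pick_next() are illustrative helpers, and the
$wait_for callback stands in for whatever returns the remaining delay for a
host (in LWP5 presumably $ua->host_wait($netloc) on an LWP::RobotUA).

```perl
use strict;
use warnings;

# Hypothetical helper: pull the host[:port] part out of an absolute URL.
sub netloc_of {
    my ($url) = @_;
    return $url =~ m{^[a-z][a-z0-9+.-]*://([^/?\#]+)}i ? lc $1 : '';
}

# Hypothetical scheduler: return the index of the next URL to fetch.
# $wait_for maps a netloc to the seconds still to wait (assumed to wrap
# something like $ua->host_wait). A URL whose host needs no wait is taken
# at once ("go somewhere else first"); otherwise the shortest wait wins.
sub pick_next {
    my ($queue, $wait_for) = @_;
    my ($best, $best_wait);
    for my $i (0 .. $#$queue) {
        my $wait = $wait_for->(netloc_of($queue->[$i]));
        return $i if $wait <= 0;
        if (!defined $best_wait || $wait < $best_wait) {
            ($best, $best_wait) = ($i, $wait);
        }
    }
    return $best;    # undef if the queue is empty
}

# Example: a.example.com was just contacted (e.g. for its robots.txt).
my %wait  = ('a.example.com' => 45, 'b.example.com' => 0);
my @queue = ('http://a.example.com/page.html', 'http://b.example.com/');
my $i = pick_next(\@queue, sub { $wait{ $_[0] } // 0 });
print "next: $queue[$i]\n";
```

Preferring any zero-wait host keeps the robot busy while the delay for the
just-visited host runs down, instead of sleeping on it.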


Andrew Daviel         
mailto:andrew@vancouver-webpages.com