comments in robots.txt - bug in RobotRules.pm??
Andrew Daviel (andrew@andrew.triumf.ca)
Fri, 24 Jan 1997 01:34:38 -0800 (PST)
I'm starting (finally) to look at LWP5 ....
I've got RobotRules.pm,v 1.11 1996/09/30
Around line 99 there are these lines:
    s/\s*\#.*//;

    if (/^\s*$/) {      # blank line
        last if $is_me; # That was our record.
which seems to indicate that a commented-out line is converted to
a blank line, then treated as an end-of-record.
In http://info.webcrawler.com/mak/projects/robots/norobots.html#format
it says "Lines containing only a
comment are discarded completely, and therefore do not indicate a record
boundary."
I've got robots.txt like:
User-agent: *
Disallow: /cgi-bin/
# Disallow: /cgi-pub/
Disallow: /try/
where the /try/ rule is never parsed (i.e. the robot is hitting /try/),
presumably because the comment line above it is taken as the end of the
record. Seems to me that this is a bug....
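
To make the behaviour I'd expect concrete, here is a minimal standalone
sketch (not the actual RobotRules.pm code, and not a tested patch against
it). The sub name disallowed_for_any_agent is made up for illustration, and
it only looks at the "User-agent: *" record. The one change that matters is
that a comment-only line is skipped before the blank-line test can see it:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of LWP: return the Disallow paths of the
# first record that names "User-agent: *".
sub disallowed_for_any_agent {
    my ($text) = @_;
    my ($is_me, @paths) = (0);
    for (split /\r?\n/, $text) {
        next if /^\s*\#/;    # comment-only line: discard, NOT a record boundary
        s/\s*\#.*//;         # strip a trailing comment from a real line
        if (/^\s*$/) {       # a genuinely blank line ends the record
            last if $is_me;  # that was our record
            next;
        }
        if (/^\s*User-agent\s*:\s*(.*?)\s*$/i) {
            $is_me = 1 if $1 eq '*';
        }
        elsif ($is_me && /^\s*Disallow\s*:\s*(.*?)\s*$/i) {
            push @paths, $1 if length $1;
        }
    }
    return @paths;
}

my $robots_txt = <<'EOT';
User-agent: *
Disallow: /cgi-bin/
# Disallow: /cgi-pub/
Disallow: /try/
EOT

print join(' ', disallowed_for_any_agent($robots_txt)), "\n";
# prints: /cgi-bin/ /try/
```

With the "next if" line removed, the same input gives only /cgi-bin/, which
matches what I'm seeing.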
Also, I think in LWP4 that getting robots.txt was a "freebie" (not counted
against waiting time), but in LWP5 it isn't free. Since $ua->host_wait
returns zero for an unvisited site, this is somewhat irritating and
I've changed it (if host_wait > 0, go somewhere else first).
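
Roughly, the "go somewhere else first" idea looks like the sketch below.
This is a simplified illustration, not my actual change. The sub name
next_ready_index, the URL queue and the wait table are all made up, and the
wait lookup is passed in as a code ref so the example runs without LWP.
With a real robot UA it could be something like
sub { $ua->host_wait($_[0]) }, assuming host_wait is given "host:port".

```perl
use strict;
use warnings;

# Hypothetical helper: return the index of the first queued URL whose host
# needs no wait, or -1 if every host is still inside its wait period.
# $wait_for takes "host:port" and returns the seconds left to wait.
sub next_ready_index {
    my ($queue, $wait_for) = @_;
    for my $i (0 .. $#$queue) {
        my ($host, $port) = $queue->[$i] =~ m{^http://([^/:]+)(?::(\d+))?}i
            or next;                      # skip anything that is not http
        $port = 80 unless defined $port;  # default http port
        return $i if $wait_for->("$host:$port") <= 0;
    }
    return -1;
}

# Made-up wait table standing in for the robot UA's bookkeeping.
my %wait  = ('a.example:80' => 30, 'b.example:80' => 0);
my @queue = ('http://a.example/x.html', 'http://b.example/y.html');

my $i = next_ready_index(\@queue, sub { $wait{$_[0]} || 0 });
print "$queue[$i]\n" if $i >= 0;
# prints: http://b.example/y.html
```

Note that an unvisited host still reports zero here, so this does not by
itself avoid the wait that follows the robots.txt fetch.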
Andrew Daviel
mailto:andrew@vancouver-webpages.com