Re: Robots.txt caching in RobotRules and RobotUA (libwww-perl5)
Gisle Aas (aas@bergen.sn.no)
Fri, 28 Jun 1996 12:19:15 +0200
In message <Pine.LNX.3.91.960628101512.29800B-100000@gungner.ub2.lu.se>,
Sigfrid Lundberg writes:
>
> I should like to give some background to the problem raised by my friend
> Håkan. We have a harvesting robot, which we have recently moved from
> a hacked version of libwww-perl4 to libwww-perl5. In doing so we
> have realized that neither the RobotRules nor the RobotUA class is
> flexible enough for our purposes. The major problem is the caching of
> robot rules and other per-host information (like the time stamp of the
> last access).
>
> This information (along with other data, such as previous status
> codes and estimates of server sizes) is used for optimal scheduling
> of robot activities.
>
> What we would like to do is generalize the code so that the
> information can be stored in an arbitrary way, while the current
> method remains the default. It should be possible
> to run a simple robot in exactly the same way as today, but
> it should also be possible to inherit the code in a more advanced robot
> class that (say) stores the information in an SQL database for the
> benefit of a bunch of "WebAnts".
>
> Basically, we would like to rewrite the current modules so that
> the code does not depend on how this data is stored. We are
> willing to contribute the changes to the library, so that our
> own RobotUA can inherit from future versions of the library.
I would be happy to include your rewrite in libwww-perl, especially
if it really is backwards compatible with the current modules (and if I
like your coding style :-)
Let's see your code...
Regards,
Gisle.