Re: web mirroring
Martijn Koster (m.koster@nexor.co.uk)
Fri, 02 Dec 1994 08:26:10 +0000
> Why don't you run a cache at the slow end? This is exactly the kind of
> thing caches were developed to do.
Quite.
> >Is mirroring of this type ethical?
>
> As long as you do it just to improve performance, why not?
Hmm, that's open to debate. For example, Oscar's script doesn't use
If-Modified-Since or even HEAD last time I looked, so someone
mirroring hundreds of documents daily means pulling them all across
every time, which adds an unnecessary load to the server, which IMHO is
unethical.
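To illustrate the If-Modified-Since point, here is a minimal Python
sketch of a conditional GET, so that an unchanged document costs the
server only a 304 reply instead of a full transfer. The function names
and the injectable `opener` argument are assumptions made for this
illustration; none of it is taken from Oscar's script.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_fetched):
    """Build a GET request that asks only for changes since last_fetched.

    last_fetched is a Unix timestamp (seconds since the epoch) recorded
    when the local copy was last retrieved.
    """
    # HTTP dates are always expressed in GMT.
    stamp = formatdate(last_fetched, usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def fetch_if_modified(url, last_fetched, opener=urllib.request.urlopen):
    """Return the new body, or None if the server answers 304 Not Modified."""
    request = build_conditional_request(url, last_fetched)
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # local copy is still current; nothing transferred
        raise
```

A mirror run would call fetch_if_modified() once per document with the
time of its previous run, and rewrite the local file only when the
result is not None.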
> >Should the mirroring software use the robot exclusion standard?
>
> Yes, and it probably won't.
:-(
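For what it's worth, honouring the exclusion standard need not be much
work. A rough Python sketch, assuming the robots.txt text has already
been retrieved; the helper name filter_allowed and the user-agent
string in the example are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def filter_allowed(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "http://example.org/docs/index.html",
        "http://example.org/private/notes.html",
    ]
    # Only the /docs/ URL survives the filter.
    print(filter_allowed(rules, "mirror-sketch", candidates))
```

A real mirror would first fetch /robots.txt from the server it is about
to copy (RobotFileParser also offers set_url() and read() for that) and
skip every URL the filter drops.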
> >Should I notify the webmasters and/or document maintainers?
>
> Not necessary, they'll contact you when they see the logs.
Again, this depends on the expected load. If it is considerable, it's
nice to preempt problems.
> Depends on how badly the robot will behave. Will you be running it
> every night? Will it transfer very large numbers of documents?
> Will it insist on always having the latest version? Etc. It would
> be polite to let them know.
Indeed.
> >Should I wait for permission before proceeding?
>
> No.
A reasonable time?
> >Does free software already exist to perform http mirroring?
>
> Not specifically, and it doesn't seem to be worthwhile, either. Robots
> + caches are superior to mirror programs, except for very large sets of
> documents.
I still think they'd be useful on occasion.
> >If so, where can it be obtained?
> >If not, how could I apply libwww and libwww-compatible scripts to the task?
> >Is this likely to be beyond the ability of a relative Perl novice?
>
> No, but you certainly won't need to write your own robot.
>
> >[Note: the answers to many of these seem obvious; I'm checking just in
> >case common-sense doesn't pan out. Put another way, I'm erring on the
> >side of completeness at risk of looking like a complete idiot :)]
>
> Have you tried the robots mailing list? It's at nexor.ac.uk, see their
> HTTP server.
Make that http://web.nexor.co.uk/mak/doc/robots/robots.html
Cheers,
-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html