Re: / and DirectoryIndex

Reinier Post (reinpost@win.tue.nl)
Wed, 21 Feb 2001 13:09:50 +0100


On Wed, Feb 21, 2001 at 04:42:20PM +0700, John Indra wrote:
> Hi all...
> 
> How do I tell my user-agent (an LWP::UserAgent object) NOT to download both
> / and index.html, or whatever the remote site's DirectoryIndex is set to?
> For example, my user-agent sees 2 links:
> - http:://www.domain.com/

This :: notation is contagious :-)

> - http:://www.domain.com/index.html

> If both links point to the same document, my user-agent would be foolish
> to download both files. How do I make a "smarter" user-agent that knows
> these 2 links are the same and performs only one GET, either to
> http:://www.domain.com/ OR http:://www.domain.com/index.html?

The server won't tell you whether or not they're the same document.
You have the same problem with server aliases or symlinks: the whole tree

  http://www.domain.com/a/butreally/b/*

may be identical to

  http://www.domain.com/b/*

Depending on how the server is set up, you may be able to adopt a
heuristic, for instance '*/index.html always has the same content
as */', but exceptions are always possible.  The only way to be really sure
is to compare the document content, or at least the response headers.
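
To make the content check concrete, here is a minimal sketch in Python
(not LWP; the idea carries over unchanged).  It groups URLs whose bodies
are byte-for-byte identical by comparing digests.  The function name
group_by_content and the fetch callable are hypothetical, made up for
this illustration; fetch stands for whatever your user-agent uses to
obtain a response body.

```python
import hashlib
from typing import Callable, Dict, Iterable


def group_by_content(urls: Iterable[str],
                     fetch: Callable[[str], bytes]) -> Dict[str, str]:
    """Map each URL to the first URL seen with byte-identical content.

    `fetch` is a hypothetical callable that returns the body for a URL.
    """
    first_seen: Dict[str, str] = {}  # content digest -> first URL with it
    canonical: Dict[str, str] = {}   # URL -> representative URL
    for url in urls:
        digest = hashlib.sha256(fetch(url)).hexdigest()
        # setdefault stores the URL only if this digest is new,
        # and returns the representative URL either way.
        canonical[url] = first_seen.setdefault(digest, url)
    return canonical
```

Note that this still fetches both URLs once.  That is the price of being
sure; you could run it on a sample of a site to decide whether the
'*/index.html equals */' heuristic is safe to apply to the rest.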

-- 
Reinier