Re: URI::URL->abs bug? (libwww-Perl5)

Gisle Aas (aas@bergen.sn.no)
04 Aug 1997 18:00:44 +0200


Boris Statnikov <boris@blaze.cs.jhu.edu> writes:

> As Mr. Schwartz pointed out, this error is not, strictly speaking, due to
> libwww.  Once again, my problem sometimes occurs with a relative path
> such as
> ../../../index.html for a base URL of
> http://someserver.com/~user/directoryname/
> which is transformed into
> http://someserver.com/../index.html
> by invoking the abs() method on the URL.

RFC 1808 (section 5.2, "Abnormal Examples", where the base URL is
<URL:http://a/b/c/d;p?q#f>) says:

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in the
   base URL's path.  Note that the ".." syntax cannot be used to change
   the <net_loc> of a URL.

      ../../../g    = <URL:http://a/../g>
      ../../../../g = <URL:http://a/../../g>
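To see what this rule means in practice, here is a minimal Perl sketch of the path-merging step of RFC 1808 relative resolution. `resolve_path` is a hypothetical helper, not part of URI::URL; it handles only the path component and assumes the base path is absolute (starts with "/"). Because excess ".." segments are kept rather than discarded, it reproduces both RFC examples above as well as the http://someserver.com/../index.html result from the original report.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a relative path against an absolute base
# path following RFC 1808 step 6. Excess ".." segments are KEPT.
# Paths only: no scheme, host, query or fragment handling.
sub resolve_path {
    my ($base_path, $rel_path) = @_;

    return $base_path if $rel_path eq '';     # empty reference: the base itself
    return $rel_path  if $rel_path =~ m{^/};  # absolute path replaces the base

    # Drop everything after the last "/" of the base, then append.
    (my $dir = $base_path) =~ s{[^/]*\z}{};
    my $merged = $dir . $rel_path;

    # Split the part after the leading "/"; limit -1 keeps a trailing
    # empty segment, i.e. a trailing slash.
    my @in = split m{/}, substr($merged, 1), -1;
    my @out;
    for my $i (0 .. $#in) {
        my $seg  = $in[$i];
        my $last = ($i == $#in);
        if ($seg eq '.') {
            push @out, '' if $last;           # "a/." becomes "a/"
        }
        elsif ($seg eq '..') {
            if (@out && $out[-1] ne '..') {
                pop @out;                     # "seg/.." cancels out
                push @out, '' if $last;       # "a/b/.." becomes "a/"
            }
            else {
                push @out, '..';              # nothing left to cancel: keep it
            }
        }
        else {
            push @out, $seg;
        }
    }
    return '/' . join('/', @out);
}
```

For example, `resolve_path('/~user/directoryname/', '../../../index.html')` yields `/../index.html`, exactly the path abs() produced in the report.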

> 1. To change the abs() method of URI::URL to return
> http://someserver.com/index.html 
> (making it consistent with the server).

There is more than one server out there, and they do not all treat excess
".." segments the same way.

> 2. Same as above, except in an extra method, keeping abs() as it is now.

Might be a good idea.
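A separate method along the lines of option 2 could simply post-process the result of abs(). The sketch below assumes a hypothetical helper, `clamp_leading_dots`, that takes an absolute URL string and drops any ".." segments left at the start of its path, mimicking the server behaviour described above; the query and fragment are left alone. (Later releases of the URI distribution added a similar switch, `$URI::ABS_REMOTE_LEADING_DOTS`; check the documentation of the version you have.)

```perl
use strict;
use warnings;

# Hypothetical helper: strip ".." segments left at the start of the path
# of an absolute URL, e.g. "http://a/../g" -> "http://a/g".
sub clamp_leading_dots {
    my ($url) = @_;

    # Split into "scheme://authority" and the rest (path, query, fragment).
    my ($prefix, $rest) = $url =~ m{^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]*)(.*)\z}s
        or return $url;    # not an absolute URL with an authority

    # Repeatedly drop a leading "/.." that is a complete path segment.
    my $n = 0;
    $n++ while $rest =~ s{^/\.\.(?=/|[?#]|\z)}{};

    # If everything was stripped, keep the path rooted at "/".
    $rest = "/$rest" if $n && $rest !~ m{^/};
    return $prefix . $rest;
}
```

Keeping this separate from abs() leaves the RFC 1808 behaviour intact for callers who rely on it, while a robot can opt into the server-style result.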

> 3. To use some other scheme for determining which URLs have been visited,
> such as a CRC-32 checksum of a document's contents, or something similar.
> This is not bulletproof, but close to it, and seems like a good solution to
> me.

I would recommend that you do this anyhow.  If you use MD5 as the checksum,
it should be almost bulletproof.
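For option 3, a minimal sketch of content-based duplicate detection with MD5, using the standard Digest::MD5 module. The `%seen` table and `seen_before` function are hypothetical names for illustration. It assumes the content is a byte string (for example the raw body returned by HTTP::Response's `content` method), since `md5_hex` dies on characters above 255. The 128-bit digest is what makes MD5 preferable to CRC-32 here: with only 32 bits, chance collisions become likely once a crawl has seen tens of thousands of documents.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical "seen" table keyed by the MD5 digest of a document's
# contents, so two URLs serving identical bytes count as one visit,
# however their paths happen to be spelled.
my %seen;

# Returns true if a document with identical contents was recorded
# before; otherwise records it and returns false.
sub seen_before {
    my ($content) = @_;
    my $digest = md5_hex($content);
    return $seen{$digest}++ ? 1 : 0;
}
```

A robot would call `seen_before` on each fetched body and skip link extraction when it returns true, regardless of whether abs() produced `http://someserver.com/../index.html` or `http://someserver.com/index.html`.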

-- 
Gisle Aas <aas@sn.no>