URI::URL->abs bug? (libwww-Perl5)

Boris Statnikov (boris@hops.cs.jhu.edu)
Mon, 14 Jul 1997 15:51:11 -0400 (EDT)


I have encountered the following problem with URI::URL.

Suppose I do something like this:

$fullUrl = URI::URL->new('./robot2.html',
'http://www.ugrad.cs.jhu.edu/~boris/robot.html')->abs;

print $fullUrl;

I get 'http://www.ugrad.cs.jhu.edu/~boris/robot2.html', which is correct.

However, this will break:

$fullUrl = URI::URL->new('../../../',
'http://www.ugrad.cs.jhu.edu/~boris')->abs;
print $fullUrl;

producing 'http://www.ugrad.cs.jhu.edu/../../'

The server answers that request with the root page, which the robot
treats as new because it differs from the page seen previously by name
only. A while later the robot might try accessing
'http://www.ugrad.cs.jhu.edu/../../../../' (see below for a specific
example of this infinite loop).

Of course, such a link points more or less nowhere (above the root).
However, as I found out painfully, there are pages out there with exactly
this kind of broken link, and such links can still be resolved sensibly:
to 'http://www.ugrad.cs.jhu.edu/' in this example, or to the root in
general. I would appreciate any advice on how to avoid this problem
without hacking. Should I, for example, subclass URI::URL?

P.S. The implication of this 'broken link' is this: if your robot hashes
on full URLs to avoid loops, it can enter an infinite loop (if you don't
restrict search depth) or a very deep one (if you do). Compare, for
example, 'http://hopkins.med.jhu.edu/top.html/' with
'http://hopkins.med.jhu.edu/../top.html/', and repeat ad infinitum :)
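
For what it's worth, here is a rough sketch of the kind of workaround I
have in mind. It is only an illustration, not anything URI::URL provides:
clamp_to_root is a hypothetical helper name, it works on the plain string
form of an already-absolute URL, and it assumes a scheme://host/path
layout. It collapses '.' and '..' segments and silently drops any '..'
that would climb above the root, so both spellings of a page end up with
the same hash key.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI::URL): collapse "." and ".."
# segments in the path of an absolute URL, discarding any ".." that
# would climb above the root.
sub clamp_to_root {
    my ($url) = @_;

    # Split into scheme://host, path, and an optional ?query/#fragment tail.
    my ($prefix, $path, $tail) =
        $url =~ m{^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]*)([^?#]*)(.*)$}s
        or return $url;    # not in the assumed form: leave it untouched

    $path = '/' if $path eq '';
    my @in = split m{/}, $path, -1;
    shift @in;             # drop the empty piece before the leading "/"

    my @out;
    while (@in) {
        my $seg = shift @in;
        if ($seg eq '.' || $seg eq '..') {
            pop @out if $seg eq '..';    # a no-op when already at the root
            push @out, '' unless @in;    # trailing "." or ".." names a directory
        } else {
            push @out, $seg;
        }
    }
    return $prefix . '/' . join('/', @out) . $tail;
}

# Hash on the cleaned-up form rather than on the raw URL.
my %seen;
for my $url ('http://hopkins.med.jhu.edu/top.html/',
             'http://hopkins.med.jhu.edu/../top.html/') {
    my $key = clamp_to_root($url);
    print "$url -> ", ($seen{$key}++ ? 'already seen' : 'new'), "\n";
}
```

In a robot, the string returned by ->abs would be passed through such a
helper before it is used as a hash key or fetched. Whether it is right to
treat "above the root" as the root is a policy decision on my part, which
is why I would rather not hack it in without advice.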

Thanks for your help. 

Boris

Too many cooks spoil the brouhaha.
	Harvard Lampoon