Re: URL->abs bug.
Gisle Aas (aas@bergen.sn.no)
04 Aug 1997 18:17:07 +0200
Ole Tange <tange@id.dk> writes:
> Summary: URI::URL->abs eats spaces in anchors
Demonstrated by this code:
perl -MURI::URL -le 'print url("#1. Introduction", "http://www.danedi.com/eioa.htm")->abs'
> I have been using lwp-rget for some time. Unfortunately it cannot copy any
> html-file. Specifically I have tried: http://www.danedi.com/eioa.htm which
> contains references to: <a href="#1. Introduction">. URL->abs converts
> this to: http://www.danedi.com/eioa.htm#1.Introduction (note the missing
> space) and thereby invalidates the url.
The reason URI::URL does this is that RFC 1738 says:

    In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
    need to be added to break long URLs across lines. The whitespace
    should be ignored when extracting the URL.
Whitespace should not really appear in URLs. It should have been
encoded as '%20'.
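
As a rough illustration of that encoding, a link could be escaped before it is handed to URI::URL. This is only a sketch in core Perl: the helper name encode_whitespace is made up for this example and is not part of URI::URL or LWP.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI::URL): percent-encode literal
# whitespace so it survives any later whitespace stripping.
sub encode_whitespace {
    my ($link) = @_;
    # Replace each space, tab, CR or LF with its %XX escape.
    $link =~ s/([ \t\r\n])/sprintf("%%%02X", ord($1))/ge;
    return $link;
}

print encode_whitespace("#1. Introduction"), "\n";   # prints "#1.%20Introduction"
```

The escaped string can then be passed to url(...)->abs in place of the raw href, so there is no literal whitespace left for the remover to drop.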
Does anybody think we should get rid of the whitespace remover in URI::URL?
--
Gisle Aas <aas@sn.no>