Re: URL->abs bug.

Gisle Aas (aas@bergen.sn.no)
04 Aug 1997 18:17:07 +0200


Ole Tange <tange@id.dk> writes:

> Summary: URI::URL->abs eats spaces in anchors

Demonstrated by this code:

perl -MURI::URL -le 'print url("#1. Introduction", "http://www.danedi.com/eioa.htm")->abs'

> I have been using lwp-rget for some time. Unfortunately it cannot copy
> every HTML file. Specifically I have tried http://www.danedi.com/eioa.htm,
> which contains references such as <a href="#1. Introduction">. URL->abs
> converts this to http://www.danedi.com/eioa.htm#1.Introduction (note the
> missing space) and thereby invalidates the URL.

The reason URI::URL does this is that RFC 1738 says:

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
   need to be added to break long URLs across lines.  The whitespace
   should be ignored when extracting the URL.

Whitespace should not really appear in URLs.  The space in that
fragment should have been encoded as '%20'.

Does anybody think we should get rid of the whitespace remover in URI::URL?

-- 
Gisle Aas <aas@sn.no>