URI::URL 4.00

Gisle Aas (aas@a.sn.no)
Mon, 5 Feb 1996 23:11:15 +0100


The URI::URL module has received a major overhaul. The old module got
a lot of things wrong because it stored the individual parts of the
URL as unescaped strings. By unescaping the strings we lost
significant information.  Examples of things that did not work before,
but are possible now:

  - Reserved chars like '@', ':' and '/' in usernames and passwords

  - "/" (represented as "%2f") inside path components

  - '%2e%2e/a' as a relative path that is distinct from '../a'

  - "&" and "=" in the values of a query encoded form

  - a single '0' as the query string
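
Why unescaping loses information can be seen in a few lines of plain
Perl.  This is only a sketch: unescape() is a hypothetical helper
written for this illustration, not part of the URI::URL interface.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): decode %XX sequences.
sub unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Pairs of escaped strings that mean different things inside a URL.
my @pairs = (
    [ '%2e%2e/a', '../a'   ],   # literal segment ".." vs. "go up one level"
    [ 'and%2For', 'and/or' ],   # one path segment vs. two
    [ 'a%26b=1',  'a&b=1'  ],   # one form key vs. two form fields
);

for my $pair (@pairs) {
    my ($x, $y) = @$pair;
    my $collapsed = unescape($x) eq unescape($y) ? 'same' : 'different';
    print "$x vs $y: $collapsed once unescaped\n";
}
```

Every pair comes out as "same", so a module that keeps only the
unescaped string can no longer tell the two forms apart.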

Since the path(), params() and query() methods have always worked with
unescaped strings, I have chosen to introduce new methods for working
with escaped strings: epath(), eparams() and equery().  The internal
storage is always in escaped form.

There is also a path_components() method that allows you to get/set
the path as an unescaped list.  Example follows:

  $u = new URI::URL 'file:/';
  $u->path_components("logical", "and/or");
  print $u;            # prints 'file:logical/and%2For'
  #$p = $u->path;      # is illegal because we would lose information
  $p = $u->epath;      # is ok
  print $u->mac_path;  # prints ':logical:and/or'
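
The idea behind a component-wise interface can be sketched without the
module itself.  The helper names below are made up for this
illustration and the escaping rule is a conservative guess (byte
strings assumed); the real path_components() may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helpers, not URI::URL's own code.

# Escape everything outside a conservative set of unreserved
# characters, so a '/' inside a component becomes '%2F'.
sub escape_component {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

sub unescape_component {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Setter direction: unescaped list -> escaped path string.
sub components_to_epath {
    return join '/', map { escape_component($_) } @_;
}

# Getter direction: split on the real separators first, then unescape.
sub epath_to_components {
    my ($epath) = @_;
    return map { unescape_component($_) } split m{/}, $epath, -1;
}

my $epath = components_to_epath('logical', 'and/or');
print "$epath\n";                    # logical/and%2For

my @back = epath_to_components($epath);
print join(' | ', @back), "\n";      # logical | and/or
```

Because the split happens before unescaping, the round trip returns
exactly the two components that went in.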

The new file:-URLs also implement the following new methods:

  $url->local_path()    # returns a path suitable for use on the local system

which really calls one of these (guided by $Config{osname}):

  $url->unix_path()     # a path for use on a Unix system
  $url->mac_path()      # a path for use on a Macintosh system
  $url->dos_path()      # a path for use on MS-DOS
  $url->vms_path()      # a path for use on VMS (experimental)
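
The dispatch can be pictured roughly as below.  This is a much
simplified sketch and not the module's code: local_path_sketch(), the
converter table and the OS name patterns are all assumptions, VMS is
left out, and real converters have to handle many more cases.

```perl
use strict;
use warnings;
use Config;

# Hypothetical, simplified converters taking unescaped path components.
my %converter = (
    unix => sub { join '/',  @_ },
    dos  => sub { join '\\', @_ },
    mac  => sub { join ':', '', @_ },   # leading ':' marks a relative path
);

# Pick a converter from an OS name such as $Config{osname}.
# The name patterns here are guesses made for illustration only.
sub local_path_sketch {
    my ($osname, @components) = @_;
    my $kind = $osname =~ /^(?:MSWin32|dos|os2)$/i ? 'dos'
             : $osname =~ /^MacOS$/i                ? 'mac'
             :                                        'unix';
    return $converter{$kind}->(@components);
}

print local_path_sketch($Config{osname}, 'logical', 'and-or'), "\n";
print local_path_sketch('MacOS', 'logical', 'and/or'), "\n";   # :logical:and/or
```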


For the http:-URLs I have implemented the following methods:

  $url->keywords()      # set/get an <isindex> query string
  $url->query_form()    # set/get a <form> query string

Example of use is:

  $url->keywords('dog', 'bones');
  $url->query_form(foo => 'bar', perl => 'cool', 'reserved' => '&=%');
  %a = $url->query_form; # loses information if the same key repeats
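
What form encoding has to do, and why assigning the result to a hash
can drop data, is sketched below in plain Perl.  build_query() and
parse_query() are hypothetical names, not the module's methods, and
the parser assumes every field contains an '='.

```perl
use strict;
use warnings;

# Hypothetical helpers, not URI::URL's own code.
sub escape_value {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

sub unescape_value {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Build "k=v&k=v" from a flat key/value list, escaping each part.
sub build_query {
    my @kv = @_;
    my @pairs;
    while (my ($k, $v) = splice(@kv, 0, 2)) {
        push @pairs, escape_value($k) . '=' . escape_value($v);
    }
    return join '&', @pairs;
}

# Return a flat key/value list; repeated keys are preserved.
sub parse_query {
    my ($equery) = @_;
    return map { unescape_value($_) }
           map { split /=/, $_, 2 }
           split /&/, $equery;
}

my $q = build_query(foo => 'bar', reserved => '&=%', note => 'a b');
print "$q\n";   # foo=bar&reserved=%26%3D%25&note=a%20b

my @list = parse_query('x=1&x=2');
my %hash = @list;   # the second 'x' overwrites the first
print scalar(@list) / 2, " pairs in list, ",
      scalar(keys %hash), " key in hash\n";
```

Keeping the result as a list preserves both values of 'x'; the hash
keeps only the last one.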


The following incompatibilities have been introduced and might cause
trouble for some of you:

     unsafe(), escape() and unescape()
        These methods are not supported any more.

     full_path() and as_string()
        These methods no longer take a second argument that
        specifies the set of characters to consider as unsafe.

     '+' in the query-string
        The '+' character in the query part of the URL was
        previously treated as an encoding of a space. This was
        just a bad influence from Mosaic.  A space is now encoded
        as '%20'.

     path() and query()
        These methods will croak if they would lose information. Use
        epath() or equery() instead. The path() method will for
        instance lose information if any path segment contains an
        (encoded) '/' character.

     netloc()
        The string passed to netloc is now assumed to be escaped.
        The string returned will also be (partially) escaped.

     sub-classing
        The path, params and query are now stored internally in
        escaped form.  This might affect sub-classes of the URL
        classes.
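
For the path() and query() change above, the behaviour to expect is
roughly the one sketched here.  path_sketch() is a hypothetical
stand-in and its error message is invented; only the idea of croaking
on an encoded '/' comes from the description above.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for an unescaped path getter.
sub path_sketch {
    my ($epath) = @_;
    # An encoded '/' cannot be told apart from a real separator once
    # the string has been unescaped, so refuse instead of guessing.
    croak "path contains an encoded '/'; use the escaped form instead"
        if $epath =~ /%2f/i;
    (my $path = $epath) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $path;
}

print path_sketch('logical/and%20or'), "\n";   # logical/and or

# Callers that cannot rule out encoded slashes can trap the error.
my $ok = eval { path_sketch('logical/and%2For'); 1 };
print "croaked: $@" unless $ok;
```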

Comments?

If it turns out that the URL module now really does the right thing,
then the next step is to make the modules faster (i.e. introduce
autoloading and look at what DProf has to say).

Regards,
Gisle