Re: Updates URL.pm file

Martijn Koster (m.koster@nexor.co.uk)
Sun, 19 Mar 1995 23:03:03 +0000


Tim,

> This is a major reworking with input from Martijn.

Like it... I have appended a few diffs that:
- quote all barewords
- include Roy's new version of the draft as test harness
- rewrote Step 6 of the &absolute algorithm in line with
  Roy's code suggestion. Seems to work.
- added a section to the pod explaining URL extension
- various cleanup: wrap long lines, added comments, single
  quote variable-less strings etc
- synchronised the netloc stuff (or is there a reason you didn't
  use this approach?

> It seems that Martijn and I overlapped in some of our changes but it
> turned out to be valuable anyway (Martijn and implemented a much better
> Absolute URL test harness than mine for example).
 
> =head1 WHAT A URL IS NOT
> 
> URL objects do not, and in my opinion (Tim Bunce) should not, know how
> to 'get' or 'put' the resources they specify locations for, anymore
> than a postal address 'knows' anything about the postal system. The
> actual access/transfer should be achieved by some form of transport
> agent class. The agent class should not be a subclass of URL.
> 
> I hope that's agreeable to the majority.

Fine by me.

> I'd really like to see the outstanding URL module issues resolved soon
> because it's a critical module. Many other modules will want to be
> passing url objects around. Even just defining/agreeing a subset of
> it's functionality and interfaces would be a big step forward.

> =head1 OUTSTANDING ISSUES
> 
> Class naming.

I vote for WWW::URL::http etc, with WWW::LWP::Request etc for the 
lwp library

> Inheritance (use and abuse).

Interesting to see $URL->new return a subclass :-)

> How to maintain consistency between netlock and user/host/port etc.

Is there anything wrong with this.

> Should URL be stored in encoded or decoded format and related issues.

That needs sorting out. Back to the spec...
 
> =head1 AUTHOR/ACKNOWLEDGMENTS

I'd just mention who worked on it, you don't want a complete change
log in the pod...


> abs_test(); exit 1;

I guess that needs resolving too?

Cheers,

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html

*** URL.pm.orig	Sun Mar 19 19:23:46 1995
--- URL.pm	Sun Mar 19 22:46:26 1995
***************
*** 3,11 ****
  package URL;
  
  $rcsid = '$Id: URL.pm,v 1.6 1995/03/17 20:10:34 timbo Exp timbo $';
! $rcsid = $rcsid;
  
! ######################################################################
  
  =head1 NAME
  
--- 3,11 ----
  package URL;
  
  $rcsid = '$Id: URL.pm,v 1.6 1995/03/17 20:10:34 timbo Exp timbo $';
! $rcsid = $rcsid; # shut up -w
  
! #####################################################################
  
  =head1 NAME
  
***************
*** 39,47 ****
--- 39,50 ----
      # Retrieving and storing common elements
  
      $scheme   = $url->scheme;
+     $user     = $url->user;
+     $password = $url->password;
      $host     = $url->host;
      $port     = $url->port;     # returns default if not defined
      $path     = $url->path;
+     $params   = $url->params;
      $query    = $url->query;
      $frag     = $url->frag;
  
***************
*** 61,86 ****
      $escaped = URL::escape($url);
      $url = URL::unescape($escaped);
  
-     
  =head1 DESCRIPTION
  
  URL objects represent URLs (RFC 1738). Both absolute and relative
  URL's are supported.
  
! URL objects are created by C<new>, which takes a string representation of
! a URL or an existing URL object reference to be cloned. Specific
! individual elements can then be accessed via the C<scheme>, C<host>,
! C<port>, C<path>, C<query> and C<frag> methods. These methods can be
! called with a value to set the element to that value, and always return
! the old value. The C<elem> method provides a general interface to
! access any element by name but it should be used with caution.
! 
! The C<abs> function attempts to return a new absolute URL object for
! a given URL.  In order to convert a relative URL into an absolute one a
! I<base> URL is required. You can associate a default base with a URL
! either by passing a C<base> to the C<new> method when a URL is created
! or using the C<base> method on the object later. Alternatively you can
! specify a once-off base as a parameter to the C<abs> method.
  
  The C<new> method must be able to determine the scheme for the URL.
  If a scheme is not specified in the URL it will use the scheme
--- 64,92 ----
      $escaped = URL::escape($url);
      $url = URL::unescape($escaped);
      
  =head1 DESCRIPTION
  
  URL objects represent URLs (RFC 1738). Both absolute and relative
  URL's are supported.
  
! URL objects are created by C<new>, which takes a string
! representation of a URL or an existing URL object reference to be
! cloned. Specific individual elements can then be accessed via the
! C<scheme>, C<user>, C<password>, C<host>, C<port>, C<path>,
!  C<params>, C<query> and C<frag> methods. These methods can be 
! called with a value to set the element to that value, and always 
! return the old value. The C<elem> method provides a general
! interface to access any element by name but it should be used with
! caution: the effect of using incorrect spelling and case is
! undefined.
! 
! The C<abs> function attempts to return a new absolute URL object
! for a given URL.  In order to convert a relative URL into an absolute
! one a I<base> URL is required. You can associate a default base with
! a URL either by passing a C<base> to the C<new> method when a URL is
! created or using the C<base> method on the object later.
! Alternatively you can specify a once-off base as a parameter to the
! C<abs> method.
  
  The C<new> method must be able to determine the scheme for the URL.
  If a scheme is not specified in the URL it will use the scheme
***************
*** 87,100 ****
  specified by the parent URL. If no parent URL scheme is defined
  then the C<new> will fail (die).
  
  
  =head1 WHAT A URL IS NOT
  
! URL objects do not, and in my opinion (Tim Bunce) should not, know how
! to 'get' or 'put' the resources they specify locations for, anymore
! than a postal address 'knows' anything about the postal system. The
! actual access/transfer should be achieved by some form of transport
! agent class. The agent class should not be a subclass of URL.
  
  
  =head1 OUTSTANDING ISSUES
--- 93,121 ----
  specified by the parent URL. If no parent URL scheme is defined
  then the C<new> will fail (die).
  
+ =head1 ADDING NEW URL's
+ 
+ New URL schemes can be added as follows. Create a C<URL> directory
+ in the the same directory as C<URL.pm> (if it's not already there).
+ In this new directory, create a C<scheme.pm> file for your scheme
+ (e.g. C<foo.pm>). In this module, declare a new subclass of
+ URL::generic like this:
+ 
+ 
+     package URL::foo;		
+     @ISA = qw(URL::generic);
+ 
+     sub default_port { 9999 };
+ 
+ and override any additional methods as required.
  
  =head1 WHAT A URL IS NOT
  
! URL objects do not, and should not, know how to 'get' or 'put' the
! resources they specify locations for, anymore than a postal address
! 'knows' anything about the postal system. The actual access/transfer
! should be achieved by some form of transport agent class. The agent
! class can use the URL class, but should not be a subclass of URL.
  
  
  =head1 OUTSTANDING ISSUES
***************
*** 101,111 ****
  
  Class naming. Inheritance (use and abuse).
  
! How to maintain consistency between netlock and user/host/port etc.
  
! Should URL be stored in encoded or decoded format and related issues.
  
- 
  =head1 AUTHOR/ACKNOWLEDGMENTS
  
  This module is based on the C<wwwurl.pl> code in the libwww-perl
--- 122,132 ----
  
  Class naming. Inheritance (use and abuse).
  
! How to maintain consistency between netloc and user, password, host
! port etc.
  
! Where and how escaping should really be implemented.
  
  =head1 AUTHOR/ACKNOWLEDGMENTS
  
  This module is based on the C<wwwurl.pl> code in the libwww-perl
***************
*** 114,128 ****
  with contributions from Brooks Cutter.
  
  Gisle Aas <aas@nr.no>, Tim Bunce <Tim.Bunce@ig.co.uk>, and Martijn
! Koster <m.koster@nexor.co.uk> have contributed to the original Perl 5
  version.
  
- Tim Bunce switched the code to blessed hash refs and internal classes
- for each scheme with optional support for new schemes to be loaded
- dynamically.  The test code was expanded and automated by both Tim
- (parsing) and Martijn Koster (abs_test). Both Martijn and Tim
- implemented relative URL support.
- 
  If you have any suggestions, bug reports, fixes, or enhancements,
  send them to the libwww-perl mailing list at
  <libwww-perl@ics.uci.edu>.
--- 135,143 ----
  with contributions from Brooks Cutter.
  
  Gisle Aas <aas@nr.no>, Tim Bunce <Tim.Bunce@ig.co.uk>, and Martijn
! Koster <m.koster@nexor.co.uk> have collaborated on the Perl 5 
  version.
  
  If you have any suggestions, bug reports, fixes, or enhancements,
  send them to the libwww-perl mailing list at
  <libwww-perl@ics.uci.edu>.
***************
*** 135,141 ****
  
  =cut
  
! ######################################################################
  
  # Perl resumes here
  
--- 150,156 ----
  
  =cut
  
! #####################################################################
  
  # Perl resumes here
  
***************
*** 151,165 ****
  
  use strict qw(subs refs);
  
! 
! 
! 
! ######################################################################
  
  # methods/functions
  
! # new()  	--  Object's constructor
  #
  sub new
  {
      my($class, $init, $base) = @_;
--- 166,179 ----
  
  use strict qw(subs refs);
  
! #####################################################################
  
  # methods/functions
  
! # new()
  #
+ # object's constructor
+ #
  sub new
  {
      my($class, $init, $base) = @_;
***************
*** 173,180 ****
  	# We need a scheme to determine which class to use
  	my($scheme) = $init =~ m/^([.+\-\w]+):/;
  	$scheme = $base->scheme if (!$scheme and $base);
! 	croak "Unable to determine scheme for '$init'" unless $scheme;
! 	$scheme = _check_scheme($scheme); # dies on failure (eg unsupportable)
  	# then hand-off to scheme specific sub-class
  	$this = "URL::$scheme"->new($init, $base);
      }
--- 187,196 ----
          # We need a scheme to determine which class to use
          my($scheme) = $init =~ m/^([.+\-\w]+):/;
          $scheme = $base->scheme if (!$scheme and $base);
!         croak "Unable to determine scheme for '$init'"
!             unless $scheme;
!         $scheme = _check_scheme($scheme); 
!             # dies on failure (eg unsupportable)
          # then hand-off to scheme specific sub-class
          $this = "URL::$scheme"->new($init, $base);
      }
***************
*** 183,188 ****
--- 199,208 ----
  }
  
  
+ # clone()
+ #
+ # copy constructor
+ #
  sub clone
  {
      my $self = shift;
***************
*** 190,206 ****
  }
  
  
! # localpath()	-- return a URL object that denotes a path on the
  #		   local filesystem (current directory by default).
  #
  sub localpath
  {
!     my($path) = @_;
      ($path = fastcwd()) =~ s:/?$:/: unless defined $path;
      new URL "file://localhost$path";
  }
  
  
  sub dump
  {
      my $self = shift;
--- 210,232 ----
  }
  
  
! # localpath()
! #
! # return a URL object that denotes a path on the
  # local filesystem (current directory by default).
  #
  sub localpath
  {
!     my($path) = shift;
      ($path = fastcwd()) =~ s:/?$:/: unless defined $path;
      new URL "file://localhost$path";
  }
  
  
+ # dump()
+ #
+ # print the contents of the object
+ #
  sub dump
  {
      my $self = shift;
***************
*** 214,230 ****
  }
  
  
! 
  sub _check_scheme
  {
      my $scheme = lc(shift); # ensure lowercase and then validate
      no strict qw(refs);
      return $scheme if exists $KnownSchemes{$scheme};	# already setup
!     # force attempt to load support for unknown scheme, die on failure
      unless (defined @{"URL::${scheme}::ISA"}){
  	require "URL/${scheme}.pm"
      } else {
! 	# silently require anyway so external files can update our code
  	eval { local($^W)=0; require "URL/${scheme}.pm" };
      }
      # setup overloading - experimental (needs cmp etc)
--- 240,261 ----
  }
  
  
! # _check_scheme()
! #
! # Checks scheme and loads its implementation
! #
  sub _check_scheme
  {
      my $scheme = lc(shift); # ensure lowercase and then validate
      no strict qw(refs);
      return $scheme if exists $KnownSchemes{$scheme}; # already setup
!     # force attempt to load support for unknown scheme
!     # die on failure
      unless (defined @{"URL::${scheme}::ISA"}){
          require "URL/${scheme}.pm"
      } else {
!         # silently require anyway so external files can update
!         # our code
          eval { local($^W)=0; require "URL/${scheme}.pm" };
      }
      # setup overloading - experimental (needs cmp etc)
***************
*** 234,240 ****
--- 265,273 ----
  }
  
  
+ #####################################################################
  
+ # Methods to handle URL's elements
  
  # These methods always return the current value,
  # so you can use $url->scheme to read the current value.
***************
*** 253,272 ****
  sub frag	{ shift->elem('frag',   @_);  }
  
  # Fields derived from generic netloc
- sub host	{ shift->elem('host',   @_);  }
- sub port	{ shift->elem('port',   @_);  }
  sub user	{ shift->elem('user',   @_);  }
  sub password	{ shift->elem('password',@_); }
  
  sub elem {
!     my($this, $element, @val) = @_;
!     my $old = $this->{$element};
!     $this->{$element} = $val[0] if @val;
      return $old;
  }
  
  
  
  # unescape()
  #
  # 'http://web/this%20has%20spaces' -> 'http://web/this has spaces'
--- 286,320 ----
  sub frag        { shift->elem('frag',   @_);  }
  
  # Fields derived from generic netloc
  sub user        { shift->elem('user',   @_);  }
  sub password    { shift->elem('password',@_); }
+ sub host        { shift->elem('host',   @_);  }
+ sub port        { shift->elem('port',   @_);  }
  
  sub elem {
!     my($self, $element, @val) = @_;
!     my $old = $self->{$element};
! 
!     if (@val) {
!         # general case
!         $self->{$element} = $val[0];
!         
!         # netloc include user, password, host, port
!         if ($element eq 'netloc') {
!             $self->_read_netloc();
!         }
!         elsif ($element =~ m/^(user)|(password)|(host)|(port)$/) {
!             $self->_write_netloc();
!         }
!     }
      return $old;
  }
  
  
+ #####################################################################
  
+ # misc functions
+ 
  # unescape()
  #
  # 'http://web/this%20has%20spaces' -> 'http://web/this has spaces'
***************
*** 275,281 ****
  {
      local($_) = @_;
      $_ = $_->str if ref($_); # unescape doubles as a method
!     s/%([\dA-Fa-f][\dA-Fa-f])/pack("C",hex($1))/eg;
      $_;
  }
  
--- 323,329 ----
  {
      local($_) = @_;
      $_ = $_->str if ref($_); # unescape doubles as a method
!     s/%([\dA-Fa-f][\dA-Fa-f])/pack('C',hex($1))/eg;
      $_;
  }
  
***************
*** 291,302 ****
  }
  
  
! ######################################################################
  #
  #	Internal pre-defined basic scheme support
  #
! # In this implementation all schemes are subclassed from URL::generic.
! # This turns out to have reasonable mileage.
  # See also draft-ietf-uri-relative-url-05.txt
  
  package URL::generic;		# base support for generic-RL's
--- 339,350 ----
  }
  
  
! #####################################################################
  #
  #       Internal pre-defined basic scheme support
  #
! # In this implementation all schemes are subclassed from
! # URL::generic. This turns out to have reasonable mileage.
  # See also draft-ietf-uri-relative-url-05.txt
  
  package URL::generic;           # base support for generic-RL's
***************
*** 306,319 ****
  
  sub new {			# inherited constructor
      my($class, $init, $base) = @_;
!     my $u = $class->_parse($init);
!     URL::dump($u) if $URL::Debug;
!     bless $u, $class;
  }
  
  
- 
- 
  # Generic-RL parser
  # See draft-ietf-uri-relative-url-05.txt Section 2
  
--- 354,365 ----
  
  sub new {                       # inherited constructor
      my($class, $init, $base) = @_;
!     my $url = $class->_parse($init);
!     URL::dump($url) if $URL::Debug;
!     bless $url, $class;
  }
  
  
  # Generic-RL parser
  # See draft-ietf-uri-relative-url-05.txt Section 2
  
***************
*** 321,333 ****
      my($self, $u) = @_;
      my %u; # Following draft-ietf-uri-relative-url-05.txt Section 2.4
  
!     $u{url}	= $u;	# keep copy of original
!     $u{frag}    = $1  if ($u =~ s/#(.*)$//);                    # 2.4.1
!     $u{scheme}=lc($1) if ($u =~ s/^\s*([\w\+\.\-]+)://);	# 2.4.2
!     $u{netloc}	= $1  if ($u =~ s!^//([^/]*)!!);		# 2.4.3
!     $u{query}	= $1  if ($u =~ s/\?(.*)//);			# 2.4.4
!     $u{params}	= $1  if ($u =~ s/;(.*)//);			# 2.4.5
!     $u{path}	= $u;						# 2.4.6
  
      # read netloc fields: "<user>:<password>@<host>:<port>"
      _read_netloc(\%u);
--- 367,379 ----
      my($self, $u) = @_;
      my %u; # Following draft-ietf-uri-relative-url-05.txt Section 2.4
  
!     $u{'url'}    = $u;  # keep copy of original
!     $u{'frag'}   =    $1  if ($u =~ s/#(.*)$//);             # 2.4.1
!     $u{'scheme'} = lc($1) if ($u =~ s/^\s*([\w\+\.\-]+)://); # 2.4.2
!     $u{'netloc'} =    $1  if ($u =~ s!^//([^/]*)!!);         # 2.4.3
!     $u{'query'}  =    $1  if ($u =~ s/\?(.*)//);             # 2.4.4
!     $u{'params'} =    $1  if ($u =~ s/;(.*)//);              # 2.4.5
!     $u{'path'}   = $u;                                       # 2.4.6
  
      # read netloc fields: "<user>:<password>@<host>:<port>"
      _read_netloc(\%u);
***************
*** 337,360 ****
  
  sub _read_netloc {	# netloc -> user, password, host, post
      my($self) = @_;
!     my $nl = $self->{netloc} || '';
!     $self->{user}     = $1 if $nl =~ s/^([^:@]*):?([^@]*)?@//;
!     $self->{password} = $2 if $2;
!     $self->{host}     = $1 if $nl =~ s/^([^:]*):?(\d*)?//;
!     $self->{port}     = $2 if $2;
  }
  
  sub _write_netloc {
      my($self) = @_;
!     my $nl = $self->{user} || '';
!     $nl .= ":$self->{password}" if $nl and $self->{password};
!     $nl .= "@$self->{host}";
!     $nl .= ":$self->{port}" if $self->{host};
!     $self->{netloc} = $nl;
  }
  
  
- 
  # Generic-RL stringify
  #
  sub str
--- 383,405 ----
  
  sub _read_netloc {      # netloc -> user, password, host, post
      my($self) = @_;
!     my $nl = $self->{'netloc'} || '';
!     $self->{'user'}     = $1 if $nl =~ s/^([^:@]*):?([^@]*)?@//;
!     $self->{'password'} = $2 if $2;
!     $self->{'host'}     = $1 if $nl =~ s/^([^:]*):?(\d*)?//;
!     $self->{'port'}     = $2 if $2;
  }
  
  sub _write_netloc {
      my($self) = @_;
!     my $nl = $self->{'user'} || '';
!     $nl .= ":$self->{'password'}" if $nl and $self->{'password'};
!     $nl .= "\@$self->{'host'}";
!     $nl .= ":$self->{'port'}" if $self->{'host'};
!     $self->{'netloc'} = $nl;
  }
  
  
  # Generic-RL stringify
  #
  sub str
***************
*** 363,369 ****
      my($scheme, $address, $port) = @{$self}{qw(scheme host port)};
      my $path  = $self->full_path;	# path+query+params+frag
      if ($address){
! 	$address .= ":$port" if ($port && $port != $self->default_port);
  	$path     = "/$path" unless $path =~ m:^/:;
  	$path     = "//$address$path";
      }
--- 408,415 ----
      my($scheme, $address, $port) = @{$self}{qw(scheme host port)};
      my $path  = $self->full_path;       # path+query+params+frag
      if ($address){
!         $address .= ":$port" if ($port && 
! 				 $port != $self->default_port);
          $path     = "/$path" unless $path =~ m:^/:;
          $path     = "//$address$path";
      }
***************
*** 393,406 ****
  #
  sub abs
  {
!     my($this, $base) = @_;
!     my $tmp;
!     my $embed = $this->clone;
  
!     $base = $this->base unless $base;		# default to default base
      return $embed unless $base;			# we have no base (step1)
  
!     $base = new URL $base unless ref $base;	# convert to obj if required
  
      my @u = @{$embed}{qw(scheme host port path params query frag) };
      my $empty = scalar(grep($_, @u)) == 0;
--- 439,451 ----
  #
  sub abs
  {
!     my($self, $base) = @_;
!     my $embed = $self->clone;
  
!     $base = $self->base unless $base;      # default to default base
      return $embed unless $base;	           # we have no base (step1)
  
!     $base = new URL $base unless ref $base;	# make obj if required
  
      my @u = @{$embed}{qw(scheme host port path params query frag) };
      my $empty = scalar(grep($_, @u)) == 0;
***************
*** 413,460 ****
      # if we have a scheme we must already be absolute	(2b)
      return $embed if $scheme;
  
!     $embed->{scheme} = $base->{scheme};		#	(2c)
  
!     return $embed if $embed->{netloc};		#	(3)
!     $embed->{netloc} = $base->{netloc};		#	(3)
!     $embed->_read_netloc;
  
!     return $embed if $embed->{path} =~ m:^/:;   #	(4)
      
!     unless ($embed->{path}){			#	(5)
! 	$embed->{path} = $base->{path};		#	(5)
  
          return $embed if $embed->params;	#	(5a)
! 	$embed->{params} = $base->{params};	#	(5a)
  
          return $embed if $embed->query;		#	(5b)
! 	$embed->{query} = $base->{query};	#	(5b)
          return $embed;
      }
  
!     my $path = $base->{path};
!     my $ep  = $embed->{path};
!     $ep =~ s!^/!!;              # XXX should this be?
!     if ($path !~ s!(.*)/.*$!$1/$ep!) {
!         $path = $ep;
!     }
!     $path =~ s!^\./!/!g; # a
!     $path =~ s!/\./!/!g;
!     $path =~ s!^\.$!/!; # b
!     $path =~ s!/\.$!/!;
!     while($path =~ s![^/]+/\.\./!!) { # c
!         ;
      }  
!     $path =~ s![^/]+/\.\.$!!; # d
!     $embed->{path} = $path;
  
      return $embed;
  }
  
  
! 
  sub default_port {
!     undef
  }
  
  
--- 458,538 ----
      # if we have a scheme we must already be absolute	(2b)
      return $embed if $scheme;
  
!     $embed->{'scheme'} = $base->{'scheme'};      # (2c)
  
!     return $embed if $embed->{'netloc'};	 # (3)
!     $embed->{'netloc'} = $base->{'netloc'};	 # (3)
!     $embed->_read_netloc();
  
!     return $embed if $embed->{'path'} =~ m:^/:;	 # (4)
      
!     unless ($embed->{'path'}){			 # (5)
! 	$embed->{'path'} = $base->{'path'};	 # (5)
  
          return $embed if $embed->params;	 # (5a)
! 	$embed->{'params'} = $base->{'params'};	 # (5a)
  
          return $embed if $embed->query;		 # (5b)
! 	$embed->{'query'} = $base->{'query'};	 # (5b)
          return $embed;
      }
  
!     # (6)
! 
!     my $basepath = $base->{'path'};
!     my $relpath  = $embed->{'path'};
! 
!     # draft 6 suggests stack based approach
! 
!     $basepath =~ s!^/!!;
!     $basepath =~ s!/$!/.!;              # prevent empty segment
!     my @path = split('/', $basepath);   # base path into segments
!     pop(@path);			        # remove last segment
! 
!     $relpath =~ s!/$!/.!;               # prevent empty segment
! 
!     push(@path, split('/', $relpath));  # append relative segments
! 
!     my @newpath = ();
!     my $isdir = 0;
!     foreach $segment (@path) {	# left to right
! #	warn '> ', join('/', @newpath), ": $segment\n";
!         if ($segment eq '.') {	# ignore "same" directory
! 	    $isdir = 1;
! 	}
!         elsif ($segment eq '..') {
! 	    $isdir = 1;
! 	    my $last = pop(@newpath);
! 	    if (!defined $last) { # nothing to pop
! 		push(@newpath, $segment); # so must append
! 	    }
! 	    elsif ($last eq '..') { # '..' cannot match '..'
! 		# so put pack again, and append
! 		push(@newpath, $last, $segment);
! 	    }
! 	    else {
! 		# it was a component, 
! 		# keep popped
! 	    }
  	}
! 	else {
! 	    $isdir = 0;
! 	    push(@newpath, $segment);
!         }
!     }
  
+     $embed->{'path'} = join('/', @newpath) . ($isdir ? '/' : '');
+     
      return $embed;
  }
  
  
! # default_port()
! #
! # subclasses will usually want to override this
! #
  sub default_port {
!     undef;
  }
  
  
***************
*** 501,508 ****
  sub _parse {
      my($self, $url) = @_;
      my %u;
!     $u{scheme}  = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
!     $u{netloc}  = $1 if $url =~ s!^//([^/]*)!!;
      $u{'gtype'} = $1 if $url =~ s!^/(.)!!;
      @u{qw(selector search string)} = split(/%09/, $url, 3);
      \%u;
--- 579,586 ----
  sub _parse {
      my($self, $url) = @_;
      my %u;
!     $u{'scheme'}  = lc($1) if $url =~ s/^\s*([\w\+\.\-]+)://;
!     $u{'netloc'}  =    $1  if $url =~ s!^//([^/]*)!!;
      $u{'gtype'}   =    $1  if $url =~ s!^/(.)!!;
      @u{qw(selector search string)} = split(/%09/, $url, 3);
      \%u;
***************
*** 514,527 ****
  
  sub default_port { 79 };
  
- 
- 
  package URL::http;		@ISA = qw(URL::generic);
  
  sub default_port { 80 };
  
- 
- 
  package URL::nntp;		@ISA = qw(URL::generic);
  
  sub default_port { 119 };
--- 592,601 ----
***************
*** 529,535 ****
  sub _parse {
      my($self, $init) = @_;
      my $u = $self->URL::generic::_parse($init);
!     ($u->{group}, $u->{digits}) = (split('/', $u->{path}))[1,2];
      $u;
  }
  
--- 603,610 ----
  sub _parse {
      my($self, $init) = @_;
      my $u = $self->URL::generic::_parse($init);
!     ($u->{'group'}, $u->{'digits'}) = 
! 	(split('/', $u->{'path'}))[1,2];
      $u;
  }
  
***************
*** 540,547 ****
  sub _parse {
      my($self, $url) = @_;
      my %u;
!     $u{scheme}  = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
!     $u{grouppart} = $url;
      $u{ ($url =~ m/\@/) ? 'article' : 'group' } = $url;
      \%u;
  }
--- 615,622 ----
  sub _parse {
      my($self, $url) = @_;
      my %u;
!     $u{'scheme'}  = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
!     $u{'grouppart'} = $url;
      $u{ ($url =~ m/\@/) ? 'article' : 'group' } = $url;
      \%u;
  }
***************
*** 556,562 ****
      my($self, $url) = @_;
      my $u = $self->URL::generic::_parse($url);
      @{$u}{qw(database wtype wpath)}
! 	    = (split(/\//, $u->{path}))[1,2,3];
      $u;
  }
  
--- 631,637 ----
      my($self, $url) = @_;
      my $u = $self->URL::generic::_parse($url);
      @{$u}{qw(database wtype wpath)}
! 	    = (split(/\//, $u->{'path'}))[1,2,3];
      $u;
  }
  
***************
*** 579,586 ****
  sub _parse {
      my($self, $url) = @_;
      my %u;
!     $u{scheme}  = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
!     $u{encoded822addr} = $url;
      \%u;
  }
  
--- 654,661 ----
  sub _parse {
      my($self, $url) = @_;
      my %u;
!     $u{'scheme'}  = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
!     $u{'encoded822addr'} = $url;
      \%u;
  }
  
***************
*** 592,624 ****
  
  package URL::tn3270;	@ISA = qw(URL::generic);
  
- package URL::g;		@ISA = qw(URL::generic); # just for the tests
- 
- 
  
  
! ######################################################################
  #
  # If we're not use'd or require'd execute self-test.
  # Handy for regression testing and as a quick reference :)
! 
  
  eval join('',<main::DATA>) || die $@ unless caller();
  
  1;
  __END__
  
  package main;
  
  use Carp;
  import URL qw(escape unescape);
  
  # Dies if an error has been detected.
  # prints "ok" otherwise.
  
  $| = 1;
  $txt = '';
! $err = "URL self test failed";
  $errors = 0;
  
  print "URL self test...\n";
--- 667,716 ----
  
  package URL::tn3270;	@ISA = qw(URL::generic);
  
  
  
! #####################################################################
  #
  # If we're not use'd or require'd execute self-test.
  # Handy for regression testing and as a quick reference :)
! #
! # Test is kept behind __END__ so it doesn't take uptime
! # and memory  unless explicitly required
  
  eval join('',<main::DATA>) || die $@ unless caller();
  
  1;
+ 
+ package main;			# the __END__ is in main
  __END__
  
+ package URL::g;		@ISA = qw(URL::generic); # just for the tests
+ 
  package main;
  
  use Carp;
  import URL qw(escape unescape);
  
+ &test;
+ 
  # Dies if an error has been detected.
  # prints "ok" otherwise.
  
+ sub test {
      $| = 1;
+ 
+     &netloc_test;
+ 
+     &abs_test;
+ }
+ 
+ 
+ exit 1;
+ 
+ # XXX These tests aren't called. Need cleaning up
+ 
  $txt = '';
! $err = 'URL self test failed';
  $errors = 0;
  
  print "URL self test...\n";
***************
*** 628,675 ****
      ++$errors;
  }
  
- abs_test(); exit 1;
- 
  # test retrieval methods
  
  $tests = {
      'hTTp://web1.net/a/b/c/welcome#intro'
! 	=> {	scheme => 'http', host=>'web1.net', port=>undef,
! 		path=>'/a/b/c/welcome', frag=>'intro', query=>undef },
  
      'http://web:1/a?query+text'
! 	=> {	scheme => 'http', host=>'web', port=>1,
! 		path=>'/a', frag=>undef, query=>'query+text' },
  
      'http://web.net'
! 	=> {	scheme => 'http', host=>'web.net', port=>undef,
! 		path=>'', frag=>undef, query=>undef, user=>undef },
  
      'ftp://usr:pswd@web:1234/a/b;type=i'
! 	=> {	host=>'web', port=>1234, path=>'/a/b',
! 		user=>'usr', password=>'pswd', params=>'type=i' },
  
      'gopher://web/2a_selector'
! 	=> {	gtype=>'2', selector=>'a_selector' },
  
      'mailto:libwww-perl@ics.uci.edu'
! 	=> {	encoded822addr=>'libwww-perl@ics.uci.edu' },
  
!     'news:*'		  => {	grouppart=>'*' },
!     'news:comp.lang.perl' => {	group=>'comp.lang.perl' },
      'news:perl-faq/module-list-1-794455075@ig.co.uk'
! 	=> {	article=>'perl-faq/module-list-1-794455075@ig.co.uk' },
  
      'nntp://news.com/comp.lang.perl/42'
! 	=> {	group=>'comp.lang.perl', digits=>42 },
  
      'telnet://usr:pswd@web:12345/'
! 	=> {	user=>'usr', password=>'pswd' },
  
!     'wais://web.net/db'       => { database=>'db' },
!     'wais://web.net/db?query' => { database=>'db', query=>'query' },
      'wais://usr:pswd@web.net/db/wt/wp'
! 	=> {	database=>'db', wtype=>'wt', wpath=>'wp', password=>'pswd' },
  };
  
  foreach $url_str (sort keys %$tests ){
--- 720,770 ----
      ++$errors;
  }
  
  # test retrieval methods
  
  $tests = {
      'hTTp://web1.net/a/b/c/welcome#intro'
! 	=> {	'scheme'=>'http', 'host'=>'web1.net', 'port'=>undef,
! 		'path'=>'/a/b/c/welcome', 'frag'=>'intro', 'query'=>undef },
  
      'http://web:1/a?query+text'
! 	=> {	'scheme'=>'http', 'host'=>'web', 'port'=>1,
! 		'path'=>'/a', 'frag'=>undef, 'query'=>'query+text' },
  
      'http://web.net'
! 	=> {	'scheme'=>'http', 'host'=>'web.net', 'port'=>undef,
! 		'path'=>'', 'frag'=>undef, 'query'=>undef, 'user'=>undef },
  
      'ftp://usr:pswd@web:1234/a/b;type=i'
! 	=> {	'host'=>'web', 'port'=>1234, 'path'=>'/a/b',
! 		'user'=>'usr', 'password'=>'pswd', 'params'=>'type=i' },
  
      'gopher://web/2a_selector'
! 	=> {	'gtype'=>'2', 'selector'=>'a_selector' },
  
      'mailto:libwww-perl@ics.uci.edu'
! 	=> {	'encoded822addr'=>'libwww-perl@ics.uci.edu' },
  
!     'news:*'		 
! 	=>{	'grouppart'=>'*' },
!     'news:comp.lang.perl' 
! 	=> {	'group'=>'comp.lang.perl' },
      'news:perl-faq/module-list-1-794455075@ig.co.uk'
! 	=> {	'article'=>'perl-faq/module-list-1-794455075@ig.co.uk' },
  
      'nntp://news.com/comp.lang.perl/42'
! 	=> {	'group'=>'comp.lang.perl', 'digits'=>42 },
  
      'telnet://usr:pswd@web:12345/'
! 	=> {	'user'=>'usr', 'password'=>'pswd' },
  
!     'wais://web.net/db'       
! 	=> { 'database'=>'db' },
!     'wais://web.net/db?query' 
! 	=> { 'database'=>'db', 'query'=>'query' },
      'wais://usr:pswd@web.net/db/wt/wp'
! 	=> {	'database'=>'db', 'wtype'=>'wt', 'wpath'=>'wp',
! 		    'password'=>'pswd' },
  };
  
  foreach $url_str (sort keys %$tests ){
***************
*** 704,717 ****
  	unless $txt eq 'gopher://gop:71/1info?key+word#this';
  
  $url->query(undef);
! fail "set-to-undef failed" if defined($url->query);
  
  
  # Test base url and relative->absolute conversions
  
! $base = new URL "http://base.com:25";
! $relu = new URL "hello.gif", $base;
! 
  
  
  # escaping functions
--- 799,812 ----
  	unless $txt eq 'gopher://gop:71/1info?key+word#this';
  
  $url->query(undef);
! fail 'set-to-undef failed' if defined($url->query);
  
  
  # Test base url and relative->absolute conversions
  
! $base = new URL 'http://base.com:25';
! $relu = new URL 'hello.gif', $base;
! $relu = $relu;
  
  
  # escaping functions
***************
*** 723,730 ****
  $txt = unescape($url);
  fail $err unless $txt eq 'http://web/this has spaces';
   
! $txt = unescape(new URL 'http://web/this%20has%20space');
! fail $err, $txt unless $txt eq 'http://web/this has space';
  
  
  # test localpath
--- 818,825 ----
  $txt = unescape($url);
  fail $err unless $txt eq 'http://web/this has spaces';
   
! $txt = unescape(new URL 'http://web/this%20has%20spaces');
! fail $err, $txt unless $txt eq 'http://web/this has spaces';
  
  
  # test localpath
***************
*** 742,750 ****
  print "URL self test ok ($URL::rcsid)\n";
  
  
  sub abs_test {
  
!     # Tests from draft-ietf-uri-relative-url-05.txt
  
      my $base = 'http://a/b/c/d;p?q#f';
  
--- 837,862 ----
  print "URL self test ok ($URL::rcsid)\n";
  
  
+ # netloc_test()
+ #
+ # Test netloc synchronisation
+ #
+ sub netloc_test {
+     my $url = URL->new('http://anonymous:me@host:90');
+     die "netloc user" unless $url->user eq 'anonymous';
+     $url->user('nemo');
+     die "netloc_test 1" unless $url->user eq 'nemo';
+     die "netloc_test 2" unless $url->netloc eq 'nemo:me@host:90';
+ }
+ 
+ 
+ # abs_test()
+ #
+ # Test relative/absolute URL parsing
+ #
  sub abs_test {
  
!     # Tests from draft-ietf-uri-relative-url-06.txt
  
      my $base = 'http://a/b/c/d;p?q#f';
  
***************
*** 752,757 ****
--- 864,870 ----
  
      $absolute_tests = <<EOM;
  5.1.  Normal Examples
+ 
        g:h        = <URL:g:h>
        g          = <URL:http://a/b/c/g>
        ./g        = <URL:http://a/b/c/g>
***************
*** 779,786 ****
--- 892,915 ----
  
  5.2.  Abnormal Examples
  
+    Although the following abnormal examples are unlikely to occur
+    in normal practice, all URL parsers should be capable of resolving
+    them consistently.  Each example uses the same base as above.
+ 
+    An empty reference resolves to the complete base URL:
+ 
        <>            = <URL:http://a/b/c/d;p?q#f>
+ 
+    Parsers must be careful in handling the case where there are more
+    relative path ".." segments than there are hierarchical levels in
+    the base URL's path.  Note that the ".." syntax cannot be used to
+    change the <net_loc> of a URL.
+ 
        ../../../g    = <URL:http://a/../g>
+       ../../../../g = <URL:http://a/../../g>
+ 
+    Similarly, parsers must avoid treating "." and ".." as special when
+    they are not complete components of a relative path.
  
        /./g          = <URL:http://a/./g>
        /../g         = <URL:http://a/../g>
***************
*** 789,799 ****
--- 918,936 ----
        g..           = <URL:http://a/b/c/g..>
        ..g           = <URL:http://a/b/c/..g>
  
+    Less likely are cases where the relative URL uses unnecessary or
+    nonsensical forms of the "." and ".." complete path segments.
+ 
        ./../g        = <URL:http://a/b/g>
        ./g/.         = <URL:http://a/b/c/g/>
        g/./h         = <URL:http://a/b/c/g/h>
        g/../h        = <URL:http://a/b/c/h>
  
+    Finally, some older parsers allow the scheme name to be present in
+    a relative URL if it is the same as the base URL scheme.  This is
+    considered to be a loophole in prior specifications of partial
+    URLs [1] and should be avoided by future parsers.
+ 
        http:g        = <URL:http:g>
        http:         = <URL:http:>
  EOM
***************
*** 801,807 ****
      # @absolute_tests = ( ['g:h' => 'g:h'], ...)
  
      for $line (split("\n", $absolute_tests)) {
! 	next unless $line =~ /^\s/;
  	if ($line =~ /^\s+(\S+)\s*=\s*<URL:([^>]*)>/) {
  	    my($rel, $abs) = ($1, $2);
  	    $rel = '' if $rel eq '<>';
--- 938,944 ----
      # @absolute_tests = ( ['g:h' => 'g:h'], ...)
  
      for $line (split("\n", $absolute_tests)) {
! 	next unless $line =~ /^\s{6}/;
  	if ($line =~ /^\s+(\S+)\s*=\s*<URL:([^>]*)>/) {
  	    my($rel, $abs) = ($1, $2);
  	    $rel = '' if $rel eq '<>';
***************
*** 812,817 ****
--- 949,960 ----
  	}
      }
  
+     # add some extra ones for good measure
+ 
+     push(@absolute_tests, ['x/y//../z', => 'http://a/b/c/x/y/z']);
+ 
+     print "  Relative    +  Base  =>  Expected Absolute URL\n";
+     print "================================================\n";
      for $test (@absolute_tests) {
  	my $rel = $test->[0];
  	my $abs = $test->[1];
***************
*** 819,826 ****
  	my $u   = URL->new($rel, $base);
  	my $got = $u->abs;
  	my $abs = new URL $abs;
! 	if ("$got" ne "$abs"){
! 	    warn "'$rel' '$base' -> '$got'  expected '$abs'\n\n";
  	    $u->dump;
  	    $got->dump;
  	    die "--- failed.\n";
--- 962,969 ----
  	my $u   = URL->new($rel, $base);
  	my $got = $u->abs;
  	my $abs = new URL $abs;
! 	if ($got->str ne $abs->str){
! 	    warn "  Failed! Got '$got' instead\n\n";
  	    $u->dump;
  	    $got->dump;
  	    die "--- failed.\n";
***************
*** 854,856 ****
--- 997,1000 ----
  
  1;
  exit 0;
+