Re: Updates URL.pm file
Martijn Koster (m.koster@nexor.co.uk)
Sun, 19 Mar 1995 23:03:03 +0000
Tim,
> This is a major reworking with input from Martijn.
Like it... I have appended a few diffs that:
- quote all barewords
- include Roy's new version of the draft as test harness
- rewrote Step 6 of the &absolute algorithm in line with
Roy's code suggestion. Seems to work.
- added a section to the pod explaining URL extension
- various cleanup: wrap long lines, added comments, single
quote variable-less strings etc
- synchronised the netloc stuff (or is there a reason you didn't
use this approach?
> It seems that Martijn and I overlapped in some of our changes but it
> turned out to be valuable anyway (Martijn and implemented a much better
> Absolute URL test harness than mine for example).
> =head1 WHAT A URL IS NOT
>
> URL objects do not, and in my opinion (Tim Bunce) should not, know how
> to 'get' or 'put' the resources they specify locations for, anymore
> than a postal address 'knows' anything about the postal system. The
> actual access/transfer should be achieved by some form of transport
> agent class. The agent class should not be a subclass of URL.
>
> I hope that's agreeable to the majority.
Fine by me.
> I'd really like to see the outstanding URL module issues resolved soon
> because it's a critical module. Many other modules will want to be
> passing url objects around. Even just defining/agreeing a subset of
> it's functionality and interfaces would be a big step forward.
> =head1 OUTSTANDING ISSUES
>
> Class naming.
I vote for WWW::URL::http etc, with WWW::LWP::Request etc for the
lwp library
> Inheritance (use and abuse).
Interesting to see $URL->new return a subclass :-)
> How to maintain consistency between netlock and user/host/port etc.
Is there anything wrong with this.
> Should URL be stored in encoded or decoded format and related issues.
That needs sorting out. Back to the spec...
> =head1 AUTHOR/ACKNOWLEDGMENTS
I'd just mention who worked on it, you don't want a complete change
log in the pod...
> abs_test(); exit 1;
I guess that needs resolving too?
Cheers,
-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html
*** URL.pm.orig Sun Mar 19 19:23:46 1995
--- URL.pm Sun Mar 19 22:46:26 1995
***************
*** 3,11 ****
package URL;
$rcsid = '$Id: URL.pm,v 1.6 1995/03/17 20:10:34 timbo Exp timbo $';
! $rcsid = $rcsid;
! ######################################################################
=head1 NAME
--- 3,11 ----
package URL;
$rcsid = '$Id: URL.pm,v 1.6 1995/03/17 20:10:34 timbo Exp timbo $';
! $rcsid = $rcsid; # shut up -w
! #####################################################################
=head1 NAME
***************
*** 39,47 ****
--- 39,50 ----
# Retrieving and storing common elements
$scheme = $url->scheme;
+ $user = $url->user;
+ $password = $url->password;
$host = $url->host;
$port = $url->port; # returns default if not defined
$path = $url->path;
+ $params = $url->params;
$query = $url->query;
$frag = $url->frag;
***************
*** 61,86 ****
$escaped = URL::escape($url);
$url = URL::unescape($escaped);
-
=head1 DESCRIPTION
URL objects represent URLs (RFC 1738). Both absolute and relative
URL's are supported.
! URL objects are created by C<new>, which takes a string representation of
! a URL or an existing URL object reference to be cloned. Specific
! individual elements can then be accessed via the C<scheme>, C<host>,
! C<port>, C<path>, C<query> and C<frag> methods. These methods can be
! called with a value to set the element to that value, and always return
! the old value. The C<elem> method provides a general interface to
! access any element by name but it should be used with caution.
!
! The C<abs> function attempts to return a new absolute URL object for
! a given URL. In order to convert a relative URL into an absolute one a
! I<base> URL is required. You can associate a default base with a URL
! either by passing a C<base> to the C<new> method when a URL is created
! or using the C<base> method on the object later. Alternatively you can
! specify a once-off base as a parameter to the C<abs> method.
The C<new> method must be able to determine the scheme for the URL.
If a scheme is not specified in the URL it will use the scheme
--- 64,92 ----
$escaped = URL::escape($url);
$url = URL::unescape($escaped);
=head1 DESCRIPTION
URL objects represent URLs (RFC 1738). Both absolute and relative
URL's are supported.
! URL objects are created by C<new>, which takes a string
! representation of a URL or an existing URL object reference to be
! cloned. Specific individual elements can then be accessed via the
! C<scheme>, C<user>, C<password>, C<host>, C<port>, C<path>,
! C<params>, C<query> and C<frag> methods. These methods can be
! called with a value to set the element to that value, and always
! return the old value. The C<elem> method provides a general
! interface to access any element by name but it should be used with
! caution: the effect of using incorrect spelling and case is
! undefined.
!
! The C<abs> function attempts to return a new absolute URL object
! for a given URL. In order to convert a relative URL into an absolute
! one a I<base> URL is required. You can associate a default base with
! a URL either by passing a C<base> to the C<new> method when a URL is
! created or using the C<base> method on the object later.
! Alternatively you can specify a once-off base as a parameter to the
! C<abs> method.
The C<new> method must be able to determine the scheme for the URL.
If a scheme is not specified in the URL it will use the scheme
***************
*** 87,100 ****
specified by the parent URL. If no parent URL scheme is defined
then the C<new> will fail (die).
=head1 WHAT A URL IS NOT
! URL objects do not, and in my opinion (Tim Bunce) should not, know how
! to 'get' or 'put' the resources they specify locations for, anymore
! than a postal address 'knows' anything about the postal system. The
! actual access/transfer should be achieved by some form of transport
! agent class. The agent class should not be a subclass of URL.
=head1 OUTSTANDING ISSUES
--- 93,121 ----
specified by the parent URL. If no parent URL scheme is defined
then the C<new> will fail (die).
+ =head1 ADDING NEW URL's
+
+ New URL schemes can be added as follows. Create a C<URL> directory
+ in the the same directory as C<URL.pm> (if it's not already there).
+ In this new directory, create a C<scheme.pm> file for your scheme
+ (e.g. C<foo.pm>). In this module, declare a new subclass of
+ URL::generic like this:
+
+
+ package URL::foo;
+ @ISA = qw(URL::generic);
+
+ sub default_port { 9999 };
+
+ and override any additional methods as required.
=head1 WHAT A URL IS NOT
! URL objects do not, and should not, know how to 'get' or 'put' the
! resources they specify locations for, anymore than a postal address
! 'knows' anything about the postal system. The actual access/transfer
! should be achieved by some form of transport agent class. The agent
! class can use the URL class, but should not be a subclass of URL.
=head1 OUTSTANDING ISSUES
***************
*** 101,111 ****
Class naming. Inheritance (use and abuse).
! How to maintain consistency between netlock and user/host/port etc.
! Should URL be stored in encoded or decoded format and related issues.
-
=head1 AUTHOR/ACKNOWLEDGMENTS
This module is based on the C<wwwurl.pl> code in the libwww-perl
--- 122,132 ----
Class naming. Inheritance (use and abuse).
! How to maintain consistency between netloc and user, password, host
! port etc.
! Where and how escaping should really be implemented.
=head1 AUTHOR/ACKNOWLEDGMENTS
This module is based on the C<wwwurl.pl> code in the libwww-perl
***************
*** 114,128 ****
with contributions from Brooks Cutter.
Gisle Aas <aas@nr.no>, Tim Bunce <Tim.Bunce@ig.co.uk>, and Martijn
! Koster <m.koster@nexor.co.uk> have contributed to the original Perl 5
version.
- Tim Bunce switched the code to blessed hash refs and internal classes
- for each scheme with optional support for new schemes to be loaded
- dynamically. The test code was expanded and automated by both Tim
- (parsing) and Martijn Koster (abs_test). Both Martijn and Tim
- implemented relative URL support.
-
If you have any suggestions, bug reports, fixes, or enhancements,
send them to the libwww-perl mailing list at
<libwww-perl@ics.uci.edu>.
--- 135,143 ----
with contributions from Brooks Cutter.
Gisle Aas <aas@nr.no>, Tim Bunce <Tim.Bunce@ig.co.uk>, and Martijn
! Koster <m.koster@nexor.co.uk> have collaborated on the Perl 5
version.
If you have any suggestions, bug reports, fixes, or enhancements,
send them to the libwww-perl mailing list at
<libwww-perl@ics.uci.edu>.
***************
*** 135,141 ****
=cut
! ######################################################################
# Perl resumes here
--- 150,156 ----
=cut
! #####################################################################
# Perl resumes here
***************
*** 151,165 ****
use strict qw(subs refs);
!
!
!
! ######################################################################
# methods/functions
! # new() -- Object's constructor
#
sub new
{
my($class, $init, $base) = @_;
--- 166,179 ----
use strict qw(subs refs);
! #####################################################################
# methods/functions
! # new()
#
+ # object's constructor
+ #
sub new
{
my($class, $init, $base) = @_;
***************
*** 173,180 ****
# We need a scheme to determine which class to use
my($scheme) = $init =~ m/^([.+\-\w]+):/;
$scheme = $base->scheme if (!$scheme and $base);
! croak "Unable to determine scheme for '$init'" unless $scheme;
! $scheme = _check_scheme($scheme); # dies on failure (eg unsupportable)
# then hand-off to scheme specific sub-class
$this = "URL::$scheme"->new($init, $base);
}
--- 187,196 ----
# We need a scheme to determine which class to use
my($scheme) = $init =~ m/^([.+\-\w]+):/;
$scheme = $base->scheme if (!$scheme and $base);
! croak "Unable to determine scheme for '$init'"
! unless $scheme;
! $scheme = _check_scheme($scheme);
! # dies on failure (eg unsupportable)
# then hand-off to scheme specific sub-class
$this = "URL::$scheme"->new($init, $base);
}
***************
*** 183,188 ****
--- 199,208 ----
}
+ # clone()
+ #
+ # copy constructor
+ #
sub clone
{
my $self = shift;
***************
*** 190,206 ****
}
! # localpath() -- return a URL object that denotes a path on the
# local filesystem (current directory by default).
#
sub localpath
{
! my($path) = @_;
($path = fastcwd()) =~ s:/?$:/: unless defined $path;
new URL "file://localhost$path";
}
sub dump
{
my $self = shift;
--- 210,232 ----
}
! # localpath()
! #
! # return a URL object that denotes a path on the
# local filesystem (current directory by default).
#
sub localpath
{
! my($path) = shift;
($path = fastcwd()) =~ s:/?$:/: unless defined $path;
new URL "file://localhost$path";
}
+ # dump()
+ #
+ # print the contents of the object
+ #
sub dump
{
my $self = shift;
***************
*** 214,230 ****
}
!
sub _check_scheme
{
my $scheme = lc(shift); # ensure lowercase and then validate
no strict qw(refs);
return $scheme if exists $KnownSchemes{$scheme}; # already setup
! # force attempt to load support for unknown scheme, die on failure
unless (defined @{"URL::${scheme}::ISA"}){
require "URL/${scheme}.pm"
} else {
! # silently require anyway so external files can update our code
eval { local($^W)=0; require "URL/${scheme}.pm" };
}
# setup overloading - experimental (needs cmp etc)
--- 240,261 ----
}
! # _check_scheme()
! #
! # Checks scheme and loads its implementation
! #
sub _check_scheme
{
my $scheme = lc(shift); # ensure lowercase and then validate
no strict qw(refs);
return $scheme if exists $KnownSchemes{$scheme}; # already setup
! # force attempt to load support for unknown scheme
! # die on failure
unless (defined @{"URL::${scheme}::ISA"}){
require "URL/${scheme}.pm"
} else {
! # silently require anyway so external files can update
! # our code
eval { local($^W)=0; require "URL/${scheme}.pm" };
}
# setup overloading - experimental (needs cmp etc)
***************
*** 234,240 ****
--- 265,273 ----
}
+ #####################################################################
+ # Methods to handle URL's elements
# These methods always return the current value,
# so you can use $url->scheme to read the current value.
***************
*** 253,272 ****
sub frag { shift->elem('frag', @_); }
# Fields derived from generic netloc
- sub host { shift->elem('host', @_); }
- sub port { shift->elem('port', @_); }
sub user { shift->elem('user', @_); }
sub password { shift->elem('password',@_); }
sub elem {
! my($this, $element, @val) = @_;
! my $old = $this->{$element};
! $this->{$element} = $val[0] if @val;
return $old;
}
# unescape()
#
# 'http://web/this%20has%20spaces' -> 'http://web/this has spaces'
--- 286,320 ----
sub frag { shift->elem('frag', @_); }
# Fields derived from generic netloc
sub user { shift->elem('user', @_); }
sub password { shift->elem('password',@_); }
+ sub host { shift->elem('host', @_); }
+ sub port { shift->elem('port', @_); }
sub elem {
! my($self, $element, @val) = @_;
! my $old = $self->{$element};
!
! if (@val) {
! # general case
! $self->{$element} = $val[0];
!
! # netloc include user, password, host, port
! if ($element eq 'netloc') {
! $self->_read_netloc();
! }
! elsif ($element =~ m/^(user)|(password)|(host)|(port)$/) {
! $self->_write_netloc();
! }
! }
return $old;
}
+ #####################################################################
+ # misc functions
+
# unescape()
#
# 'http://web/this%20has%20spaces' -> 'http://web/this has spaces'
***************
*** 275,281 ****
{
local($_) = @_;
$_ = $_->str if ref($_); # unescape doubles as a method
! s/%([\dA-Fa-f][\dA-Fa-f])/pack("C",hex($1))/eg;
$_;
}
--- 323,329 ----
{
local($_) = @_;
$_ = $_->str if ref($_); # unescape doubles as a method
! s/%([\dA-Fa-f][\dA-Fa-f])/pack('C',hex($1))/eg;
$_;
}
***************
*** 291,302 ****
}
! ######################################################################
#
# Internal pre-defined basic scheme support
#
! # In this implementation all schemes are subclassed from URL::generic.
! # This turns out to have reasonable mileage.
# See also draft-ietf-uri-relative-url-05.txt
package URL::generic; # base support for generic-RL's
--- 339,350 ----
}
! #####################################################################
#
# Internal pre-defined basic scheme support
#
! # In this implementation all schemes are subclassed from
! # URL::generic. This turns out to have reasonable mileage.
# See also draft-ietf-uri-relative-url-05.txt
package URL::generic; # base support for generic-RL's
***************
*** 306,319 ****
sub new { # inherited constructor
my($class, $init, $base) = @_;
! my $u = $class->_parse($init);
! URL::dump($u) if $URL::Debug;
! bless $u, $class;
}
-
-
# Generic-RL parser
# See draft-ietf-uri-relative-url-05.txt Section 2
--- 354,365 ----
sub new { # inherited constructor
my($class, $init, $base) = @_;
! my $url = $class->_parse($init);
! URL::dump($url) if $URL::Debug;
! bless $url, $class;
}
# Generic-RL parser
# See draft-ietf-uri-relative-url-05.txt Section 2
***************
*** 321,333 ****
my($self, $u) = @_;
my %u; # Following draft-ietf-uri-relative-url-05.txt Section 2.4
! $u{url} = $u; # keep copy of original
! $u{frag} = $1 if ($u =~ s/#(.*)$//); # 2.4.1
! $u{scheme}=lc($1) if ($u =~ s/^\s*([\w\+\.\-]+)://); # 2.4.2
! $u{netloc} = $1 if ($u =~ s!^//([^/]*)!!); # 2.4.3
! $u{query} = $1 if ($u =~ s/\?(.*)//); # 2.4.4
! $u{params} = $1 if ($u =~ s/;(.*)//); # 2.4.5
! $u{path} = $u; # 2.4.6
# read netloc fields: "<user>:<password>@<host>:<port>"
_read_netloc(\%u);
--- 367,379 ----
my($self, $u) = @_;
my %u; # Following draft-ietf-uri-relative-url-05.txt Section 2.4
! $u{'url'} = $u; # keep copy of original
! $u{'frag'} = $1 if ($u =~ s/#(.*)$//); # 2.4.1
! $u{'scheme'} = lc($1) if ($u =~ s/^\s*([\w\+\.\-]+)://); # 2.4.2
! $u{'netloc'} = $1 if ($u =~ s!^//([^/]*)!!); # 2.4.3
! $u{'query'} = $1 if ($u =~ s/\?(.*)//); # 2.4.4
! $u{'params'} = $1 if ($u =~ s/;(.*)//); # 2.4.5
! $u{'path'} = $u; # 2.4.6
# read netloc fields: "<user>:<password>@<host>:<port>"
_read_netloc(\%u);
***************
*** 337,360 ****
sub _read_netloc { # netloc -> user, password, host, post
my($self) = @_;
! my $nl = $self->{netloc} || '';
! $self->{user} = $1 if $nl =~ s/^([^:@]*):?([^@]*)?@//;
! $self->{password} = $2 if $2;
! $self->{host} = $1 if $nl =~ s/^([^:]*):?(\d*)?//;
! $self->{port} = $2 if $2;
}
sub _write_netloc {
my($self) = @_;
! my $nl = $self->{user} || '';
! $nl .= ":$self->{password}" if $nl and $self->{password};
! $nl .= "@$self->{host}";
! $nl .= ":$self->{port}" if $self->{host};
! $self->{netloc} = $nl;
}
-
# Generic-RL stringify
#
sub str
--- 383,405 ----
sub _read_netloc { # netloc -> user, password, host, post
my($self) = @_;
! my $nl = $self->{'netloc'} || '';
! $self->{'user'} = $1 if $nl =~ s/^([^:@]*):?([^@]*)?@//;
! $self->{'password'} = $2 if $2;
! $self->{'host'} = $1 if $nl =~ s/^([^:]*):?(\d*)?//;
! $self->{'port'} = $2 if $2;
}
sub _write_netloc {
my($self) = @_;
! my $nl = $self->{'user'} || '';
! $nl .= ":$self->{'password'}" if $nl and $self->{'password'};
! $nl .= "\@$self->{'host'}";
! $nl .= ":$self->{'port'}" if $self->{'host'};
! $self->{'netloc'} = $nl;
}
# Generic-RL stringify
#
sub str
***************
*** 363,369 ****
my($scheme, $address, $port) = @{$self}{qw(scheme host port)};
my $path = $self->full_path; # path+query+params+frag
if ($address){
! $address .= ":$port" if ($port && $port != $self->default_port);
$path = "/$path" unless $path =~ m:^/:;
$path = "//$address$path";
}
--- 408,415 ----
my($scheme, $address, $port) = @{$self}{qw(scheme host port)};
my $path = $self->full_path; # path+query+params+frag
if ($address){
! $address .= ":$port" if ($port &&
! $port != $self->default_port);
$path = "/$path" unless $path =~ m:^/:;
$path = "//$address$path";
}
***************
*** 393,406 ****
#
sub abs
{
! my($this, $base) = @_;
! my $tmp;
! my $embed = $this->clone;
! $base = $this->base unless $base; # default to default base
return $embed unless $base; # we have no base (step1)
! $base = new URL $base unless ref $base; # convert to obj if required
my @u = @{$embed}{qw(scheme host port path params query frag) };
my $empty = scalar(grep($_, @u)) == 0;
--- 439,451 ----
#
sub abs
{
! my($self, $base) = @_;
! my $embed = $self->clone;
! $base = $self->base unless $base; # default to default base
return $embed unless $base; # we have no base (step1)
! $base = new URL $base unless ref $base; # make obj if required
my @u = @{$embed}{qw(scheme host port path params query frag) };
my $empty = scalar(grep($_, @u)) == 0;
***************
*** 413,460 ****
# if we have a scheme we must already be absolute (2b)
return $embed if $scheme;
! $embed->{scheme} = $base->{scheme}; # (2c)
! return $embed if $embed->{netloc}; # (3)
! $embed->{netloc} = $base->{netloc}; # (3)
! $embed->_read_netloc;
! return $embed if $embed->{path} =~ m:^/:; # (4)
! unless ($embed->{path}){ # (5)
! $embed->{path} = $base->{path}; # (5)
return $embed if $embed->params; # (5a)
! $embed->{params} = $base->{params}; # (5a)
return $embed if $embed->query; # (5b)
! $embed->{query} = $base->{query}; # (5b)
return $embed;
}
! my $path = $base->{path};
! my $ep = $embed->{path};
! $ep =~ s!^/!!; # XXX should this be?
! if ($path !~ s!(.*)/.*$!$1/$ep!) {
! $path = $ep;
! }
! $path =~ s!^\./!/!g; # a
! $path =~ s!/\./!/!g;
! $path =~ s!^\.$!/!; # b
! $path =~ s!/\.$!/!;
! while($path =~ s![^/]+/\.\./!!) { # c
! ;
}
! $path =~ s![^/]+/\.\.$!!; # d
! $embed->{path} = $path;
return $embed;
}
!
sub default_port {
! undef
}
--- 458,538 ----
# if we have a scheme we must already be absolute (2b)
return $embed if $scheme;
! $embed->{'scheme'} = $base->{'scheme'}; # (2c)
! return $embed if $embed->{'netloc'}; # (3)
! $embed->{'netloc'} = $base->{'netloc'}; # (3)
! $embed->_read_netloc();
! return $embed if $embed->{'path'} =~ m:^/:; # (4)
! unless ($embed->{'path'}){ # (5)
! $embed->{'path'} = $base->{'path'}; # (5)
return $embed if $embed->params; # (5a)
! $embed->{'params'} = $base->{'params'}; # (5a)
return $embed if $embed->query; # (5b)
! $embed->{'query'} = $base->{'query'}; # (5b)
return $embed;
}
! # (6)
!
! my $basepath = $base->{'path'};
! my $relpath = $embed->{'path'};
!
! # draft 6 suggests stack based approach
!
! $basepath =~ s!^/!!;
! $basepath =~ s!/$!/.!; # prevent empty segment
! my @path = split('/', $basepath); # base path into segments
! pop(@path); # remove last segment
!
! $relpath =~ s!/$!/.!; # prevent empty segment
!
! push(@path, split('/', $relpath)); # append relative segments
!
! my @newpath = ();
! my $isdir = 0;
! foreach $segment (@path) { # left to right
! # warn '> ', join('/', @newpath), ": $segment\n";
! if ($segment eq '.') { # ignore "same" directory
! $isdir = 1;
! }
! elsif ($segment eq '..') {
! $isdir = 1;
! my $last = pop(@newpath);
! if (!defined $last) { # nothing to pop
! push(@newpath, $segment); # so must append
! }
! elsif ($last eq '..') { # '..' cannot match '..'
! # so put pack again, and append
! push(@newpath, $last, $segment);
! }
! else {
! # it was a component,
! # keep popped
! }
}
! else {
! $isdir = 0;
! push(@newpath, $segment);
! }
! }
+ $embed->{'path'} = join('/', @newpath) . ($isdir ? '/' : '');
+
return $embed;
}
! # default_port()
! #
! # subclasses will usually want to override this
! #
sub default_port {
! undef;
}
***************
*** 501,508 ****
sub _parse {
my($self, $url) = @_;
my %u;
! $u{scheme} = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
! $u{netloc} = $1 if $url =~ s!^//([^/]*)!!;
$u{'gtype'} = $1 if $url =~ s!^/(.)!!;
@u{qw(selector search string)} = split(/%09/, $url, 3);
\%u;
--- 579,586 ----
sub _parse {
my($self, $url) = @_;
my %u;
! $u{'scheme'} = lc($1) if $url =~ s/^\s*([\w\+\.\-]+)://;
! $u{'netloc'} = $1 if $url =~ s!^//([^/]*)!!;
$u{'gtype'} = $1 if $url =~ s!^/(.)!!;
@u{qw(selector search string)} = split(/%09/, $url, 3);
\%u;
***************
*** 514,527 ****
sub default_port { 79 };
-
-
package URL::http; @ISA = qw(URL::generic);
sub default_port { 80 };
-
-
package URL::nntp; @ISA = qw(URL::generic);
sub default_port { 119 };
--- 592,601 ----
***************
*** 529,535 ****
sub _parse {
my($self, $init) = @_;
my $u = $self->URL::generic::_parse($init);
! ($u->{group}, $u->{digits}) = (split('/', $u->{path}))[1,2];
$u;
}
--- 603,610 ----
sub _parse {
my($self, $init) = @_;
my $u = $self->URL::generic::_parse($init);
! ($u->{'group'}, $u->{'digits'}) =
! (split('/', $u->{'path'}))[1,2];
$u;
}
***************
*** 540,547 ****
sub _parse {
my($self, $url) = @_;
my %u;
! $u{scheme} = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
! $u{grouppart} = $url;
$u{ ($url =~ m/\@/) ? 'article' : 'group' } = $url;
\%u;
}
--- 615,622 ----
sub _parse {
my($self, $url) = @_;
my %u;
! $u{'scheme'} = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
! $u{'grouppart'} = $url;
$u{ ($url =~ m/\@/) ? 'article' : 'group' } = $url;
\%u;
}
***************
*** 556,562 ****
my($self, $url) = @_;
my $u = $self->URL::generic::_parse($url);
@{$u}{qw(database wtype wpath)}
! = (split(/\//, $u->{path}))[1,2,3];
$u;
}
--- 631,637 ----
my($self, $url) = @_;
my $u = $self->URL::generic::_parse($url);
@{$u}{qw(database wtype wpath)}
! = (split(/\//, $u->{'path'}))[1,2,3];
$u;
}
***************
*** 579,586 ****
sub _parse {
my($self, $url) = @_;
my %u;
! $u{scheme} = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
! $u{encoded822addr} = $url;
\%u;
}
--- 654,661 ----
sub _parse {
my($self, $url) = @_;
my %u;
! $u{'scheme'} = lc($1) if ($url =~ s/^\s*([\w\+\.\-]+)://);
! $u{'encoded822addr'} = $url;
\%u;
}
***************
*** 592,624 ****
package URL::tn3270; @ISA = qw(URL::generic);
- package URL::g; @ISA = qw(URL::generic); # just for the tests
-
-
! ######################################################################
#
# If we're not use'd or require'd execute self-test.
# Handy for regression testing and as a quick reference :)
!
eval join('',<main::DATA>) || die $@ unless caller();
1;
__END__
package main;
use Carp;
import URL qw(escape unescape);
# Dies if an error has been detected.
# prints "ok" otherwise.
$| = 1;
$txt = '';
! $err = "URL self test failed";
$errors = 0;
print "URL self test...\n";
--- 667,716 ----
package URL::tn3270; @ISA = qw(URL::generic);
! #####################################################################
#
# If we're not use'd or require'd execute self-test.
# Handy for regression testing and as a quick reference :)
! #
! # Test is kept behind __END__ so it doesn't take uptime
! # and memory unless explicitly required
eval join('',<main::DATA>) || die $@ unless caller();
1;
+
+ package main; # the __END__ is in main
__END__
+ package URL::g; @ISA = qw(URL::generic); # just for the tests
+
package main;
use Carp;
import URL qw(escape unescape);
+ &test;
+
# Dies if an error has been detected.
# prints "ok" otherwise.
+ sub test {
$| = 1;
+
+ &netloc_test;
+
+ &abs_test;
+ }
+
+
+ exit 1;
+
+ # XXX These tests aren't called. Need cleaning up
+
$txt = '';
! $err = 'URL self test failed';
$errors = 0;
print "URL self test...\n";
***************
*** 628,675 ****
++$errors;
}
- abs_test(); exit 1;
-
# test retrieval methods
$tests = {
'hTTp://web1.net/a/b/c/welcome#intro'
! => { scheme => 'http', host=>'web1.net', port=>undef,
! path=>'/a/b/c/welcome', frag=>'intro', query=>undef },
'http://web:1/a?query+text'
! => { scheme => 'http', host=>'web', port=>1,
! path=>'/a', frag=>undef, query=>'query+text' },
'http://web.net'
! => { scheme => 'http', host=>'web.net', port=>undef,
! path=>'', frag=>undef, query=>undef, user=>undef },
'ftp://usr:pswd@web:1234/a/b;type=i'
! => { host=>'web', port=>1234, path=>'/a/b',
! user=>'usr', password=>'pswd', params=>'type=i' },
'gopher://web/2a_selector'
! => { gtype=>'2', selector=>'a_selector' },
'mailto:libwww-perl@ics.uci.edu'
! => { encoded822addr=>'libwww-perl@ics.uci.edu' },
! 'news:*' => { grouppart=>'*' },
! 'news:comp.lang.perl' => { group=>'comp.lang.perl' },
'news:perl-faq/module-list-1-794455075@ig.co.uk'
! => { article=>'perl-faq/module-list-1-794455075@ig.co.uk' },
'nntp://news.com/comp.lang.perl/42'
! => { group=>'comp.lang.perl', digits=>42 },
'telnet://usr:pswd@web:12345/'
! => { user=>'usr', password=>'pswd' },
! 'wais://web.net/db' => { database=>'db' },
! 'wais://web.net/db?query' => { database=>'db', query=>'query' },
'wais://usr:pswd@web.net/db/wt/wp'
! => { database=>'db', wtype=>'wt', wpath=>'wp', password=>'pswd' },
};
foreach $url_str (sort keys %$tests ){
--- 720,770 ----
++$errors;
}
# test retrieval methods
$tests = {
'hTTp://web1.net/a/b/c/welcome#intro'
! => { 'scheme'=>'http', 'host'=>'web1.net', 'port'=>undef,
! 'path'=>'/a/b/c/welcome', 'frag'=>'intro', 'query'=>undef },
'http://web:1/a?query+text'
! => { 'scheme'=>'http', 'host'=>'web', 'port'=>1,
! 'path'=>'/a', 'frag'=>undef, 'query'=>'query+text' },
'http://web.net'
! => { 'scheme'=>'http', 'host'=>'web.net', 'port'=>undef,
! 'path'=>'', 'frag'=>undef, 'query'=>undef, 'user'=>undef },
'ftp://usr:pswd@web:1234/a/b;type=i'
! => { 'host'=>'web', 'port'=>1234, 'path'=>'/a/b',
! 'user'=>'usr', 'password'=>'pswd', 'params'=>'type=i' },
'gopher://web/2a_selector'
! => { 'gtype'=>'2', 'selector'=>'a_selector' },
'mailto:libwww-perl@ics.uci.edu'
! => { 'encoded822addr'=>'libwww-perl@ics.uci.edu' },
! 'news:*'
! =>{ 'grouppart'=>'*' },
! 'news:comp.lang.perl'
! => { 'group'=>'comp.lang.perl' },
'news:perl-faq/module-list-1-794455075@ig.co.uk'
! => { 'article'=>'perl-faq/module-list-1-794455075@ig.co.uk' },
'nntp://news.com/comp.lang.perl/42'
! => { 'group'=>'comp.lang.perl', 'digits'=>42 },
'telnet://usr:pswd@web:12345/'
! => { 'user'=>'usr', 'password'=>'pswd' },
! 'wais://web.net/db'
! => { 'database'=>'db' },
! 'wais://web.net/db?query'
! => { 'database'=>'db', 'query'=>'query' },
'wais://usr:pswd@web.net/db/wt/wp'
! => { 'database'=>'db', 'wtype'=>'wt', 'wpath'=>'wp',
! 'password'=>'pswd' },
};
foreach $url_str (sort keys %$tests ){
***************
*** 704,717 ****
unless $txt eq 'gopher://gop:71/1info?key+word#this';
$url->query(undef);
! fail "set-to-undef failed" if defined($url->query);
# Test base url and relative->absolute conversions
! $base = new URL "http://base.com:25";
! $relu = new URL "hello.gif", $base;
!
# escaping functions
--- 799,812 ----
unless $txt eq 'gopher://gop:71/1info?key+word#this';
$url->query(undef);
! fail 'set-to-undef failed' if defined($url->query);
# Test base url and relative->absolute conversions
! $base = new URL 'http://base.com:25';
! $relu = new URL 'hello.gif', $base;
! $relu = $relu;
# escaping functions
***************
*** 723,730 ****
$txt = unescape($url);
fail $err unless $txt eq 'http://web/this has spaces';
! $txt = unescape(new URL 'http://web/this%20has%20space');
! fail $err, $txt unless $txt eq 'http://web/this has space';
# test localpath
--- 818,825 ----
$txt = unescape($url);
fail $err unless $txt eq 'http://web/this has spaces';
! $txt = unescape(new URL 'http://web/this%20has%20spaces');
! fail $err, $txt unless $txt eq 'http://web/this has spaces';
# test localpath
***************
*** 742,750 ****
print "URL self test ok ($URL::rcsid)\n";
sub abs_test {
! # Tests from draft-ietf-uri-relative-url-05.txt
my $base = 'http://a/b/c/d;p?q#f';
--- 837,862 ----
print "URL self test ok ($URL::rcsid)\n";
+ # netloc_test()
+ #
+ # Test netloc synchronisation
+ #
+ sub netloc_test {
+ my $url = URL->new('http://anonymous:me@host:90');
+ die "netloc user" unless $url->user eq 'anonymous';
+ $url->user('nemo');
+ die "netloc_test 1" unless $url->user eq 'nemo';
+ die "netloc_test 2" unless $url->netloc eq 'nemo:me@host:90';
+ }
+
+
+ # abs_test()
+ #
+ # Test relative/absolute URL parsing
+ #
sub abs_test {
! # Tests from draft-ietf-uri-relative-url-06.txt
my $base = 'http://a/b/c/d;p?q#f';
***************
*** 752,757 ****
--- 864,870 ----
$absolute_tests = <<EOM;
5.1. Normal Examples
+
g:h = <URL:g:h>
g = <URL:http://a/b/c/g>
./g = <URL:http://a/b/c/g>
***************
*** 779,786 ****
--- 892,915 ----
5.2. Abnormal Examples
+ Although the following abnormal examples are unlikely to occur
+ in normal practice, all URL parsers should be capable of resolving
+ them consistently. Each example uses the same base as above.
+
+ An empty reference resolves to the complete base URL:
+
<> = <URL:http://a/b/c/d;p?q#f>
+
+ Parsers must be careful in handling the case where there are more
+ relative path ".." segments than there are hierarchical levels in
+ the base URL's path. Note that the ".." syntax cannot be used to
+ change the <net_loc> of a URL.
+
../../../g = <URL:http://a/../g>
+ ../../../../g = <URL:http://a/../../g>
+
+ Similarly, parsers must avoid treating "." and ".." as special when
+ they are not complete components of a relative path.
/./g = <URL:http://a/./g>
/../g = <URL:http://a/../g>
***************
*** 789,799 ****
--- 918,936 ----
g.. = <URL:http://a/b/c/g..>
..g = <URL:http://a/b/c/..g>
+ Less likely are cases where the relative URL uses unnecessary or
+ nonsensical forms of the "." and ".." complete path segments.
+
./../g = <URL:http://a/b/g>
./g/. = <URL:http://a/b/c/g/>
g/./h = <URL:http://a/b/c/g/h>
g/../h = <URL:http://a/b/c/h>
+ Finally, some older parsers allow the scheme name to be present in
+ a relative URL if it is the same as the base URL scheme. This is
+ considered to be a loophole in prior specifications of partial
+ URLs [1] and should be avoided by future parsers.
+
http:g = <URL:http:g>
http: = <URL:http:>
EOM
***************
*** 801,807 ****
# @absolute_tests = ( ['g:h' => 'g:h'], ...)
for $line (split("\n", $absolute_tests)) {
! next unless $line =~ /^\s/;
if ($line =~ /^\s+(\S+)\s*=\s*<URL:([^>]*)>/) {
my($rel, $abs) = ($1, $2);
$rel = '' if $rel eq '<>';
--- 938,944 ----
# @absolute_tests = ( ['g:h' => 'g:h'], ...)
for $line (split("\n", $absolute_tests)) {
! next unless $line =~ /^\s{6}/;
if ($line =~ /^\s+(\S+)\s*=\s*<URL:([^>]*)>/) {
my($rel, $abs) = ($1, $2);
$rel = '' if $rel eq '<>';
***************
*** 812,817 ****
--- 949,960 ----
}
}
+ # add some extra ones for good measure
+
+ push(@absolute_tests, ['x/y//../z', => 'http://a/b/c/x/y/z']);
+
+ print " Relative + Base => Expected Absolute URL\n";
+ print "================================================\n";
for $test (@absolute_tests) {
my $rel = $test->[0];
my $abs = $test->[1];
***************
*** 819,826 ****
my $u = URL->new($rel, $base);
my $got = $u->abs;
my $abs = new URL $abs;
! if ("$got" ne "$abs"){
! warn "'$rel' '$base' -> '$got' expected '$abs'\n\n";
$u->dump;
$got->dump;
die "--- failed.\n";
--- 962,969 ----
my $u = URL->new($rel, $base);
my $got = $u->abs;
my $abs = new URL $abs;
! if ($got->str ne $abs->str){
! warn " Failed! Got '$got' instead\n\n";
$u->dump;
$got->dump;
die "--- failed.\n";
***************
*** 854,856 ****
--- 997,1000 ----
1;
exit 0;
+