Re: URI::Heuristic
Bruce A. Fraser (b.a.fraser@larc.nasa.gov)
Tue, 14 Oct 1997 10:15:58 -0400
I think this module is a good idea and what you have is well done. I
have a couple of thoughts I'd like to pass on before I get caught up in
something else and forget. I admit I haven't put a great deal of
thought into this, but if I don't put it down now, I might never get
around to it. :-)
Considering the impending changes to the current DNS hierarchy planned
by the Internet Ad Hoc Committee, I believe that it would be worthwhile
to make the prefix and especially the postfix guesses changeable on the
fly rather than hard-coded. Of course, what you have already
included below would make good defaults for now. Using the order of the
arguments in the guessing algorithm would also allow programmers to
rearrange the order of DNS lookups. For example, someone out there has
laid claim to the domain name nasa.com, but I certainly wouldn't want to
accidentally connect to one of their servers instead of nasa.gov.
Some examples of its possible uses:
    use URI::Heuristic qw(friendly_url url_prefix url_postfix);

    # Some sites use unusual conventions
    @defaults = url_prefix();
    url_prefix("web", "public", @defaults);

    # Rearrange the order of guesses
    url_postfix("gov", "mil", "edu", "com", "net");

    # Specify some subdomains to search first (and include the defaults)
    @defaults = url_postfix();
    url_postfix("larc.nasa.gov", "nasa.gov", "gov", @defaults);

    # Add in a few new top level domains
    @defaults = url_postfix();
    url_postfix(@defaults, "biz", "rec", "store");
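To make the idea a little more concrete, here is a rough sketch of how such
accessors might behave. This is only an illustration under my own
assumptions: url_prefix() and url_postfix() are the names I am suggesting
and are not part of the module as posted, candidate_hosts() is a
hypothetical helper, and the default lists simply mirror the "www" prefix
and the "com", "org" guesses in your code.

```perl
use strict;
use warnings;

# Assumed defaults, mirroring the guesses in the posted module.
my @prefix  = ("www");
my @postfix = ("com", "org");

# With no arguments, return the current list.
# With arguments, replace the list and return the new one.
sub url_prefix {
    @prefix = @_ if @_;
    return @prefix;
}

sub url_postfix {
    @postfix = @_ if @_;
    return @postfix;
}

# Hypothetical helper: build the candidate host names in the order they
# would be looked up. The postfix list is the outer loop, so every guess
# for an earlier postfix comes before any guess for a later one.
sub candidate_hosts {
    my ($host) = @_;
    my @guess;
    for my $post (@postfix) {
        for my $pre (@prefix) {
            push @guess, "$pre.$host.$post";
        }
    }
    return @guess;
}

# Put "gov" and "mil" ahead of the defaults, then list the guesses.
url_postfix("gov", "mil", url_postfix());
print join("\n", candidate_hosts("nasa")), "\n";
# prints www.nasa.gov, www.nasa.mil, www.nasa.com, www.nasa.org (one per line)
```

In a real version, friendly_url() would presumably walk that list with
gethostbyname() and stop at the first name that resolves, as the posted
code already does for its fixed list.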
Of course I don't have the time at present to work on this myself, so I
won't complain if you choose not to implement these ideas, but I offer
them up as food for thought. Feel free to use, modify or ignore these
suggestions as you see fit.
Bruce
Gisle Aas wrote:
>
> I wanted a module that encapsulates heuristics similar to what
> Netscape does when you write some string into the "Location:" field.
> This is my first try. Comments on module and function naming are
> welcomed.
>
> Regards,
> Gisle
>
> #--------------------------------------------------
> package URI::Heuristic;
>
> # $Id: Heuristic.pm,v 4.1 1997/10/13 13:04:36 aas Exp $
>
> =head1 NAME
>
> URI::Heuristic - Expand URL using heuristics
>
> =head1 SYNOPSIS
>
> use URI::Heuristic qw(friendly_url);
> $url = friendly_url("perl"); # http://www.perl.com
> $url = friendly_url("www.sol.no/sol"); # http://www.sol.no/sol
> $url = friendly_url("aas"); # http://www.aas.no
> $url = friendly_url("ftp.funet.fi"); # ftp://ftp.funet.fi
> $url = friendly_url("/etc/passwd"); # file:/etc/passwd
>
> =head1 DESCRIPTION
>
> This module provides functions that expand strings into real URLs using
> some heuristics. The following functions are provided:
>
> =over 4
>
> =item friendly_url($str)
>
> The friendly_url() function will try to make the string passed as
> argument into a proper absolute URL string.
>
> =item url($str)
>
> This function works the same way as friendly_url() but it will
> return a C<URI::URL> object.
>
> =back
>
> =head1 COPYRIGHT
>
> Copyright 1997, Gisle Aas
>
> This library is free software; you can redistribute it and/or
> modify it under the same terms as Perl itself.
>
> =cut
>
> use strict;
>
> use vars qw(@EXPORT_OK);
>
> require Exporter;
> *import = \&Exporter::import;
> @EXPORT_OK = qw(url friendly_url);
>
> my $my_country;
> eval {
>     require Net::Domain;
>     my $fqdn = Net::Domain::hostfqdn();
>     $my_country = lc($1) if $fqdn =~ /\.([a-zA-Z]{2})$/;
> };
>
> sub url ($)
> {
>     require URI::URL;
>     URI::URL->new(friendly_url($_[0]));
> }
>
> sub friendly_url ($)
> {
>     local($_) = @_;
>     return unless defined;
>
>     s/^\s+//;
>     s/\s+$//;
>
>     if (/^(www|web|http)\./) {
>         $_ = "http://$_";
>
>     } elsif (/^(ftp|gopher|news|wais)\./) {
>         $_ = "$1://$_";
>
>     } elsif (m,^/, ||             # absolute file name
>              m,^\.\.?/, ||        # relative file name
>              m,^[a-zA-Z]:[/\\],)  # dosish file name
>     {
>         $_ = "file:$_";
>
>     } elsif (!/^[.+\-\w]+:/) {    # no scheme specified
>         if (s/^([\w\.]+)//) {
>             my $host = $1;
>
>             if ($host !~ /\./) {
>                 my @guess;
>                 push(@guess, "www.$host.$my_country") if $my_country;
>                 push(@guess, map { "www.$host.$_" } "com", "org");
>                 push(@guess, map { "www.$host.$_" } "gov", "mil")
>                     if $my_country && $my_country eq "us";
>
>                 my $guess;
>                 for $guess (@guess) {
>                     if (gethostbyname($guess)) {
>                         $host = $guess;
>                         last;
>                     }
>                 }
>             }
>             $_ = "http://$host$_";
>         }
>
>     }
>     $_;
> }
>
> 1;
--
-------------------------------------------------------------------
| Bruce A. Fraser | B.A.Fraser@LaRC.NASA.GOV |
| MS 157B Building 1268 Room 2092 | Systems Administrator |
| Integrated Computing Environment| Computer Sciences Corporation |
| NASA Langley Research Center | Phone: 1 804 864-1246 |
| Hampton, VA 23681-0001 | Fax : 1 804 864-8342 |
-------------------------------------------------------------------