and we can all talk about them.
*** ../libwww-perl-0.20/README.html Wed Jul 20 20:55:17 1994
--- README.html Mon Aug 1 06:40:51 1994
***************
*** 1,5 ****
!
libwww-perl: Distribution Information
--- 1,5 ----
!
libwww-perl: Distribution Information
***************
*** 39,50 ****
libwww-perl-request@ics.uci.edu
Support for the initial development and distribution of libwww-perl
has been provided by the
Arcadia Project at UCI, part of the larger
Arcadia Consortium for
! research in software environments.
WWW Requests Currently Supported
--- 39,52 ----
libwww-perl-request@ics.uci.edu
+ A Hypermail
+ Archive of the mailing list is also available.
Support for the initial development and distribution of libwww-perl
has been provided by the
Arcadia Project at UCI, part of the larger
Arcadia Consortium for
! research in software engineering environments.
WWW Requests Currently Supported
***************
*** 69,74 ****
--- 71,81 ----
- MOMspider
- The Multi-Owner Maintenance spider created by Roy Fielding.
+
-
+ w3new
+
- Extract a list of URL's from either your mosaic hotlist or from a
+ HTML document, retrieve their Last-modified dates, and output a HTML
+ file with the URL's sorted by their last modification time.
***************
*** 88,94 ****
Brooks
Cutter, STUFF.com, USA
Contributed wwwbot.pl, testbot, code for escaping and unescaping URLs,
! www'stat(), and several bug fixes.
Roy
Fielding, University of California, Irvine, USA
Architect and primary developer of the library.
--- 95,101 ----
Brooks
Cutter, STUFF.com, USA
Contributed wwwbot.pl, testbot, code for escaping and unescaping URLs,
! www'stat(), wwwmailcap.pl, and many suggestions and bug fixes.
Roy
Fielding, University of California, Irvine, USA
Architect and primary developer of the library.
***************
*** 114,122 ****
The Distribution
For easy distribution, libwww-perl is available as a
!
gzip'd tar file or as a
!
compress'd tar file. It is also available via
anonymous
ftp from liege.ics.uci.edu in the directory
--- 121,129 ----
The Distribution
For easy distribution, libwww-perl is available as a
!
gzip'd tar file or as a
!
compress'd tar file. It is also available via
anonymous
ftp from liege.ics.uci.edu in the directory
***************
*** 128,135 ****
the Artistic License.
Changes.txt
! the complete list of changes
! and version information.
INSTALL.txt
Installation instructions and usage information.
--- 135,141 ----
the Artistic License.
Changes.txt
! the complete list of changes and version information.
INSTALL.txt
Installation instructions and usage information.
***************
*** 137,164 ****
README.html
this document.
! get
a simple program for performing WWW
GET requests from the command-line. The name of the program determines what
! request method to be used (i.e. create a link to it called "head" and
you have a program that does HEAD requests). This program demonstrates the
power and simplicity of the libwww-perl interface.
! mime.types
the standard MIME content-types
and default filename extensions in the same format as that used by
NCSA httpd_1.3 and many WWW clients.
! testbot
a simple program for testing the wwwbot.pl package.
! testdates
a simple program for testing the wwwdates.pl package.
! testescapes
a simple program for testing the wwwurl'escape and unescape routines.
! testlinks
a simple program for testing HTML link extraction and
combinations of GET and HEAD requests.
--- 143,170 ----
README.html
this document.
! get
a simple program for performing WWW
GET requests from the command-line. The name of the program determines what
! request method to be used (i.e. create a link to it called "HEAD" and
you have a program that does HEAD requests). This program demonstrates the
power and simplicity of the libwww-perl interface.
! mime.types
the standard MIME content-types
and default filename extensions in the same format as that used by
NCSA httpd_1.3 and many WWW clients.
! testbot
a simple program for testing the wwwbot.pl package.
! testdates
a simple program for testing the wwwdates.pl package.
! testescapes
a simple program for testing the wwwurl'escape and unescape routines.
! testlinks
a simple program for testing HTML link extraction and
combinations of GET and HEAD requests.
***************
*** 194,199 ****
--- 200,209 ----
a package for performing HTTP requests (URLs of the form
http:*).
+ wwwmailcap.pl
+ a package of library utilities for handling MIME mailcap files
+ and executing viewers by content-type.
+
wwwmime.pl
a package of library utilities for handling MIME content-types
and message headers.
***************
*** 201,207 ****
wwwurl.pl
a package of library utilities for parsing, composing,
manipulating, and canonicalizing Uniform Resource Locators (URLs) as
! they used by the World-Wide Web software and protocols.
--- 211,217 ----
wwwurl.pl
a package of library utilities for parsing, composing,
manipulating, and canonicalizing Uniform Resource Locators (URLs) as
! they are used by the World-Wide Web software and protocols.
***************
*** 208,215 ****
! Current version is 0.20.
- Patch from 0.12 to 0.20
- Patch from 0.11 to 0.12
- Patch from 0.10 to 0.11
--- 218,226 ----
! Current version is 0.30.
+ - Patch from 0.20 to 0.30
- Patch from 0.12 to 0.20
- Patch from 0.11 to 0.12
- Patch from 0.10 to 0.11
***************
*** 251,255 ****
Department of Information and Computer Science,
University of California, Irvine, CA 92717-3425
! Last modified: Wed Jul 20 20:51:23 1994
--- 262,266 ----
Department of Information and Computer Science,
University of California, Irvine, CA 92717-3425
! Last modified: Sun Jul 31 02:45:52 1994
*** ../libwww-perl-0.20/testbot Wed Jul 20 19:47:34 1994
--- testbot Mon Aug 1 06:31:41 1994
***************
*** 1,5 ****
#!/usr/public/bin/perl
! # $Id: testbot,v 1.1 1994/07/21 02:46:45 fielding Exp $
#-----------------------------------------------------------------
# This program tests the wwwbot.pl library and is based on the one
# written by Brooks Cutter.
--- 1,5 ----
#!/usr/public/bin/perl
! # $Id: testbot,v 1.2 1994/08/01 13:31:27 fielding Exp $
#-----------------------------------------------------------------
# This program tests the wwwbot.pl library and is based on the one
# written by Brooks Cutter.
***************
*** 15,20 ****
--- 15,21 ----
# Enter a URL (^D to exit): ua=MOMspider/0.20
#
# 20 Jul 1994 (RTF): Initial Version
+ # 31 Jul 1994 (RTF): Changed interface to wwwbot'allowed
#
#-----------------------------------------------------------------
if ($libloc = $ENV{'LIBWWW_PERL'}) { unshift(@INC, $libloc); }
***************
*** 70,76 ****
$delay = &wwwbot'visitable($url);
print "delay of $delay for $url\n";
if ($delay) { sleep($delay); }
! $ok = &wwwbot'allowed($url, $ua);
print "$ok: $url\n";
&wwwbot'visited($url);
--- 71,77 ----
$delay = &wwwbot'visitable($url);
print "delay of $delay for $url\n";
if ($delay) { sleep($delay); }
! $ok = &wwwbot'allowed($url);
print "$ok: $url\n";
&wwwbot'visited($url);
*** ../libwww-perl-0.20/www.pl Thu Jul 21 01:49:47 1994
--- www.pl Mon Aug 1 06:32:52 1994
***************
*** 1,4 ****
! # $Id: www.pl,v 0.14 1994/07/21 08:49:43 fielding Exp $
# ---------------------------------------------------------------------------
# www.pl: A package for handling requests of any World-Wide Web URL,
# including requests that should be redirected to a proxy server.
--- 1,4 ----
! # $Id: www.pl,v 0.15 1994/08/01 13:32:38 fielding Exp $
# ---------------------------------------------------------------------------
# www.pl: A package for handling requests of any World-Wide Web URL,
# including requests that should be redirected to a proxy server.
***************
*** 22,30 ****
# the DefaultHeaders arrays so that defaults can be set
# once by the client and effect all requests.
# Changed the request eval to version suggested by Brooks.
#
# If you have any suggestions, bug reports, fixes, or enhancements,
! # send them to Roy Fielding at .
# ---------------------------------------------------------------------------
require "wwwurl.pl";
require "wwwmime.pl";
--- 22,32 ----
# the DefaultHeaders arrays so that defaults can be set
# once by the client and effect all requests.
# Changed the request eval to version suggested by Brooks.
+ # 31 Jul 1994 (RTF): Added get_def_header() and lrequest() (from Brooks).
+ # Removed default headers from the stat() interface.
#
# If you have any suggestions, bug reports, fixes, or enhancements,
! # send them to the libwww-perl mailing list at .
# ---------------------------------------------------------------------------
require "wwwurl.pl";
require "wwwmime.pl";
***************
*** 36,42 ****
# and a "request" subroutine.
package www;
! $Library = 'libwww-perl/0.20'; # To be appended onto client's User-Agent
# ==========================================================================
# Get the default From address for HTTP requests and add it to defaults.
--- 38,44 ----
# and a "request" subroutine.
package www;
! $Library = 'libwww-perl/0.30'; # To be appended onto client's User-Agent
# ==========================================================================
# Get the default From address for HTTP requests and add it to defaults.
***************
*** 175,180 ****
--- 177,206 ----
# ===========================================================================
+ # get_def_header(): Allow the client to get the current default header for
+ # a particular.
+ #
+ # Examples:
+ #
+ # $address = &get_def_header('http', 'From');
+ # $agent = &get_def_header('http', 'User-Agent');
+ #
+ # Returns undefined if the named neader has no default.
+ #
+ sub get_def_header
+ {
+ local($scheme, $name) = @_;
+
+ for ($[ .. $#DefaultHeaders)
+ {
+ return $DefHeaderValues[$_] if (($name eq $DefaultHeaders[$_]) &&
+ ($scheme eq $DefHeaderSchemes[$_]));
+ }
+ return undef;
+ }
+
+
+ # ===========================================================================
# check_defaults(): Check the header defaults and, if a corresponding value
# was not set in the request, add the default header to the array.
#
***************
*** 239,245 ****
# $content_encoding,
# $content_language,
# $expires,
! # $message_id) = &www'stat($url, $user_agent, $reply_to);
#
# WHERE,
#
--- 265,271 ----
# $content_encoding,
# $content_language,
# $expires,
! # $message_id) = &www'stat($url);
#
# WHERE,
#
***************
*** 262,279 ****
# Values passed to &www'stat():
#
# $url: Fully qualified http: or file: URL
- # $user_agent: String that includes the name of program calling &www'stat()
- # $reply_to: Your fully-qualified internet Email address
#
# ----------------------------------------------------------------------
# Example: retrieve Last modified and size of What's New with NCSA Mosaic
#
# $url = 'http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html';
! # chop($domain = `hostname`);
! # if (index($domain,'.')==-1) { $domain .= '.' . `domainname`; chop($domain); }
! # $ReplyTo = "$ENV{'USER'}@$domain";
! # $UserAgent = "myprog/0.1 $www'Library";
! # ($rcode,$lastmod,$size) = (&www'stat($url,$UserAgent,$ReplyTo))[0,1,2];
#
# For more information on the returned headers, see the Hypertext Transfer
# Protocol specification, section "The Response/Response Headers" at the URL
--- 288,299 ----
# Values passed to &www'stat():
#
# $url: Fully qualified http: or file: URL
#
# ----------------------------------------------------------------------
# Example: retrieve Last modified and size of What's New with NCSA Mosaic
#
# $url = 'http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html';
! # ($rcode,$lastmod,$size) = (&www'stat($url))[0,1,2];
#
# For more information on the returned headers, see the Hypertext Transfer
# Protocol specification, section "The Response/Response Headers" at the URL
***************
*** 281,294 ****
#
sub stat
{
! local($url, $user_agent, $reply_to) = @_;
local(%headers, $content, $response, $last_modified);
- # Define the HTTP request headers
-
- if ($user_agent) { $headers{'User-Agent'} = $user_agent; }
- if ($reply_to) { $headers{'From'} = $reply_to; }
-
$response = &request('HEAD', $url, *headers, *content, 30);
if ($headers{'last-modified'})
--- 301,309 ----
#
sub stat
{
! local($url) = @_;
local(%headers, $content, $response, $last_modified);
$response = &request('HEAD', $url, *headers, *content, 30);
if ($headers{'last-modified'})
***************
*** 309,313 ****
--- 324,394 ----
);
}
+
+ # ===========================================================================
+ # lrequest(): Same as request() above except that if a redirect response is
+ # returned, perform an automatic redirection by requesting the
+ # new URL. To avoid an infinite loop, this routine will only
+ # perform up to 10 redirections on one request.
+ # Originally submitted by Brooks Cutter, with mods by Roy Fielding.
+ #
+ # Use the following format to request a WWW document:
+ #
+ # $respcode = &www'lrequest($method, *url, *headers, *content, $timeout);
+ #
+ # WHERE,
+ #
+ # $respcode: The three digit response code as defined by HTTP.
+ #
+ # $method: The request method (e.g. 'GET','HEAD','POST',...) Case Significant
+ #
+ # $url: A WWW Uniform Resource Locator in absolute form. If the request
+ # is redirected, $url will be changed to reflect the new URL.
+ #
+ # %headers: (Incoming) Request headers for request, e.g.
+ # $headers{'User-Agent'} = 'MOMspider/0.1'." $www'Library";
+ #
+ # (Returned) Response headers from result (in lower-case), e.g.
+ # $headers{'content-type'} = 'text/html';
+ #
+ # $content: (Incoming) Document to send for methods POST, PUT, etc.
+ #
+ # (Returned) Response body from result.
+ #
+ # $timeout: Number of seconds to wait for a server response (usually 30).
+ #
+ #
+ sub lrequest
+ {
+ local($method, *url, *headers, *content, $timeout) = @_;
+ local($hd, $response);
+
+ foreach $idx (1 .. 10)
+ {
+ $response = &www'request($method, $url, *headers, *content, $timeout);
+ last unless ($response =~ /^30[12]$/);
+ last if ($idx == 10);
+
+ if ($url = $headers{'location'})
+ {
+ $url =~ s/, .*//; # Get rid of multiple Location: entries
+ }
+ elsif ($url = $headers{'uri'})
+ {
+ $url =~ s/\s*;\s+.*//;
+ $url =~ s/, .*//; # Get rid of any multiple URI: entries
+ }
+ else { last; }
+
+ foreach $hd (keys(%headers))
+ {
+ next if ($hd =~ m#^[A-Z]#);
+ delete $headers{$hd};
+ }
+ }
+ return($response);
+ }
+
+ # ===========================================================================
1;
*** ../libwww-perl-0.20/wwwbot.pl Wed Jul 20 19:47:13 1994
--- wwwbot.pl Mon Aug 1 06:32:09 1994
***************
*** 1,4 ****
! # $Id: wwwbot.pl,v 1.1 1994/07/21 02:46:45 fielding Exp $
# ---------------------------------------------------------------------------
# wwwbot.pl: This library implements the Robot Exclusion protocol
# (draft 6/30/94) as documented on
--- 1,4 ----
! # $Id: wwwbot.pl,v 1.2 1994/08/01 13:32:01 fielding Exp $
# ---------------------------------------------------------------------------
# wwwbot.pl: This library implements the Robot Exclusion protocol
# (draft 6/30/94) as documented on
***************
*** 24,33 ****
# Improved /robots.txt parsing (more flexible)
# Wrote documentation and examples for wwwbot routines
# 20 Jul 1994 (RTF): Reformatted a bit for inclusion in standard libwww-perl.
#
# If you have any suggestions, bug reports, fixes, or enhancements,
! # send them to Roy Fielding at or to the libwww-perl
! # mailing list at .
# ---------------------------------------------------------------------------
#
# The following are the ten commandments for writing a Web robot
--- 24,33 ----
# Improved /robots.txt parsing (more flexible)
# Wrote documentation and examples for wwwbot routines
# 20 Jul 1994 (RTF): Reformatted a bit for inclusion in standard libwww-perl.
+ # 30 Jul 1994 (RTF): Changed interface to make use of default User-Agent.
#
# If you have any suggestions, bug reports, fixes, or enhancements,
! # send them to the libwww-perl mailing list at .
# ---------------------------------------------------------------------------
#
# The following are the ten commandments for writing a Web robot
***************
*** 164,169 ****
--- 164,171 ----
#
# $user_agent: The name of your program and optionally the version number.
# Use the form "program_name/v.er" (like "roundabot/1.0")
+ # If null or undefined, it will use the default User-Agent
+ # header defined in www.pl.
#
# Example:
#
***************
*** 187,192 ****
--- 189,203 ----
local($url, $user_agent) = @_;
local($scheme, $address, $port, $path, $query, $frag);
local($ret, $n, $ua);
+
+ if (!$user_agent)
+ {
+ unless($user_agent = &www'get_def_header('http','User-Agent'))
+ {
+ warn "wwwbot'allowed: requires 2nd argument of User-Agent header";
+ return(0);
+ }
+ }
($scheme,$address,$port,$path,$query,$frag) = &wwwurl'parse($url);
unless($port) { $port = 80; }
*** ../libwww-perl-0.20/wwwhtml.pl Thu Jul 21 01:50:37 1994
--- wwwhtml.pl Fri Jul 22 02:58:20 1994
***************
*** 1,4 ****
! # $Id: wwwhtml.pl,v 0.13 1994/07/21 08:50:11 fielding Exp $
# ---------------------------------------------------------------------------
# wwwhtml: A package for parsing pages of HyperText Markup Language (HTML)
# for a World-Wide Web spider.
--- 1,4 ----
! # $Id: wwwhtml.pl,v 0.14 1994/07/22 09:57:52 fielding Exp $
# ---------------------------------------------------------------------------
# wwwhtml: A package for parsing pages of HyperText Markup Language (HTML)
# for a World-Wide Web spider.
***************
*** 16,21 ****
--- 16,26 ----
# Updated META parsing to reflect HTML 2.0 proposal.
# 20 Jul 1994 (RTF): Fix segmentation fault if we are fooled into trying
# to extract links from a non-html document.
+ # 22 Jul 1994 (RTF): Fixed parsing of href's that had a new-line after
+ # the quote mark, causing an extra space to precede the
+ # extracted URL, which in turn created a black hole.
+ # Also added code to extract and change the base URL
+ # if there exists a element.
#
# If you have any suggestions, bug reports, fixes, or enhancements,
# send them to Roy Fielding at .
***************
*** 62,67 ****
--- 67,75 ----
$content =~ s/\s+/ /g; # Remove all extra whitespace and newlines
+ $content =~ s#]*href\s*=\s*"?\s*([^">\s]+)[^>]*>##i; # Base?
+ if ($1) { $base = $1; }
+
$content =~ s#]*>([^<]+)]*>##i; # Extract the title
if ($1) { $headers{'title'} = $1; }
***************
*** 73,84 ****
return unless ($content); # Return if we removed everything
# Isolate all META elements as text
! $content =~ s/]*http-equiv\s*=\s*"?([^">]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi;
! $content =~ s/]*name\s*=\s*"?([^">]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi;
# Isolate all A element HREFs as text
! $content =~ s/]*href\s*=\s*"?([^">]+)[^>]*>/A $1\n/gi;
# Isolate all IMG element SRCs as text
! $content =~ s/
]*src\s*=\s*"?([^">]+)[^>]*>/I $1\n/gi;
$content =~ s/<[^>]*>//g; # Remove all remaining elements
$content =~ s/\n+/\n/g; # Remove all blank lines
--- 81,92 ----
return unless ($content); # Return if we removed everything
# Isolate all META elements as text
! $content =~ s/]*http-equiv\s*=\s*"?\s*([^">\s]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi;
! $content =~ s/]*name\s*=\s*"?\s*([^">\s]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi;
# Isolate all A element HREFs as text
! $content =~ s/]*href\s*=\s*"?\s*([^">\s]+)[^>]*>/A $1\n/gi;
# Isolate all IMG element SRCs as text
! $content =~ s/
]*src\s*=\s*"?\s*([^">\s]+)[^>]*>/I $1\n/gi;
$content =~ s/<[^>]*>//g; # Remove all remaining elements
$content =~ s/\n+/\n/g; # Remove all blank lines
***************
*** 89,95 ****
foreach $elem (split(/\n/,$content))
{
! if ($elem =~ /^A (.*)$/)
{
$orig = $1;
push(@lorig, $orig);
--- 97,103 ----
foreach $elem (split(/\n/,$content))
{
! if ($elem =~ /^A\s+(\S*)$/)
{
$orig = $1;
push(@lorig, $orig);
***************
*** 106,112 ****
}
push(@links, &wwwurl'compose($scheme,$host,$port,$path,'',''));
}
! elsif ($elem =~ /^I (.*)$/)
{
$orig = $1;
push(@lorig, $orig);
--- 114,120 ----
}
push(@links, &wwwurl'compose($scheme,$host,$port,$path,'',''));
}
! elsif ($elem =~ /^I\s+(\S*)$/)
{
$orig = $1;
push(@lorig, $orig);
*** ../libwww-perl-0.20/wwwmailcap.pl Mon Aug 1 06:45:17 1994
--- wwwmailcap.pl Mon Aug 1 06:30:36 1994
***************
*** 0 ****
--- 1,280 ----
+ # $Id: wwwmailcap.pl,v 1.1 1994/08/01 13:30:11 fielding Exp $
+ # ---------------------------------------------------------------------------
+ # wwwmailcap.pl: This library implements routines for parsing a
+ # MIME mailcap file and executing commands based on
+ # a file's MIME Content-type
+ #
+ # This package has been developed by Brooks Cutter .
+ # It is distributed under the Artistic License (included with your Perl
+ # distribution files and with the standard distribution of this package).
+ #
+ # 26 Jul 1994 (BBC): Initial version
+ # 31 Jul 1994 (RTF): Reformatted a bit (and new name) for inclusion in
+ # standard libwww-perl distribution.
+ #
+ # If you have any suggestions, bug reports, fixes, or enhancements,
+ # send them to the libwww-perl mailing list at .
+ # ---------------------------------------------------------------------------
+ #
+ # The MIME mailcap file is defined in the (Work in progress) mailcap draft:
+ # ftp://venera.isi.edu/internet-drafts/draft-borenstein-mailcap-00.ps
+ # ftp://venera.isi.edu/internet-drafts/draft-borenstein-mailcap-00.txt
+ #
+ # This package currently parses mailcap lines, and uses the test field
+ # to determine if the mailcap line should be used. It will save the
+ # 'view' command associated with each MIME Content-type.
+ #
+ # Public/External routines:
+ #
+ # &exists_handler($mime_type,$cmd) - is a mime type handler defined?
+ # Returns 1 if there exists a definition for $mime_type in mailcap
+ # Returns 0 if there is no handler for the specified $mime_type
+ # $mime_type is of the form "type/subtype" like "text/html"
+ # $cmd is optional, and defaults to 'view' if not specified
+ #
+ # &view($mime_type,$fn) - exec view program for $mime_type
+ # $mime_type is of the form "type/subtype" like "text/html"
+ # $fn is the name of the file to pass to the program
+ #
+ # Private/Internal routines:
+ # - You don't need to call (or know about) these routines.
+ # the first time you call &view or &exists_handler, &init
+ # is called and parses the mailcap files..
+ #
+ # &init - load defaults and search for mailcap files
+ # &load_mailcap_default - load mailcap from internal defaults
+ # &load_mailcap_file - local mailcap from external file
+ # &parse_mailcap(@_) - parse mailcap lines intro internal assoc array
+ #
+ #----------------------------------------------------------------------
+ # Todo:
+ # This package recognizes the 'view' and 'test' fields, however there
+ # are a number of other fields that should be recognized, and the values
+ # (if any) should be saved and accessible through the library API.
+ #
+ # The following key/value fields or flags are not currently recognized:
+ #
+ # key/value fields: compose, composetyped, edit, print, description,
+ # textualnewlines, x11bitmap, nametemplate
+ # flags: needsterminal, copiousoutput, needsx11
+ #
+ # Also needs to be done:
+ # - rewrite parse_mailcap so it recognizes lines that end with \ (backslash)
+ # - recognize attribute/variable quoting and pass other needed MIME headers.
+ #----------------------------------------------------------------------
+
+ package wwwmailcap;
+
+
+ # ===========================================================================
+ # This is Mosaic's default mailcap ...
+
+ $gl_default_mailcap = <<'EOF';
+ audio/*; showaudio %s
+ image/xwd; xwud -in %s
+ image/x-xwd; xwud -in %s
+ image/x-xwindowdump; xwud -in %s
+ image/*; xv %s
+ video/mpeg; mpeg_play %s
+ application/postscript; ghostview %s
+ application/x-dvi; xdvi %d
+ message/rfc822; xterm -e metamail %s
+ EOF
+
+ # ===========================================================================
+ # As defined in draft-bornstein-mailcap-00.txt 5/94
+ # If Environment variable MAILCAPS isn't set, use the default search path
+
+ $gl_mailcap_path = $ENV{'MAILCAPS'} ||
+ "$ENV{'HOME'}/.mailcap:/etc/mailcap:/usr/etc/mailcap:/usr/local/etc/mailcap";
+
+
+ # ===========================================================================
+ # ===========================================================================
+ # exists_handler(): returns 1 if there is a "handler" for $mime_type (1st arg)
+ # otherwise returns 0
+ # $cmd indicates the MIME mailcap command to execute
+ # if $cmd isn't specified, it defaults to 'view'
+ #
+ # $ok = &exists_handler($mime_type,$cmd);
+ #
+ # WHERE,
+ #
+ # $ok: 1 if there is a mailcap entry for $mime_type and command $cmd
+ #
+ # $mime_type: A MIME Content-type, like "text/html"
+ #
+ # $cmd: Either null (defaults to 'view') or a MIME command
+ # Currently the only supported command is 'view'
+ #
+ # Example
+ #
+ # $url = 'http://www.host.dom/dir/file.html';
+ # $response = &www'request('GET',$url,*headers,*content,$timeout);
+ # if ($headers{'content-type'} =~ m!^text/(plain|html)$!i) {
+ # print $content; # display it on the screen
+ # } elsif (&wwwmailcap'exists_handler($headers{'content-type'},'view')) {
+ # local($fn) = "/tmp/file.$$.".time;
+ # open(OUT,">$fn"); print OUT $content; close(OUT);
+ # &wwwmailcap'view($headers{'content-type'},$fn);
+ # } else {
+ # print "Unable to handle MIME type $headers{'content-type'}\n";
+ # }
+ #
+ sub exists_handler
+ {
+ local($mime_type, $cmd) = @_;
+
+ &init unless (%gl_mailcap);
+
+ local($type, $subtype) = split(/\//,$mime_type,2);
+ $cmd = 'view' unless(@_);
+ if (($gl_mailcap{$type,$subtype,$cmd}) ||
+ ($gl_mailcap{$type,'*',$cmd}) ||
+ ($gl_mailcap{'*','*',$cmd}))
+ {
+ return(1);
+ }
+ return(0);
+ }
+
+
+ # ===========================================================================
+ # view(): select the 'view' program for $mime_type and execute with $fn
+ #
+ # &exists_handler($mime_type,$fn)
+ #
+ # WHERE,
+ #
+ # $mime_type: A MIME Content-type, like "text/html"
+ #
+ # $fn: The name of the file to pass to the MIME mailcap 'view' program
+ #
+ # Note: For an example including &view, see &exists_handler
+ #
+ sub view
+ {
+ local($mime_type, $fn) = @_;
+
+ &init unless (%gl_mailcap);
+
+ # %s - replace with name of file (otherwise pass data by stdin)
+ # %t - replace with content-type (type/subtype)
+ local($type,$subtype) = split(/\//,$mime_type,2);
+ local($cmd) = $gl_mailcap{$type,$subtype,'view'} ||
+ $gl_mailcap{$type,'*','view'} ||
+ $gl_mailcap{'*','*','view'};
+
+ return unless($cmd); # exit rather than return since already forked
+ return if (fork);
+
+ if ($cmd =~ /%s/)
+ {
+ $cmd =~ s/%s/$fn/g;
+ if ($cmd =~ /%t/) { $cmd =~ s!%t!$type/$subtype!g; }
+ exec "$cmd ; rm -f $fn"; # calls sh -c ..
+ }
+ else
+ {
+ exec "cat $fn | $cmd ; rm -f $fn"; # calls sh -c ..
+ }
+ exit;
+ }
+
+
+ # ===========================================================================
+ # init(): load mailcap files and default mailcap
+ #
+ # &init (no arguments)
+ #
+ sub init
+ {
+ return if (%gl_mailcap);
+ local($_);
+ for (split(/:/,$gl_mailcap_path))
+ {
+ next unless(-f $_);
+ &load_mailcap_file($_);
+ }
+ &load_mailcap_default;
+ }
+
+
+ # ===========================================================================
+ # load_mailcap_default(): load defaults from global var $gl_default_mailcap
+ #
+ # &load_mailcap_default (no arguments)
+ #
+ sub load_mailcap_default
+ {
+ return(-1) unless($gl_default_mailcap);
+ &parse_mailcap(split(/\n/,$gl_default_mailcap));
+ return(0);
+ }
+
+
+ # ===========================================================================
+ # load_mailcap_file(): load mailcap data from external file $fn
+ #
+ # &load_mailcap_file($fn)
+ #
+ # WHERE,
+ #
+ # $fn: Name of mailcap file to load
+ #
+ sub load_mailcap_file
+ {
+ local($fn) = @_;
+
+ open(IN,$fn) || return(-1);
+ &parse_mailcap();
+ close(IN);
+ return(0);
+ }
+
+ # ===========================================================================
+ # parse_mailcap(): parse mailcap lines and store in var $gl_default_mailcap
+ #
+ # &parse_mailcap(@mailcap_lines)
+ #
+ # WHERE,
+ #
+ # @mailcap_lines: One or more mailcap lines. Each element of @mailcap_lines
+ # is a complete entry and has any newlines and/or backslashes
+ # stripped out.
+ #
+ sub parse_mailcap
+ {
+ local($_,$type,$view,@types,$line,$key,$val,$types,$subtype);
+
+ for (@_)
+ {
+ tr/\x00-\x1f\x7f-\xff//d;
+ @types = split(/\s*;\s*/);
+ $types = shift(@types);
+ ($type, $subtype) = split(/\//,$types,2);
+ $view = shift(@types);
+ for $line (@types)
+ {
+ $line =~ s/^\s+//;
+ $line =~ s/\s+$//;
+ ($key,$val) = split(/\s*=\s*/,$line,2);
+ $key =~ tr/A-Z/a-z/;
+ if (($key eq 'test') && ($val))
+ {
+ local(@ret) = `sh -c '$val'`;
+ local($ret) = ($? >> 8); # value returned by exec'ing $val..
+ # if test field returns a non-zero value, ignore the entry
+ next if ($ret);
+ }
+ }
+ if (($view) && (!$gl_mailcap{$type,$subtype,'view'}))
+ {
+ $gl_mailcap{$type,$subtype,'view'} = $view;
+ }
+ }
+ }
+
+ # ===========================================================================
+
+ 1;
*** ../libwww-perl-0.20/wwwurl.pl Sat Jul 16 04:12:22 1994
--- wwwurl.pl Mon Aug 1 06:31:17 1994
***************
*** 1,4 ****
! # $Id: wwwurl.pl,v 0.13 1994/07/16 11:12:16 fielding Exp $
# ---------------------------------------------------------------------------
# wwwurl: A package for parsing and manipulating World-Wide Web
# Uniform Resource Locators (URL).
--- 1,4 ----
! # $Id: wwwurl.pl,v 0.14 1994/08/01 13:30:59 fielding Exp $
# ---------------------------------------------------------------------------
# wwwurl: A package for parsing and manipulating World-Wide Web
# Uniform Resource Locators (URL).
***************
*** 16,21 ****
--- 16,24 ----
# substitute (from Steven E. Brenner via Brooks Cutter).
# Added escape() routine (w/mods) from Brooks Cutter.
# 16 Jul 1994 (RTF): Added get_site() routine.
+ # 27 Jul 1994 (RTF): Firmed-up algorithm for parsing relative URLs, fixing
+ # several potential (but unlikely) bugs in the process.
+ # Removed any hint of "URL:" prefix.
#
# If you have any suggestions, bug reports, fixes, or enhancements,
# send them to Roy Fielding at .
***************
*** 24,41 ****
package wwwurl;
%DefPort = ( # Define the default ports for major net services
! 'ftp', 21,
! 'file', 0, # note: non-local file URLs are changed to ftp URLs
! 'telnet', 23,
! 'whois', 43,
! 'gopher', 70,
! 'finger', 79,
! 'http', 80,
! 'nntp', 119,
! 'news', 119,
! 'prospero', 191,
! 'wais', 210,
! 'webster', 765,
);
%CantChange = ( # Define schemes that cannot be altered by absolute
--- 27,44 ----
package wwwurl;
%DefPort = ( # Define the default ports for major net services
! 'ftp', 21,
! 'file', 0, # note: non-local file URLs are changed to ftp URLs
! 'telnet', 23,
! 'whois', 43,
! 'gopher', 70,
! 'finger', 79,
! 'http', 80,
! 'nntp', 119,
! 'news', 119,
! 'wais', 210,
! 'webster', 765,
! 'prospero', 1525, # I thought it was 191, but IETF differs
);
%CantChange = ( # Define schemes that cannot be altered by absolute
***************
*** 46,52 ****
);
%NonHierarchical = ( # Define remaining schemes that can be changed
! 'telnet', 1, # but which cannot use relative URLs
'rlogin', 1,
'tn3270', 1,
'whois', 1,
--- 49,55 ----
);
%NonHierarchical = ( # Define remaining schemes that can be changed
! 'telnet', 1, # but which cannot use relative URL paths
'rlogin', 1,
'tn3270', 1,
'whois', 1,
***************
*** 54,65 ****
'finger', 1,
);
# ===========================================================================
# parse(): Parse the given URL into its component parts according to
# WWW URI rules, returning '' for those that are not present.
! # If the scheme is not present, the URL will be parsed according
! # to HTTP rules.
#
# Returns the folowing in order:
#
--- 57,78 ----
'finger', 1,
);
+ %UsesQuery = ( # Define schemes that use '?' to denote a query
+ 'http', 1,
+ 'wais', 1,
+ );
+ %UsesParams = ( # Define schemes that use ';' to denote parameters
+ 'ftp', 1,
+ 'prospero', 1,
+ );
+
+
# ===========================================================================
# parse(): Parse the given URL into its component parts according to
# WWW URI rules, returning '' for those that are not present.
! # If no scheme is given, the URL is parsed according to HTTP rules,
! # so schemes which use different rules may have to recombine parts.
#
# Returns the folowing in order:
#
***************
*** 67,74 ****
# $address: The login or hostname/IP address (if appropriate);
# $port : The TCP port (if appropriate);
# $path : The object path;
! # $query : The post-? search info (http and wais only)
! # $frag : The post-'#' fragment identifier
#
sub parse
{
--- 80,87 ----
# $address: The login or hostname/IP address (if appropriate);
# $port : The TCP port (if appropriate);
# $path : The object path;
! # $query : The post-'?' search info (only if scheme uses queries);
! # $frag : The post-'#' fragment identifier.
#
sub parse
{
***************
*** 80,86 ****
local($query) = '';
local($frag) = '';
! if ($url =~ s#^([+\-\w]+):##)
{
$scheme = $1;
$scheme =~ tr/A-Z/a-z/;
--- 93,99 ----
local($query) = '';
local($frag) = '';
! if ($url =~ s#^([.+\-\w]+):##)
{
$scheme = $1;
$scheme =~ tr/A-Z/a-z/;
***************
*** 88,98 ****
if ($url =~ s/#([^#]*)$//) { $frag = $1; }
- if (!$scheme || ($scheme eq 'http') || ($scheme eq 'wais'))
- {
- if ($url =~ s/\?([^?]*)$//) { $query = $1; }
- }
-
if ($url =~ m#^//#o)
{
$url =~ s#^//([^/]*)##;
--- 101,106 ----
***************
*** 102,107 ****
--- 110,121 ----
$port = $1;
}
}
+
+ if (!$scheme || $UsesQuery{$scheme})
+ {
+ if ($url =~ s/\?([^?]*)$//) { $query = $1; }
+ }
+
$path = $url;
return ($scheme, $address, $port, $path, $query, $frag);
***************
*** 133,141 ****
if ($port) { $url .= ":$port"; }
}
! if ($path) { $url .= $path; }
! if ($query) { $url .= "?$query"; }
! if ($frag) { $url .= "#$frag"; }
return $url;
}
--- 147,159 ----
if ($port) { $url .= ":$port"; }
}
! if ($path) { $url .= $path; }
! if ($query)
! {
! if (!$path) { $url .= '/'; } # Avoid mistaking query as being
! $url .= "?$query"; # part of the address
! }
! if ($frag) { $url .= "#$frag"; }
return $url;
}
***************
*** 149,155 ****
{
local($url) = @_;
! $url =~ s/%(..)/pack("C",hex($1))/ge;
return $url;
}
--- 167,173 ----
{
local($url) = @_;
! $url =~ s/%([\dA-F][\dA-F])/pack("C",hex($1))/ge;
return $url;
}
***************
*** 179,185 ****
{
local($parent, $url) = @_;
! $url =~ s/^url://io; # Just in case someone follows dumb IETF format
local($scheme, $addr, $port, $path, $query, $frag) = &parse($url);
--- 197,204 ----
{
local($parent, $url) = @_;
! $url =~ s/^\s+//; # Remove any preceding whitespace
! $url =~ s/\s.*//; # Remove anything after first word
local($scheme, $addr, $port, $path, $query, $frag) = &parse($url);
***************
*** 195,229 ****
local($psch,$paddr,$pport,$ppath,$pquery,$pfrag) = &parse($parent);
! if (!$scheme) { $scheme = $psch; }
! else { last RELATED if ($scheme ne $psch); }
!
! if (!$pport) { $pport = ($DefPort{$psch} || ''); }
!
! if ($addr || $port)
{
! if (!$port) { $port = ($DefPort{$scheme} || ''); }
! last RELATED if ($addr ne $paddr);
! last RELATED if ($port ne $pport);
}
! else
! {
! $addr = $paddr;
! $port = $pport;
! }
! last RELATED if ($NonHierarchical{$scheme});
if (!$path)
{
$path = $ppath;
! last RELATED if ($query);
! $query = $pquery;
}
! elsif ($path !~ m|^/|o) # The order in which we resolve the
! { # relative components is important
! $ppath =~ s#/[^/]*$#/#;
! $path = $ppath . $path;
while ($path =~ s#/\./#/#) {;}
$path =~ s#/\.$#/#;
while ($path =~ s#/[^/]+/\.\./#/#) {;}
--- 214,259 ----
local($psch,$paddr,$pport,$ppath,$pquery,$pfrag) = &parse($parent);
! if (!$scheme)
{
! $scheme = $psch;
! if ($query && !$UsesQuery{$scheme}) # Restore mistaken queries
! {
! $path .= '?'. $query;
! $query = '';
! }
}
! else { last RELATED if ($scheme ne $psch); }
! last RELATED if ($addr || $port); # Child must have used '//'
+ $addr = $paddr;
+ $port = $pport;
+
if (!$path)
{
$path = $ppath;
! if (!$query) { $query = $pquery; }
}
! elsif ($NonHierarchical{$scheme})
! {;} # Do nothing
! elsif ($path !~ m|^/|o) # If the child URL does not begin with '/'
! {
! if ($ppath)
! {
! if ($UsesParams{$scheme})
! {
! $ppath =~ s#;.*##; # Trim off any parent parameters
! }
! $ppath =~ s#/[^/]*$#/#; # Trim off any parent filename
! }
! else { $ppath = '/'; }
!
! $path = $ppath . $path;
! #
! # The order in which we remove the relative "/." and "xxx/.."
! # path segment components is extremely important.
! #
while ($path =~ s#/\./#/#) {;}
$path =~ s#/\.$#/#;
while ($path =~ s#/[^/]+/\.\./#/#) {;}
***************
*** 235,240 ****
--- 265,274 ----
{
if (!$addr) { $addr = 'localhost'; }
elsif ($addr ne 'localhost') { $scheme = 'ftp'; }
+ #
+ # The above line will have to be deleted once people stop using
+ # file: as an alias for ftp: (i.e. when the IETF standard is done).
+ #
}
elsif ($scheme eq 'http') { $path =~ s#^/\%7E#/~#; }