*** ../libwww-perl-0.20/Artistic.txt Fri Jul 8 01:15:59 1994 --- Artistic.txt Mon Aug 1 06:23:08 1994 *************** *** 56,62 **** a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive ! site such as uunet.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package. b) use the modified Package only within your corporation or organization. --- 56,62 ---- a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive ! site such as ftp.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package. b) use the modified Package only within your corporation or organization. *************** *** 100,111 **** them, and may be sold commercially, and may be aggregated with this Package. ! 7. C subroutines supplied by you and linked into this Package in order ! to emulate subroutines and variables of the language defined by this ! Package shall not be considered part of this Package, but are the ! equivalent of input as in Paragraph 6, provided these subroutines do ! not change the language in any way that would cause it to fail the ! regression tests for the language. 8. The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission. --- 100,107 ---- them, and may be sold commercially, and may be aggregated with this Package. ! 7. C or perl subroutines supplied by you and linked into this Package ! shall not be considered part of this Package. 8. The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission. *** ../libwww-perl-0.20/Changes.txt Thu Jul 21 01:59:06 1994 --- Changes.txt Mon Aug 1 06:43:20 1994 *************** *** 1,6 **** Changes to libwww-perl ====================== ! # $Id: Changes.txt,v 0.15 1994/07/21 08:59:01 fielding Exp $ See the files README.html and Artistic.txt for licensing and distribution info. See the file INSTALL.txt for installation information. --- 1,6 ---- Changes to libwww-perl ====================== ! # $Id: Changes.txt,v 0.16 1994/08/01 13:43:05 fielding Exp $ See the files README.html and Artistic.txt for licensing and distribution info. See the file INSTALL.txt for installation information. *************** *** 27,32 **** --- 27,51 ---- method for making requests) which is still compatible with the old. The last number reflects minor bug fixes and documentation updates. + + Version 0.30 August 1, 1994 + Added the wwwmailcap.pl library for handling MIME mailcap files, + www'get_def_header() for reading the default headers, and www'lrequest() + for doing autoredirected requests (all submitted by Brooks Cutter). + Removed the default headers from the www'stat() interface. + Changed the testbot and wwwbot'allowed interface to make use of the + default User-Agent header. + Firmed-up the URL parsing algorithm in wwwurl.pl (particularly relating + to the parsing of relative URLs) to coincide with the IETF standards + discussion. This fixed several potential (but unlikely) bugs and also + got rid of any "URL:" prefix parsing [finally!]. + Fixed parsing in wwwhtml.pl of href's that had a new-line after the + quote mark, causing an extra space to precede the extracted URL, + which in turn created a black hole. Also added code to extract and change + the base URL if there exists a element. + Updated the wording in Artistic.txt to represent a Perl API rather than + a compiler written in C (as is the Perl distribution). + Version 0.20 July 20, 1994 Added the wwwbot.pl library and testbot program (by Brooks Cutter) *** ../libwww-perl-0.20/INSTALL.txt Wed Jul 20 20:55:41 1994 --- INSTALL.txt Mon Aug 1 06:40:22 1994 *************** *** 1,6 **** libwww-perl Installation Information ==================================== ! # $Id: INSTALL.txt,v 0.13 1994/07/21 03:55:33 fielding Exp $ See the files README.html and Artistic.txt for licensing and distribution info. See the file Changes.txt for a complete list of changes and version history. --- 1,6 ---- libwww-perl Installation Information ==================================== ! # $Id: INSTALL.txt,v 0.14 1994/08/01 13:40:13 fielding Exp $ See the files README.html and Artistic.txt for licensing and distribution info. See the file Changes.txt for a complete list of changes and version history. *************** *** 21,38 **** it will be in the form of a compressed unix tar file. If it has not already been decompressed by your WWW client, then do one of: ! % uncompress libwww-perl-0.20.tar.Z ! % gunzip libwww-perl-0.20.tar.gz depending on which compressed version you downloaded. ! 2. Move the resulting libwww-perl-0.20.tar file to the directory above where you want to install libwww-perl, cd to that directory, and do ! % tar xvf libwww-perl-0.20.tar ! to create the directory ./libwww-perl-0.20 containing the following: Artistic.txt -- the Artistic License governing redistribution of the libwww-perl package. --- 21,38 ---- it will be in the form of a compressed unix tar file. If it has not already been decompressed by your WWW client, then do one of: ! % uncompress libwww-perl-0.30.tar.Z ! % gunzip libwww-perl-0.30.tar.gz depending on which compressed version you downloaded. ! 2. Move the resulting libwww-perl-0.30.tar file to the directory above where you want to install libwww-perl, cd to that directory, and do ! % tar xvf libwww-perl-0.30.tar ! to create the directory ./libwww-perl-0.30 containing the following: Artistic.txt -- the Artistic License governing redistribution of the libwww-perl package. *************** *** 72,82 **** manipulating HTML documents. wwwhttp.pl -- a package for performing HTTP requests (URLs of the form http:*). wwwmime.pl -- a package of library utilities for handling MIME content-types and message headers. wwwurl.pl -- a package of library utilities for parsing, composing, manipulating, and canonicalizing Uniform Resource ! Locators (URLs) as they used by the World-Wide Web software and protocols. --- 72,84 ---- manipulating HTML documents. wwwhttp.pl -- a package for performing HTTP requests (URLs of the form http:*). + wwwmailcap.pl -- a package of library utilities for handling MIME + mailcap files and executing viewers by content-type. wwwmime.pl -- a package of library utilities for handling MIME content-types and message headers. wwwurl.pl -- a package of library utilities for parsing, composing, manipulating, and canonicalizing Uniform Resource ! Locators (URLs) as they are used by the World-Wide Web software and protocols. *************** *** 91,97 **** 4. The LIBWWW_PERL environment variable must be set to point to the libwww-perl directory, e.g. ! % setenv LIBWWW_PERL /usr/local/lib/libwww-perl-0.20 This allows clients like "get" to place the libwww-perl on their @INC path and also allows wwwmime.pl to find the standard mime.types file. --- 93,99 ---- 4. The LIBWWW_PERL environment variable must be set to point to the libwww-perl directory, e.g. ! % setenv LIBWWW_PERL /usr/local/lib/libwww-perl-0.30 This allows clients like "get" to place the libwww-perl on their @INC path and also allows wwwmime.pl to find the standard mime.types file. *************** *** 137,143 **** First, you should join the libwww-perl mailing list by sending a subscribe request, including your name and preferred e-mail address, to . You will be sent a welcome ! message when you are placed on the list. After that, send a mail message describing your changes or suggestions to and we can all talk about them. --- 139,148 ---- First, you should join the libwww-perl mailing list by sending a subscribe request, including your name and preferred e-mail address, to . You will be sent a welcome ! message when you are placed on the list. To see what the list looks like, ! see the Hypermail Archive of it at: ! ! After that, send a mail message describing your changes or suggestions to and we can all talk about them. *** ../libwww-perl-0.20/README.html Wed Jul 20 20:55:17 1994 --- README.html Mon Aug 1 06:40:51 1994 *************** *** 1,5 **** ! libwww-perl: Distribution Information --- 1,5 ---- ! libwww-perl: Distribution Information *************** *** 39,50 ****
       libwww-perl-request@ics.uci.edu
  
Support for the initial development and distribution of libwww-perl has been provided by the Arcadia Project at UCI, part of the larger Arcadia Consortium for ! research in software environments.

WWW Requests Currently Supported

--- 39,52 ----
       libwww-perl-request@ics.uci.edu
  
+ A Hypermail + Archive of the mailing list is also available.

Support for the initial development and distribution of libwww-perl has been provided by the Arcadia Project at UCI, part of the larger Arcadia Consortium for ! research in software engineering environments.

WWW Requests Currently Supported

*************** *** 69,74 **** --- 71,81 ----
MOMspider
The Multi-Owner Maintenance spider created by Roy Fielding. +
+ w3new +
Extract a list of URL's from either your mosaic hotlist or from a + HTML document, retrieve their Last-modified dates, and output a HTML + file with the URL's sorted by their last modification time.
*************** *** 88,94 ****
Brooks Cutter, STUFF.com, USA
Contributed wwwbot.pl, testbot, code for escaping and unescaping URLs, ! www'stat(), and several bug fixes.
Roy Fielding, University of California, Irvine, USA
Architect and primary developer of the library. --- 95,101 ----
Brooks Cutter, STUFF.com, USA
Contributed wwwbot.pl, testbot, code for escaping and unescaping URLs, ! www'stat(), wwwmailcap.pl, and many suggestions and bug fixes.
Roy Fielding, University of California, Irvine, USA
Architect and primary developer of the library. *************** *** 114,122 ****

The Distribution

For easy distribution, libwww-perl is available as a ! gzip'd tar file or as a ! compress'd tar file. It is also available via anonymous ftp from liege.ics.uci.edu in the directory --- 121,129 ----

The Distribution

For easy distribution, libwww-perl is available as a ! gzip'd tar file or as a ! compress'd tar file. It is also available via anonymous ftp from liege.ics.uci.edu in the directory *************** *** 128,135 ****
the Artistic License.
Changes.txt !
the complete list of changes ! and version information.
INSTALL.txt
Installation instructions and usage information. --- 135,141 ----
the Artistic License.
Changes.txt !
the complete list of changes and version information.
INSTALL.txt
Installation instructions and usage information. *************** *** 137,164 ****
README.html
this document. !
get
a simple program for performing WWW GET requests from the command-line. The name of the program determines what ! request method to be used (i.e. create a link to it called "head" and you have a program that does HEAD requests). This program demonstrates the power and simplicity of the libwww-perl interface. !
mime.types
the standard MIME content-types and default filename extensions in the same format as that used by NCSA httpd_1.3 and many WWW clients. !
testbot
a simple program for testing the wwwbot.pl package. !
testdates
a simple program for testing the wwwdates.pl package. !
testescapes
a simple program for testing the wwwurl'escape and unescape routines. !
testlinks
a simple program for testing HTML link extraction and combinations of GET and HEAD requests. --- 143,170 ----
README.html
this document. !
get
a simple program for performing WWW GET requests from the command-line. The name of the program determines what ! request method to be used (i.e. create a link to it called "HEAD" and you have a program that does HEAD requests). This program demonstrates the power and simplicity of the libwww-perl interface. !
mime.types
the standard MIME content-types and default filename extensions in the same format as that used by NCSA httpd_1.3 and many WWW clients. !
testbot
a simple program for testing the wwwbot.pl package. !
testdates
a simple program for testing the wwwdates.pl package. !
testescapes
a simple program for testing the wwwurl'escape and unescape routines. !
testlinks
a simple program for testing HTML link extraction and combinations of GET and HEAD requests. *************** *** 194,199 **** --- 200,209 ----
a package for performing HTTP requests (URLs of the form http:*). +
wwwmailcap.pl +
a package of library utilities for handling MIME mailcap files + and executing viewers by content-type. +
wwwmime.pl
a package of library utilities for handling MIME content-types and message headers. *************** *** 201,207 ****
wwwurl.pl
a package of library utilities for parsing, composing, manipulating, and canonicalizing Uniform Resource Locators (URLs) as ! they used by the World-Wide Web software and protocols. --- 211,217 ----
wwwurl.pl
a package of library utilities for parsing, composing, manipulating, and canonicalizing Uniform Resource Locators (URLs) as ! they are used by the World-Wide Web software and protocols. *************** *** 208,215 ****

Version History

! Current version is 0.20.
  • Patch from 0.12 to 0.20
  • Patch from 0.11 to 0.12
  • Patch from 0.10 to 0.11 --- 218,226 ----

    Version History

    ! Current version is 0.30.
      +
    • Patch from 0.20 to 0.30
    • Patch from 0.12 to 0.20
    • Patch from 0.11 to 0.12
    • Patch from 0.10 to 0.11 *************** *** 251,255 **** Department of Information and Computer Science,
      University of California, Irvine, CA 92717-3425 ! Last modified: Wed Jul 20 20:51:23 1994 --- 262,266 ---- Department of Information and Computer Science,
      University of California, Irvine, CA 92717-3425 ! Last modified: Sun Jul 31 02:45:52 1994 *** ../libwww-perl-0.20/testbot Wed Jul 20 19:47:34 1994 --- testbot Mon Aug 1 06:31:41 1994 *************** *** 1,5 **** #!/usr/public/bin/perl ! # $Id: testbot,v 1.1 1994/07/21 02:46:45 fielding Exp $ #----------------------------------------------------------------- # This program tests the wwwbot.pl library and is based on the one # written by Brooks Cutter. --- 1,5 ---- #!/usr/public/bin/perl ! # $Id: testbot,v 1.2 1994/08/01 13:31:27 fielding Exp $ #----------------------------------------------------------------- # This program tests the wwwbot.pl library and is based on the one # written by Brooks Cutter. *************** *** 15,20 **** --- 15,21 ---- # Enter a URL (^D to exit): ua=MOMspider/0.20 # # 20 Jul 1994 (RTF): Initial Version + # 31 Jul 1994 (RTF): Changed interface to wwwbot'allowed # #----------------------------------------------------------------- if ($libloc = $ENV{'LIBWWW_PERL'}) { unshift(@INC, $libloc); } *************** *** 70,76 **** $delay = &wwwbot'visitable($url); print "delay of $delay for $url\n"; if ($delay) { sleep($delay); } ! $ok = &wwwbot'allowed($url, $ua); print "$ok: $url\n"; &wwwbot'visited($url); --- 71,77 ---- $delay = &wwwbot'visitable($url); print "delay of $delay for $url\n"; if ($delay) { sleep($delay); } ! $ok = &wwwbot'allowed($url); print "$ok: $url\n"; &wwwbot'visited($url); *** ../libwww-perl-0.20/www.pl Thu Jul 21 01:49:47 1994 --- www.pl Mon Aug 1 06:32:52 1994 *************** *** 1,4 **** ! # $Id: www.pl,v 0.14 1994/07/21 08:49:43 fielding Exp $ # --------------------------------------------------------------------------- # www.pl: A package for handling requests of any World-Wide Web URL, # including requests that should be redirected to a proxy server. --- 1,4 ---- ! # $Id: www.pl,v 0.15 1994/08/01 13:32:38 fielding Exp $ # --------------------------------------------------------------------------- # www.pl: A package for handling requests of any World-Wide Web URL, # including requests that should be redirected to a proxy server. *************** *** 22,30 **** # the DefaultHeaders arrays so that defaults can be set # once by the client and effect all requests. # Changed the request eval to version suggested by Brooks. # # If you have any suggestions, bug reports, fixes, or enhancements, ! # send them to Roy Fielding at . # --------------------------------------------------------------------------- require "wwwurl.pl"; require "wwwmime.pl"; --- 22,32 ---- # the DefaultHeaders arrays so that defaults can be set # once by the client and effect all requests. # Changed the request eval to version suggested by Brooks. + # 31 Jul 1994 (RTF): Added get_def_header() and lrequest() (from Brooks). + # Removed default headers from the stat() interface. # # If you have any suggestions, bug reports, fixes, or enhancements, ! # send them to the libwww-perl mailing list at . # --------------------------------------------------------------------------- require "wwwurl.pl"; require "wwwmime.pl"; *************** *** 36,42 **** # and a "request" subroutine. package www; ! $Library = 'libwww-perl/0.20'; # To be appended onto client's User-Agent # ========================================================================== # Get the default From address for HTTP requests and add it to defaults. --- 38,44 ---- # and a "request" subroutine. package www; ! $Library = 'libwww-perl/0.30'; # To be appended onto client's User-Agent # ========================================================================== # Get the default From address for HTTP requests and add it to defaults. *************** *** 175,180 **** --- 177,206 ---- # =========================================================================== + # get_def_header(): Allow the client to get the current default header for + # a particular. + # + # Examples: + # + # $address = &get_def_header('http', 'From'); + # $agent = &get_def_header('http', 'User-Agent'); + # + # Returns undefined if the named neader has no default. + # + sub get_def_header + { + local($scheme, $name) = @_; + + for ($[ .. $#DefaultHeaders) + { + return $DefHeaderValues[$_] if (($name eq $DefaultHeaders[$_]) && + ($scheme eq $DefHeaderSchemes[$_])); + } + return undef; + } + + + # =========================================================================== # check_defaults(): Check the header defaults and, if a corresponding value # was not set in the request, add the default header to the array. # *************** *** 239,245 **** # $content_encoding, # $content_language, # $expires, ! # $message_id) = &www'stat($url, $user_agent, $reply_to); # # WHERE, # --- 265,271 ---- # $content_encoding, # $content_language, # $expires, ! # $message_id) = &www'stat($url); # # WHERE, # *************** *** 262,279 **** # Values passed to &www'stat(): # # $url: Fully qualified http: or file: URL - # $user_agent: String that includes the name of program calling &www'stat() - # $reply_to: Your fully-qualified internet Email address # # ---------------------------------------------------------------------- # Example: retrieve Last modified and size of What's New with NCSA Mosaic # # $url = 'http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html'; ! # chop($domain = `hostname`); ! # if (index($domain,'.')==-1) { $domain .= '.' . `domainname`; chop($domain); } ! # $ReplyTo = "$ENV{'USER'}@$domain"; ! # $UserAgent = "myprog/0.1 $www'Library"; ! # ($rcode,$lastmod,$size) = (&www'stat($url,$UserAgent,$ReplyTo))[0,1,2]; # # For more information on the returned headers, see the Hypertext Transfer # Protocol specification, section "The Response/Response Headers" at the URL --- 288,299 ---- # Values passed to &www'stat(): # # $url: Fully qualified http: or file: URL # # ---------------------------------------------------------------------- # Example: retrieve Last modified and size of What's New with NCSA Mosaic # # $url = 'http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html'; ! # ($rcode,$lastmod,$size) = (&www'stat($url))[0,1,2]; # # For more information on the returned headers, see the Hypertext Transfer # Protocol specification, section "The Response/Response Headers" at the URL *************** *** 281,294 **** # sub stat { ! local($url, $user_agent, $reply_to) = @_; local(%headers, $content, $response, $last_modified); - # Define the HTTP request headers - - if ($user_agent) { $headers{'User-Agent'} = $user_agent; } - if ($reply_to) { $headers{'From'} = $reply_to; } - $response = &request('HEAD', $url, *headers, *content, 30); if ($headers{'last-modified'}) --- 301,309 ---- # sub stat { ! local($url) = @_; local(%headers, $content, $response, $last_modified); $response = &request('HEAD', $url, *headers, *content, 30); if ($headers{'last-modified'}) *************** *** 309,313 **** --- 324,394 ---- ); } + + # =========================================================================== + # lrequest(): Same as request() above except that if a redirect response is + # returned, perform an automatic redirection by requesting the + # new URL. To avoid an infinite loop, this routine will only + # perform up to 10 redirections on one request. + # Originally submitted by Brooks Cutter, with mods by Roy Fielding. + # + # Use the following format to request a WWW document: + # + # $respcode = &www'lrequest($method, *url, *headers, *content, $timeout); + # + # WHERE, + # + # $respcode: The three digit response code as defined by HTTP. + # + # $method: The request method (e.g. 'GET','HEAD','POST',...) Case Significant + # + # $url: A WWW Uniform Resource Locator in absolute form. If the request + # is redirected, $url will be changed to reflect the new URL. + # + # %headers: (Incoming) Request headers for request, e.g. + # $headers{'User-Agent'} = 'MOMspider/0.1'." $www'Library"; + # + # (Returned) Response headers from result (in lower-case), e.g. + # $headers{'content-type'} = 'text/html'; + # + # $content: (Incoming) Document to send for methods POST, PUT, etc. + # + # (Returned) Response body from result. + # + # $timeout: Number of seconds to wait for a server response (usually 30). + # + # + sub lrequest + { + local($method, *url, *headers, *content, $timeout) = @_; + local($hd, $response); + + foreach $idx (1 .. 10) + { + $response = &www'request($method, $url, *headers, *content, $timeout); + last unless ($response =~ /^30[12]$/); + last if ($idx == 10); + + if ($url = $headers{'location'}) + { + $url =~ s/, .*//; # Get rid of multiple Location: entries + } + elsif ($url = $headers{'uri'}) + { + $url =~ s/\s*;\s+.*//; + $url =~ s/, .*//; # Get rid of any multiple URI: entries + } + else { last; } + + foreach $hd (keys(%headers)) + { + next if ($hd =~ m#^[A-Z]#); + delete $headers{$hd}; + } + } + return($response); + } + + # =========================================================================== 1; *** ../libwww-perl-0.20/wwwbot.pl Wed Jul 20 19:47:13 1994 --- wwwbot.pl Mon Aug 1 06:32:09 1994 *************** *** 1,4 **** ! # $Id: wwwbot.pl,v 1.1 1994/07/21 02:46:45 fielding Exp $ # --------------------------------------------------------------------------- # wwwbot.pl: This library implements the Robot Exclusion protocol # (draft 6/30/94) as documented on --- 1,4 ---- ! # $Id: wwwbot.pl,v 1.2 1994/08/01 13:32:01 fielding Exp $ # --------------------------------------------------------------------------- # wwwbot.pl: This library implements the Robot Exclusion protocol # (draft 6/30/94) as documented on *************** *** 24,33 **** # Improved /robots.txt parsing (more flexible) # Wrote documentation and examples for wwwbot routines # 20 Jul 1994 (RTF): Reformatted a bit for inclusion in standard libwww-perl. # # If you have any suggestions, bug reports, fixes, or enhancements, ! # send them to Roy Fielding at or to the libwww-perl ! # mailing list at . # --------------------------------------------------------------------------- # # The following are the ten commandments for writing a Web robot --- 24,33 ---- # Improved /robots.txt parsing (more flexible) # Wrote documentation and examples for wwwbot routines # 20 Jul 1994 (RTF): Reformatted a bit for inclusion in standard libwww-perl. + # 30 Jul 1994 (RTF): Changed interface to make use of default User-Agent. # # If you have any suggestions, bug reports, fixes, or enhancements, ! # send them to the libwww-perl mailing list at . # --------------------------------------------------------------------------- # # The following are the ten commandments for writing a Web robot *************** *** 164,169 **** --- 164,171 ---- # # $user_agent: The name of your program and optionally the version number. # Use the form "program_name/v.er" (like "roundabot/1.0") + # If null or undefined, it will use the default User-Agent + # header defined in www.pl. # # Example: # *************** *** 187,192 **** --- 189,203 ---- local($url, $user_agent) = @_; local($scheme, $address, $port, $path, $query, $frag); local($ret, $n, $ua); + + if (!$user_agent) + { + unless($user_agent = &www'get_def_header('http','User-Agent')) + { + warn "wwwbot'allowed: requires 2nd argument of User-Agent header"; + return(0); + } + } ($scheme,$address,$port,$path,$query,$frag) = &wwwurl'parse($url); unless($port) { $port = 80; } *** ../libwww-perl-0.20/wwwhtml.pl Thu Jul 21 01:50:37 1994 --- wwwhtml.pl Fri Jul 22 02:58:20 1994 *************** *** 1,4 **** ! # $Id: wwwhtml.pl,v 0.13 1994/07/21 08:50:11 fielding Exp $ # --------------------------------------------------------------------------- # wwwhtml: A package for parsing pages of HyperText Markup Language (HTML) # for a World-Wide Web spider. --- 1,4 ---- ! # $Id: wwwhtml.pl,v 0.14 1994/07/22 09:57:52 fielding Exp $ # --------------------------------------------------------------------------- # wwwhtml: A package for parsing pages of HyperText Markup Language (HTML) # for a World-Wide Web spider. *************** *** 16,21 **** --- 16,26 ---- # Updated META parsing to reflect HTML 2.0 proposal. # 20 Jul 1994 (RTF): Fix segmentation fault if we are fooled into trying # to extract links from a non-html document. + # 22 Jul 1994 (RTF): Fixed parsing of href's that had a new-line after + # the quote mark, causing an extra space to precede the + # extracted URL, which in turn created a black hole. + # Also added code to extract and change the base URL + # if there exists a element. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 62,67 **** --- 67,75 ---- $content =~ s/\s+/ /g; # Remove all extra whitespace and newlines + $content =~ s#]*href\s*=\s*"?\s*([^">\s]+)[^>]*>##i; # Base? + if ($1) { $base = $1; } + $content =~ s#]*>([^<]+)]*>##i; # Extract the title if ($1) { $headers{'title'} = $1; } *************** *** 73,84 **** return unless ($content); # Return if we removed everything # Isolate all META elements as text ! $content =~ s/]*http-equiv\s*=\s*"?([^">]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; ! $content =~ s/]*name\s*=\s*"?([^">]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; # Isolate all A element HREFs as text ! $content =~ s/]*href\s*=\s*"?([^">]+)[^>]*>/A $1\n/gi; # Isolate all IMG element SRCs as text ! $content =~ s/]*src\s*=\s*"?([^">]+)[^>]*>/I $1\n/gi; $content =~ s/<[^>]*>//g; # Remove all remaining elements $content =~ s/\n+/\n/g; # Remove all blank lines --- 81,92 ---- return unless ($content); # Return if we removed everything # Isolate all META elements as text ! $content =~ s/]*http-equiv\s*=\s*"?\s*([^">\s]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; ! $content =~ s/]*name\s*=\s*"?\s*([^">\s]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; # Isolate all A element HREFs as text ! $content =~ s/]*href\s*=\s*"?\s*([^">\s]+)[^>]*>/A $1\n/gi; # Isolate all IMG element SRCs as text ! $content =~ s/]*src\s*=\s*"?\s*([^">\s]+)[^>]*>/I $1\n/gi; $content =~ s/<[^>]*>//g; # Remove all remaining elements $content =~ s/\n+/\n/g; # Remove all blank lines *************** *** 89,95 **** foreach $elem (split(/\n/,$content)) { ! if ($elem =~ /^A (.*)$/) { $orig = $1; push(@lorig, $orig); --- 97,103 ---- foreach $elem (split(/\n/,$content)) { ! if ($elem =~ /^A\s+(\S*)$/) { $orig = $1; push(@lorig, $orig); *************** *** 106,112 **** } push(@links, &wwwurl'compose($scheme,$host,$port,$path,'','')); } ! elsif ($elem =~ /^I (.*)$/) { $orig = $1; push(@lorig, $orig); --- 114,120 ---- } push(@links, &wwwurl'compose($scheme,$host,$port,$path,'','')); } ! elsif ($elem =~ /^I\s+(\S*)$/) { $orig = $1; push(@lorig, $orig); *** ../libwww-perl-0.20/wwwmailcap.pl Mon Aug 1 06:45:17 1994 --- wwwmailcap.pl Mon Aug 1 06:30:36 1994 *************** *** 0 **** --- 1,280 ---- + # $Id: wwwmailcap.pl,v 1.1 1994/08/01 13:30:11 fielding Exp $ + # --------------------------------------------------------------------------- + # wwwmailcap.pl: This library implements routines for parsing a + # MIME mailcap file and executing commands based on + # a file's MIME Content-type + # + # This package has been developed by Brooks Cutter . + # It is distributed under the Artistic License (included with your Perl + # distribution files and with the standard distribution of this package). + # + # 26 Jul 1994 (BBC): Initial version + # 31 Jul 1994 (RTF): Reformatted a bit (and new name) for inclusion in + # standard libwww-perl distribution. + # + # If you have any suggestions, bug reports, fixes, or enhancements, + # send them to the libwww-perl mailing list at . + # --------------------------------------------------------------------------- + # + # The MIME mailcap file is defined in the (Work in progress) mailcap draft: + # ftp://venera.isi.edu/internet-drafts/draft-borenstein-mailcap-00.ps + # ftp://venera.isi.edu/internet-drafts/draft-borenstein-mailcap-00.txt + # + # This package currently parses mailcap lines, and uses the test field + # to determine if the mailcap line should be used. It will save the + # 'view' command associated with each MIME Content-type. + # + # Public/External routines: + # + # &exists_handler($mime_type,$cmd) - is a mime type handler defined? + # Returns 1 if there exists a definition for $mime_type in mailcap + # Returns 0 if there is no handler for the specified $mime_type + # $mime_type is of the form "type/subtype" like "text/html" + # $cmd is optional, and defaults to 'view' if not specified + # + # &view($mime_type,$fn) - exec view program for $mime_type + # $mime_type is of the form "type/subtype" like "text/html" + # $fn is the name of the file to pass to the program + # + # Private/Internal routines: + # - You don't need to call (or know about) these routines. + # the first time you call &view or &exists_handler, &init + # is called and parses the mailcap files.. + # + # &init - load defaults and search for mailcap files + # &load_mailcap_default - load mailcap from internal defaults + # &load_mailcap_file - local mailcap from external file + # &parse_mailcap(@_) - parse mailcap lines intro internal assoc array + # + #---------------------------------------------------------------------- + # Todo: + # This package recognizes the 'view' and 'test' fields, however there + # are a number of other fields that should be recognized, and the values + # (if any) should be saved and accessible through the library API. + # + # The following key/value fields or flags are not currently recognized: + # + # key/value fields: compose, composetyped, edit, print, description, + # textualnewlines, x11bitmap, nametemplate + # flags: needsterminal, copiousoutput, needsx11 + # + # Also needs to be done: + # - rewrite parse_mailcap so it recognizes lines that end with \ (backslash) + # - recognize attribute/variable quoting and pass other needed MIME headers. + #---------------------------------------------------------------------- + + package wwwmailcap; + + + # =========================================================================== + # This is Mosaic's default mailcap ... + + $gl_default_mailcap = <<'EOF'; + audio/*; showaudio %s + image/xwd; xwud -in %s + image/x-xwd; xwud -in %s + image/x-xwindowdump; xwud -in %s + image/*; xv %s + video/mpeg; mpeg_play %s + application/postscript; ghostview %s + application/x-dvi; xdvi %d + message/rfc822; xterm -e metamail %s + EOF + + # =========================================================================== + # As defined in draft-bornstein-mailcap-00.txt 5/94 + # If Environment variable MAILCAPS isn't set, use the default search path + + $gl_mailcap_path = $ENV{'MAILCAPS'} || + "$ENV{'HOME'}/.mailcap:/etc/mailcap:/usr/etc/mailcap:/usr/local/etc/mailcap"; + + + # =========================================================================== + # =========================================================================== + # exists_handler(): returns 1 if there is a "handler" for $mime_type (1st arg) + # otherwise returns 0 + # $cmd indicates the MIME mailcap command to execute + # if $cmd isn't specified, it defaults to 'view' + # + # $ok = &exists_handler($mime_type,$cmd); + # + # WHERE, + # + # $ok: 1 if there is a mailcap entry for $mime_type and command $cmd + # + # $mime_type: A MIME Content-type, like "text/html" + # + # $cmd: Either null (defaults to 'view') or a MIME command + # Currently the only supported command is 'view' + # + # Example + # + # $url = 'http://www.host.dom/dir/file.html'; + # $response = &www'request('GET',$url,*headers,*content,$timeout); + # if ($headers{'content-type'} =~ m!^text/(plain|html)$!i) { + # print $content; # display it on the screen + # } elsif (&wwwmailcap'exists_handler($headers{'content-type'},'view')) { + # local($fn) = "/tmp/file.$$.".time; + # open(OUT,">$fn"); print OUT $content; close(OUT); + # &wwwmailcap'view($headers{'content-type'},$fn); + # } else { + # print "Unable to handle MIME type $headers{'content-type'}\n"; + # } + # + sub exists_handler + { + local($mime_type, $cmd) = @_; + + &init unless (%gl_mailcap); + + local($type, $subtype) = split(/\//,$mime_type,2); + $cmd = 'view' unless(@_); + if (($gl_mailcap{$type,$subtype,$cmd}) || + ($gl_mailcap{$type,'*',$cmd}) || + ($gl_mailcap{'*','*',$cmd})) + { + return(1); + } + return(0); + } + + + # =========================================================================== + # view(): select the 'view' program for $mime_type and execute with $fn + # + # &exists_handler($mime_type,$fn) + # + # WHERE, + # + # $mime_type: A MIME Content-type, like "text/html" + # + # $fn: The name of the file to pass to the MIME mailcap 'view' program + # + # Note: For an example including &view, see &exists_handler + # + sub view + { + local($mime_type, $fn) = @_; + + &init unless (%gl_mailcap); + + # %s - replace with name of file (otherwise pass data by stdin) + # %t - replace with content-type (type/subtype) + local($type,$subtype) = split(/\//,$mime_type,2); + local($cmd) = $gl_mailcap{$type,$subtype,'view'} || + $gl_mailcap{$type,'*','view'} || + $gl_mailcap{'*','*','view'}; + + return unless($cmd); # exit rather than return since already forked + return if (fork); + + if ($cmd =~ /%s/) + { + $cmd =~ s/%s/$fn/g; + if ($cmd =~ /%t/) { $cmd =~ s!%t!$type/$subtype!g; } + exec "$cmd ; rm -f $fn"; # calls sh -c .. + } + else + { + exec "cat $fn | $cmd ; rm -f $fn"; # calls sh -c .. + } + exit; + } + + + # =========================================================================== + # init(): load mailcap files and default mailcap + # + # &init (no arguments) + # + sub init + { + return if (%gl_mailcap); + local($_); + for (split(/:/,$gl_mailcap_path)) + { + next unless(-f $_); + &load_mailcap_file($_); + } + &load_mailcap_default; + } + + + # =========================================================================== + # load_mailcap_default(): load defaults from global var $gl_default_mailcap + # + # &load_mailcap_default (no arguments) + # + sub load_mailcap_default + { + return(-1) unless($gl_default_mailcap); + &parse_mailcap(split(/\n/,$gl_default_mailcap)); + return(0); + } + + + # =========================================================================== + # load_mailcap_file(): load mailcap data from external file $fn + # + # &load_mailcap_file($fn) + # + # WHERE, + # + # $fn: Name of mailcap file to load + # + sub load_mailcap_file + { + local($fn) = @_; + + open(IN,$fn) || return(-1); + &parse_mailcap(); + close(IN); + return(0); + } + + # =========================================================================== + # parse_mailcap(): parse mailcap lines and store in var $gl_default_mailcap + # + # &parse_mailcap(@mailcap_lines) + # + # WHERE, + # + # @mailcap_lines: One or more mailcap lines. Each element of @mailcap_lines + # is a complete entry and has any newlines and/or backslashes + # stripped out. + # + sub parse_mailcap + { + local($_,$type,$view,@types,$line,$key,$val,$types,$subtype); + + for (@_) + { + tr/\x00-\x1f\x7f-\xff//d; + @types = split(/\s*;\s*/); + $types = shift(@types); + ($type, $subtype) = split(/\//,$types,2); + $view = shift(@types); + for $line (@types) + { + $line =~ s/^\s+//; + $line =~ s/\s+$//; + ($key,$val) = split(/\s*=\s*/,$line,2); + $key =~ tr/A-Z/a-z/; + if (($key eq 'test') && ($val)) + { + local(@ret) = `sh -c '$val'`; + local($ret) = ($? >> 8); # value returned by exec'ing $val.. + # if test field returns a non-zero value, ignore the entry + next if ($ret); + } + } + if (($view) && (!$gl_mailcap{$type,$subtype,'view'})) + { + $gl_mailcap{$type,$subtype,'view'} = $view; + } + } + } + + # =========================================================================== + + 1; *** ../libwww-perl-0.20/wwwurl.pl Sat Jul 16 04:12:22 1994 --- wwwurl.pl Mon Aug 1 06:31:17 1994 *************** *** 1,4 **** ! # $Id: wwwurl.pl,v 0.13 1994/07/16 11:12:16 fielding Exp $ # --------------------------------------------------------------------------- # wwwurl: A package for parsing and manipulating World-Wide Web # Uniform Resource Locators (URL). --- 1,4 ---- ! # $Id: wwwurl.pl,v 0.14 1994/08/01 13:30:59 fielding Exp $ # --------------------------------------------------------------------------- # wwwurl: A package for parsing and manipulating World-Wide Web # Uniform Resource Locators (URL). *************** *** 16,21 **** --- 16,24 ---- # substitute (from Steven E. Brenner via Brooks Cutter). # Added escape() routine (w/mods) from Brooks Cutter. # 16 Jul 1994 (RTF): Added get_site() routine. + # 27 Jul 1994 (RTF): Firmed-up algorithm for parsing relative URLs, fixing + # several potential (but unlikely) bugs in the process. + # Removed any hint of "URL:" prefix. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 24,41 **** package wwwurl; %DefPort = ( # Define the default ports for major net services ! 'ftp', 21, ! 'file', 0, # note: non-local file URLs are changed to ftp URLs ! 'telnet', 23, ! 'whois', 43, ! 'gopher', 70, ! 'finger', 79, ! 'http', 80, ! 'nntp', 119, ! 'news', 119, ! 'prospero', 191, ! 'wais', 210, ! 'webster', 765, ); %CantChange = ( # Define schemes that cannot be altered by absolute --- 27,44 ---- package wwwurl; %DefPort = ( # Define the default ports for major net services ! 'ftp', 21, ! 'file', 0, # note: non-local file URLs are changed to ftp URLs ! 'telnet', 23, ! 'whois', 43, ! 'gopher', 70, ! 'finger', 79, ! 'http', 80, ! 'nntp', 119, ! 'news', 119, ! 'wais', 210, ! 'webster', 765, ! 'prospero', 1525, # I thought it was 191, but IETF differs ); %CantChange = ( # Define schemes that cannot be altered by absolute *************** *** 46,52 **** ); %NonHierarchical = ( # Define remaining schemes that can be changed ! 'telnet', 1, # but which cannot use relative URLs 'rlogin', 1, 'tn3270', 1, 'whois', 1, --- 49,55 ---- ); %NonHierarchical = ( # Define remaining schemes that can be changed ! 'telnet', 1, # but which cannot use relative URL paths 'rlogin', 1, 'tn3270', 1, 'whois', 1, *************** *** 54,65 **** 'finger', 1, ); # =========================================================================== # parse(): Parse the given URL into its component parts according to # WWW URI rules, returning '' for those that are not present. ! # If the scheme is not present, the URL will be parsed according ! # to HTTP rules. # # Returns the folowing in order: # --- 57,78 ---- 'finger', 1, ); + %UsesQuery = ( # Define schemes that use '?' to denote a query + 'http', 1, + 'wais', 1, + ); + %UsesParams = ( # Define schemes that use ';' to denote parameters + 'ftp', 1, + 'prospero', 1, + ); + + # =========================================================================== # parse(): Parse the given URL into its component parts according to # WWW URI rules, returning '' for those that are not present. ! # If no scheme is given, the URL is parsed according to HTTP rules, ! # so schemes which use different rules may have to recombine parts. # # Returns the folowing in order: # *************** *** 67,74 **** # $address: The login or hostname/IP address (if appropriate); # $port : The TCP port (if appropriate); # $path : The object path; ! # $query : The post-? search info (http and wais only) ! # $frag : The post-'#' fragment identifier # sub parse { --- 80,87 ---- # $address: The login or hostname/IP address (if appropriate); # $port : The TCP port (if appropriate); # $path : The object path; ! # $query : The post-'?' search info (only if scheme uses queries); ! # $frag : The post-'#' fragment identifier. # sub parse { *************** *** 80,86 **** local($query) = ''; local($frag) = ''; ! if ($url =~ s#^([+\-\w]+):##) { $scheme = $1; $scheme =~ tr/A-Z/a-z/; --- 93,99 ---- local($query) = ''; local($frag) = ''; ! if ($url =~ s#^([.+\-\w]+):##) { $scheme = $1; $scheme =~ tr/A-Z/a-z/; *************** *** 88,98 **** if ($url =~ s/#([^#]*)$//) { $frag = $1; } - if (!$scheme || ($scheme eq 'http') || ($scheme eq 'wais')) - { - if ($url =~ s/\?([^?]*)$//) { $query = $1; } - } - if ($url =~ m#^//#o) { $url =~ s#^//([^/]*)##; --- 101,106 ---- *************** *** 102,107 **** --- 110,121 ---- $port = $1; } } + + if (!$scheme || $UsesQuery{$scheme}) + { + if ($url =~ s/\?([^?]*)$//) { $query = $1; } + } + $path = $url; return ($scheme, $address, $port, $path, $query, $frag); *************** *** 133,141 **** if ($port) { $url .= ":$port"; } } ! if ($path) { $url .= $path; } ! if ($query) { $url .= "?$query"; } ! if ($frag) { $url .= "#$frag"; } return $url; } --- 147,159 ---- if ($port) { $url .= ":$port"; } } ! if ($path) { $url .= $path; } ! if ($query) ! { ! if (!$path) { $url .= '/'; } # Avoid mistaking query as being ! $url .= "?$query"; # part of the address ! } ! if ($frag) { $url .= "#$frag"; } return $url; } *************** *** 149,155 **** { local($url) = @_; ! $url =~ s/%(..)/pack("C",hex($1))/ge; return $url; } --- 167,173 ---- { local($url) = @_; ! $url =~ s/%([\dA-F][\dA-F])/pack("C",hex($1))/ge; return $url; } *************** *** 179,185 **** { local($parent, $url) = @_; ! $url =~ s/^url://io; # Just in case someone follows dumb IETF format local($scheme, $addr, $port, $path, $query, $frag) = &parse($url); --- 197,204 ---- { local($parent, $url) = @_; ! $url =~ s/^\s+//; # Remove any preceding whitespace ! $url =~ s/\s.*//; # Remove anything after first word local($scheme, $addr, $port, $path, $query, $frag) = &parse($url); *************** *** 195,229 **** local($psch,$paddr,$pport,$ppath,$pquery,$pfrag) = &parse($parent); ! if (!$scheme) { $scheme = $psch; } ! else { last RELATED if ($scheme ne $psch); } ! ! if (!$pport) { $pport = ($DefPort{$psch} || ''); } ! ! if ($addr || $port) { ! if (!$port) { $port = ($DefPort{$scheme} || ''); } ! last RELATED if ($addr ne $paddr); ! last RELATED if ($port ne $pport); } ! else ! { ! $addr = $paddr; ! $port = $pport; ! } ! last RELATED if ($NonHierarchical{$scheme}); if (!$path) { $path = $ppath; ! last RELATED if ($query); ! $query = $pquery; } ! elsif ($path !~ m|^/|o) # The order in which we resolve the ! { # relative components is important ! $ppath =~ s#/[^/]*$#/#; ! $path = $ppath . $path; while ($path =~ s#/\./#/#) {;} $path =~ s#/\.$#/#; while ($path =~ s#/[^/]+/\.\./#/#) {;} --- 214,259 ---- local($psch,$paddr,$pport,$ppath,$pquery,$pfrag) = &parse($parent); ! if (!$scheme) { ! $scheme = $psch; ! if ($query && !$UsesQuery{$scheme}) # Restore mistaken queries ! { ! $path .= '?'. $query; ! $query = ''; ! } } ! else { last RELATED if ($scheme ne $psch); } ! last RELATED if ($addr || $port); # Child must have used '//' + $addr = $paddr; + $port = $pport; + if (!$path) { $path = $ppath; ! if (!$query) { $query = $pquery; } } ! elsif ($NonHierarchical{$scheme}) ! {;} # Do nothing ! elsif ($path !~ m|^/|o) # If the child URL does not begin with '/' ! { ! if ($ppath) ! { ! if ($UsesParams{$scheme}) ! { ! $ppath =~ s#;.*##; # Trim off any parent parameters ! } ! $ppath =~ s#/[^/]*$#/#; # Trim off any parent filename ! } ! else { $ppath = '/'; } ! ! $path = $ppath . $path; ! # ! # The order in which we remove the relative "/." and "xxx/.." ! # path segment components is extremely important. ! # while ($path =~ s#/\./#/#) {;} $path =~ s#/\.$#/#; while ($path =~ s#/[^/]+/\.\./#/#) {;} *************** *** 235,240 **** --- 265,274 ---- { if (!$addr) { $addr = 'localhost'; } elsif ($addr ne 'localhost') { $scheme = 'ftp'; } + # + # The above line will have to be deleted once people stop using + # file: as an alias for ftp: (i.e. when the IETF standard is done). + # } elsif ($scheme eq 'http') { $path =~ s#^/\%7E#/~#; }