*** ../libwww-perl-0.11/Changes.txt Fri Jun 17 18:37:39 1994 --- Changes.txt Fri Jul 8 01:16:17 1994 *************** *** 1,5 **** --- 1,6 ---- Changes to libwww-perl ====================== + # $Id: Changes.txt,v 0.12 1994/07/08 08:08:14 fielding Exp $ See the files README.html and Artistic.txt for licensing and distribution info. See the file INSTALL.txt for installation information. *************** *** 17,22 **** --- 18,46 ---- A real HTML (or SGML) parser. + Version 0.12 July 8, 1994 + Placed everything under RCS version control and included repository. + Added www'stat (from Brooks Cutter) for doing stat-like calls on a URL. + Added message field to wwwerror'onrequest so that error-specific + messages (e.g. $@ and $!) can be included in the canned HTML output. + Added symbolic names for all response code numbers. + Reassigned 000 Timed Out error to response code 603. + Added 602 Connection Failed response code. + Vastly improved the error-handling for wwwhttp'request(). + Now escapes the URL entries generated by wwwfile'dirlist(). + Removed buggy attempt to delete comments at start of wwwhtml'extract_links. + Updated META parsing in wwwhtml to reflect HTML 2.0 proposed spec. + Moved require of sys/socket.ph outside of wwwhttp package declaration + due to a bug in perl4 found by Martijn Koster. + Added many checks to be sure environment variables are defined before + trying to use them in wwwmime, wwwurl, get, and testdates (Martijn Koster). + Fixed bug that occurred when parsing URLs with an empty path. + Replaced complicated wwwurl'unescape loop with a simple substitute + (from Steven E. Brenner via Brooks Cutter). + Added wwwurl'escape() to %hex escape URL segments (from Brooks Cutter). + Added testescapes program for testing wwwurl'escape and unescape. + + Version 0.11 June 17, 1994 Changed environment variable LIBWWW-PERL to LIBWWW_PERL because some systems can't handle the dash (Charlie Stross). *************** *** 23,28 **** --- 47,53 ---- Fixed bug in "get" that caused full pathname to be used as the method (Martijn Koster). Fixed handling of perverse relative URLs (e.g. ../../) in wwwurl'absolute. + Version 0.10 June 13, 1994 First public version. libwww-perl was developed by Roy Fielding *** ../libwww-perl-0.11/INSTALL.txt Fri Jun 17 19:22:14 1994 --- INSTALL.txt Fri Jul 8 01:17:01 1994 *************** *** 1,5 **** --- 1,6 ---- libwww-perl Installation Information ==================================== + # $Id: INSTALL.txt,v 0.12 1994/07/08 08:08:14 fielding Exp $ See the files README.html and Artistic.txt for licensing and distribution info. See the file Changes.txt for a complete list of changes and version history. *************** *** 20,37 **** it will be in the form of a compressed unix tar file. If it has not already been decompressed by your WWW client, then do one of: ! % uncompress libwww-perl-0.11.tar.Z ! % gunzip libwww-perl-0.11.tar.gz depending on which compressed version you downloaded. ! 2. Move the resulting libwww-perl-0.11.tar file to the directory above where you want to install libwww-perl, cd to that directory, and do ! % tar xvf libwww-perl-0.11.tar ! to create the directory ./libwww-perl-0.11 containing the following: Artistic.txt -- the Artistic License governing redistribution of the libwww-perl package. --- 21,38 ---- it will be in the form of a compressed unix tar file. If it has not already been decompressed by your WWW client, then do one of: ! % uncompress libwww-perl-0.12.tar.Z ! % gunzip libwww-perl-0.12.tar.gz depending on which compressed version you downloaded. ! 2. Move the resulting libwww-perl-0.12.tar file to the directory above where you want to install libwww-perl, cd to that directory, and do ! % tar xvf libwww-perl-0.12.tar ! to create the directory ./libwww-perl-0.12 containing the following: Artistic.txt -- the Artistic License governing redistribution of the libwww-perl package. *************** *** 48,53 **** --- 49,55 ---- extensions in the same format as that used by NCSA httpd_1.3 and many WWW clients. testdates -- a simple program for testing the wwwdates.pl package. + testescapes -- a program for testing wwwurl'escape and unescape. www.pl -- the primary entry point for WWW requests -- give it any absolute URL and a request method and it will try to perform the method using the URL's protocol *************** *** 75,82 **** 3. You may need to change the following (with any text editor). ! The first line of each program (get and testdates) should point to ! your perl executable: #!/usr/local/bin/perl --- 77,84 ---- 3. You may need to change the following (with any text editor). ! The first line of each program (get, testdates, and testescapes) ! should point to your perl executable: #!/usr/local/bin/perl *************** *** 84,102 **** 4. The LIBWWW_PERL environment variable must be set to point to the libwww-perl directory, e.g. ! % setenv LIBWWW_PERL /usr/local/lib/libwww-perl-0.11 This allows clients like "get" to place the libwww-perl on their @INC path and also allows wwwmime.pl to find the standard mime.types file. ! 5. Make sure the scripts are executable: ! % chmod 755 get testdates ! % ln -s get head ! 6. That's it. You should now be able to run get, head and testdates, e.g. % get http://www.ics.uci.edu/ --- 86,105 ---- 4. The LIBWWW_PERL environment variable must be set to point to the libwww-perl directory, e.g. ! % setenv LIBWWW_PERL /usr/local/lib/libwww-perl-0.12 This allows clients like "get" to place the libwww-perl on their @INC path and also allows wwwmime.pl to find the standard mime.types file. ! 5. Make sure the programs are executable: ! % chmod 755 get testdates testescapes ! % ln -s get HEAD + HEAD is in uppercase only because there already exists a unix head command. ! 6. That's it. You should now be able to run get and HEAD, e.g. % get http://www.ics.uci.edu/ *************** *** 127,133 **** Just send them (preferably as a context diff suitable for reading by the patch program) to Roy Fielding at . ! You are also free to send them to others by mail or news (or even disk), just as long as you don't claim they are part of the "standard distribution" of libwww-perl. --- 130,138 ---- Just send them (preferably as a context diff suitable for reading by the patch program) to Roy Fielding at . ! If you have RCS (or CVS), you can use the included RCS repository ! to keep track of your changes and merge them with later distributions. ! You are also free to send changes to others by mail or news (or even disk), just as long as you don't claim they are part of the "standard distribution" of libwww-perl. *** ../libwww-perl-0.11/README.html Fri Jun 17 19:15:31 1994 --- README.html Fri Jul 8 01:18:45 1994 *************** *** 1,4 **** ! libwww-perl: Distribution Information --- 1,6 ---- ! ! ! libwww-perl: Distribution Information *************** *** 61,81 **** The following developers have contributed (either directly or indirectly) to the libwww-perl distribution:
Roy ! Fielding, University of California, Irvine, USA
Architect and primary developer of the library. !
Martijn Koster, NEXOR Ltd., UK !
Contributed bug fixes and part of Oscar's http. !
Oscar Nierstrasz, Universitaet Bern, Switzerland
Oscar's collection of useful perl scripts formed the basis on which the wwwhttp.pl and wwwhtml.pl packages were built. !
Gertjan van Oosten
Contributed code for parsing WWW date formats (used in wwwdates.pl)
Gene Spafford, Purdue University, USA
Contributed the MailStuff package for parsing rfc822 headers.
Others
These people contributed to prior packages which influenced the ! development of libwww-perl: Marion Hakanson (ctime), Marc van Heyningen (http), Waldemar Kebsch (ctime), and Larry Wall (Perl).
--- 63,88 ---- The following developers have contributed (either directly or indirectly) to the libwww-perl distribution:
+
Brooks Cutter, AT&T Paradyne Corporation, USA +
Contributed code for escaping and unescaping URLs, www'stat(), + and several bug fixes.
Roy ! Fielding, University of California, Irvine, USA
Architect and primary developer of the library. !
Martijn Koster, ! NEXOR Ltd., UK !
Contributed many bug fixes and part of Oscar's http. !
Oscar ! Nierstrasz, Universitaet Bern, Switzerland
Oscar's collection of useful perl scripts formed the basis on which the wwwhttp.pl and wwwhtml.pl packages were built. !
Gertjan van Oosten, West Consulting bv, NL
Contributed code for parsing WWW date formats (used in wwwdates.pl)
Gene Spafford, Purdue University, USA
Contributed the MailStuff package for parsing rfc822 headers.
Others
These people contributed to prior packages which influenced the ! development of libwww-perl: Steven E. Brenner (cgi-lib), Marion Hakanson (ctime), Marc van Heyningen (http), Waldemar Kebsch (ctime), and Larry Wall (Perl).
*************** *** 84,92 ****

The Distribution

For easy distribution, libwww-perl is available as a ! gzip'd tar file or as a ! compress'd tar file. It is also available via anonymous ftp from liege.ics.uci.edu in the directory --- 91,99 ----

The Distribution

For easy distribution, libwww-perl is available as a ! gzip'd tar file or as a ! compress'd tar file. It is also available via anonymous ftp from liege.ics.uci.edu in the directory *************** *** 122,127 **** --- 129,137 ----
testdates
a simple program for testing the wwwdates.pl package. +
testescapes +
a simple program for testing the wwwurl'escape and unescape routines. +
www.pl
the primary entry point for WWW requests -- give it any absolute URL and a request method and it will try to perform the method using *************** *** 161,201 ****
!

Version History

-
-
Known problems -
Documentation of the library architecture is sorely lacking, - although the code itself is fairly easy to read and understand.

- -

Things that need to be done - (let us know if you are working on something good) -
Interfaces to FTP, Gopher, WAIS, ...
- A real HTML (or SGML) parser.

- -

Version 0.11 --- June 17, 1994 - [patch] -
Changed environment variable LIBWWW-PERL to LIBWWW_PERL because - some systems can't handle the dash (Charlie Stross).
- Fixed bug in "get" that caused full pathname to be used as the method - (Martijn Koster).
- Fixed handling of perverse relative URLs (e.g. ../../) in wwwurl'absolute. -

- -

Version 0.10 --- June 13, 1994 -
First public version. libwww-perl was developed by - Roy Fielding from the core of - MOMspider, - a program intended to assist multi-owner maintenance of distributed - hypertext infostructures. It was expanded to a general-purpose library - after some encouragement from Oscar Nierstrasz and Martijn Koster during - the First International Conference on the World-Wide Web (WWW94).

- -

- - See the file Changes.txt for more version - information.

- If you have any suggestions, bug reports, fixes, or enhancements, send them to Roy Fielding at (fielding@ics.uci.edu). Also, we would like to ask --- 171,178 ----


!

Version History

If you have any suggestions, bug reports, fixes, or enhancements, send them to Roy Fielding at (fielding@ics.uci.edu). Also, we would like to ask *************** *** 230,234 **** Department of Information and Computer Science,
University of California, Irvine, CA 92717-3425 ! Last modified: Mon Jun 13 22:44:29 1994 --- 207,211 ---- Department of Information and Computer Science,
University of California, Irvine, CA 92717-3425 ! Last modified: Fri Jul 8 07:48:21 GMT 1994 *** ../libwww-perl-0.11/www.pl Fri Jun 17 18:34:21 1994 --- www.pl Fri Jul 8 01:09:21 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: www.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # www.pl: A package for handling requests of any World-Wide Web URL, # including requests that should be redirected to a proxy server. *************** *** 5,10 **** --- 6,13 ---- # # This package has been developed by Roy Fielding # as part of the Arcadia project at the University of California, Irvine. + # Each routine in this package has been derived from the work of multiple + # authors -- those that are known are listed above the respective routines. # It is distributed under the Artistic License (included with your Perl # distribution files and with the standard distribution of this package). # *************** *** 13,18 **** --- 16,23 ---- # # # 13 Jun 1994 (RTF): Initial version + # 07 Jul 1994 (RTF): Added stat() code from Brooks Cutter. + # Updated error messages. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 27,33 **** # and a "request" subroutine. package www; ! $Library = 'libwww-perl/0.10'; # To be appended onto client's User-Agent # (this must be done within the client code). if ($NoProxy = $ENV{'no_proxy'}) { @DontProxy = split(/,/, $NoProxy); } --- 32,38 ---- # and a "request" subroutine. package www; ! $Library = 'libwww-perl/0.12'; # To be appended onto client's User-Agent # (this must be done within the client code). if ($NoProxy = $ENV{'no_proxy'}) { @DontProxy = split(/,/, $NoProxy); } *************** *** 76,83 **** if (!$scheme) { ! return &wwwerror'onrequest(600, $method, $scheme, $host, $port, ! $object, *headers, *content); } if ($proxy = &lookup_proxy($scheme, $host, $port)) --- 81,89 ---- if (!$scheme) { ! return &wwwerror'onrequest($wwwerror'RC_bad_request_client, $method, ! $scheme, $host, $port, $object, *headers, *content, ! "URL requested does not have an access scheme"); } if ($proxy = &lookup_proxy($scheme, $host, $port)) *************** *** 91,98 **** if (!((eval "defined($routine);") && (eval "$allowed;")) ) { ! return &wwwerror'onrequest(601, $method, $scheme, $host, $port, ! $object, *headers, *content); } if (!$port) { $port = $wwwurl'DefPort{$scheme} }; --- 97,105 ---- if (!((eval "defined($routine);") && (eval "$allowed;")) ) { ! return &wwwerror'onrequest($wwwerror'RC_not_implemented_client, ! $method, $scheme, $host, $port, $object, *headers, *content, ! "Request method not supported by client library"); } if (!$port) { $port = $wwwurl'DefPort{$scheme} }; *************** *** 136,141 **** --- 143,231 ---- local($pcheck) = join(//, q/$ENV{'/, $scheme, q/_proxy'}/); return (eval "$pcheck;"); + } + + + # ====================================================================== + # stat(): Return the status of the passed-in URL. + # Submitted by Brooks Cutter 05 Jul 1994 + # + # ($response, + # $last_modified, + # $content_length, + # $content_type, + # $content_transfer_encoding, + # $content_encoding, + # $content_language, + # $expires, + # $message_id) = &www'stat($url, $user_agent, $reply_to); + # + # WHERE, + # + # Values returned by &www'stat(): + # $response: HTTP numeric response code (see wwwerror.pl for a list) + # $last_modified: Date last modified in Unix time_t (long) format. + # $content_length: The length of the $url + # $content_type: The type of the document + # $content_transfer_encoding: As in MIME. + # $content_encoding: if any, then x-compress or x-gzip + # $content_language: ISO 3316 language code like 'en' for english + # $expires: If specified, After $expires date retrieved document is invalid + # $message_id: (globally) Unique identifier for object + # + # Note: + # - Values (if any) in the above fields depend on the remote + # server and document requested. + # - Additional return values may be added at a later time. + # + # Values passed to &www'stat(): + # + # $url: Fully qualified http: or file: URL + # $user_agent: String that includes the name of program calling &www'stat() + # $reply_to: Your fully-qualified internet Email address + # + # ---------------------------------------------------------------------- + # Example: retrieve Last modified and size of What's New with NCSA Mosaic + # + # $url = 'http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html'; + # chop($domain = `hostname`); + # if (index($domain,'.')==-1) { $domain .= '.' . `domainname`; chop($domain); } + # $ReplyTo = "$ENV{'USER'}@$domain"; + # $UserAgent = "myprog/0.1 $www'Library"; + # ($rcode,$lastmod,$size) = (&www'stat($url,$UserAgent,$ReplyTo))[0,1,2]; + # + # For more information on the returned headers, see the Hypertext Transfer + # Protocol specification, section "The Response/Response Headers" at the URL + # http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html + # + sub stat + { + local($url, $user_agent, $reply_to) = @_; + local(%headers, $content, $response, $last_modified); + + # Define the HTTP request headers + + if ($user_agent) { $headers{'User-Agent'} = $user_agent; } + if ($reply_to) { $headers{'From'} = $reply_to; } + + $response = &request('HEAD', $url, *headers, *content, 30); + + if ($headers{'last-modified'}) + { + $last_modified = &wwwdates'get_gmtime($headers{'last-modified'}) + } + + return( + $response, + $last_modified, + $headers{'content-length'}, + $headers{'content-type'}, + $headers{'content-transfer-encoding'}, + $headers{'content-encoding'}, + $headers{'content-language'}, + $headers{'expires'}, + $headers{'message-id'}, + ); } *** ../libwww-perl-0.11/wwwdates.pl Fri Jun 17 18:34:34 1994 --- wwwdates.pl Fri Jul 8 01:10:06 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwdates.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwdates: A package for manipulating date/time stamps in the format used # on the World-Wide Web. *** ../libwww-perl-0.11/wwwerror.pl Fri Jun 17 18:34:45 1994 --- wwwerror.pl Fri Jul 8 01:11:03 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwerror.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwerror.pl: A package for handling World-Wide Web client errors as if # they were being returned from a proxy server. *************** *** 7,13 **** # It is distributed under the Artistic License (included with your Perl # distribution files). # ! # 13 Jun 1994 (RTF): Initial version # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . --- 8,20 ---- # It is distributed under the Artistic License (included with your Perl # distribution files). # ! # 13 Jun 1994 (RTF): Initial version ! # 07 Jul 1994 (RTF): Added symbolic names for each error code so that clients ! # can protect themselves from changing error numbers. ! # 000 Timed Out error was reassigned to response code 603. ! # 602 Connection Failed response code was added. ! # A $msg parameter was added to &onrequest so that callers ! # can include a text message (e.g. $@) in canned error. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 17,25 **** package wwwerror; ! %RespMessage = ( # Define all response messages for use by callers ! 000, 'Timed Out', 200, 'OK', 201, 'CREATED', 202, 'Accepted', --- 24,56 ---- package wwwerror; + $RC_unknown = 000; # Define symbolic names for response codes + $RC_ok = 200; + $RC_created = 201; + $RC_accepted = 202; + $RC_partial = 203; + $RC_no_response = 204; + $RC_moved = 301; + $RC_found = 302; + $RC_method = 303; + $RC_not_modified = 304; + $RC_bad_request = 400; + $RC_unauthorized = 401; + $RC_payment_required = 402; + $RC_forbidden = 403; + $RC_not_found = 404; + $RC_internal_error = 500; + $RC_not_implemented = 501; + $RC_bad_response = 502; + $RC_too_busy = 503; + $RC_bad_request_client = 600; + $RC_not_implemented_client = 601; + $RC_connection_failed = 602; + $RC_timed_out = 603; ! ! %RespMessage = ( # Define all response messages for use by callers ! 000, 'Unknown Error', 200, 'OK', 201, 'CREATED', 202, 'Accepted', *************** *** 40,51 **** 503, 'Too Busy', 600, 'Bad Request in Client', 601, 'Not Implemented in Client', ); # =========================================================================== # onrequest(): Handle error responses to a WWW request that never made it ! # outside the client. The error code and request method and ! # parsed URL (scheme,host,port,object) are passed in. # # Returns the response code along with the appropriately faked # headers and content as parameters. --- 71,84 ---- 503, 'Too Busy', 600, 'Bad Request in Client', 601, 'Not Implemented in Client', + 602, 'Connection Failed', + 603, 'Timed Out', ); # =========================================================================== # onrequest(): Handle error responses to a WWW request that never made it ! # outside the client. The error code, request method, parsed URL ! # (scheme,host,port,object) and optional message are passed in. # # Returns the response code along with the appropriately faked # headers and content as parameters. *************** *** 53,64 **** sub onrequest { local($respcode, $method, $scheme, $host, $port, $object, ! *headers, *content) = @_; local($url, $title); $url = &wwwurl'compose($scheme, $host, $port, $object,'',''); $title = "$respcode $RespMessage{$respcode}"; $content = <<"EOF"; $title

$title

--- 86,99 ---- sub onrequest { local($respcode, $method, $scheme, $host, $port, $object, ! *headers, *content, $msg) = @_; local($url, $title); $url = &wwwurl'compose($scheme, $host, $port, $object,'',''); $title = "$respcode $RespMessage{$respcode}"; + if (!defined($msg)) { $msg = ''; } + $content = <<"EOF"; $title

$title

*************** *** 65,70 **** --- 100,107 ---- The following request could not be satisfied by this client:
  $method $url
+ 
+ $msg
  
EOF *** ../libwww-perl-0.11/wwwfile.pl Fri Jun 17 18:34:58 1994 --- wwwfile.pl Fri Jul 8 01:11:33 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwfile.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwfile: A package for interpreting local FILE requests and returning # responses as if they came from a remote server via HTTP proxy. *************** *** 11,16 **** --- 12,18 ---- # distribution files). # # 13 Jun 1994 (RTF): Initial version + # 07 Jul 1994 (RTF): Updated the error messages and escaped generated URLs. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 46,53 **** if (!($AllowedMethods{$method} && ($host =~ m/^localhost$/io))) { ! return &wwwerror'onrequest(600, $method, 'file', $host, $port, ! $object, *headers, *content); } $pathname = &wwwurl'unescape($object); --- 48,56 ---- if (!($AllowedMethods{$method} && ($host =~ m/^localhost$/io))) { ! return &wwwerror'onrequest($wwwerror'RC_bad_request_client, $method, ! 'file', $host, $port, $object, *headers, *content, ! "Library does not allow that method for file"); } $pathname = &wwwurl'unescape($object); *************** *** 54,66 **** if (!(-e $pathname)) # If the file does not exist, say 404 Not Found { ! return &wwwerror'onrequest(404, $method, 'file', $host, $port, ! $object, *headers, *content); } if (!(-r _)) # If we don't have read permission, say 403 Forbidden { ! return &wwwerror'onrequest(403, $method, 'file', $host, $port, ! $object, *headers, *content); } local($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, --- 57,71 ---- if (!(-e $pathname)) # If the file does not exist, say 404 Not Found { ! return &wwwerror'onrequest($wwwerror'RC_not_found, $method, 'file', ! $host, $port, $object, *headers, *content, ! "File does not exist"); } if (!(-r _)) # If we don't have read permission, say 403 Forbidden { ! return &wwwerror'onrequest($wwwerror'RC_forbidden, $method, 'file', ! $host, $port, $object, *headers, *content, ! "User does not have read permission"); } local($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, *************** *** 80,87 **** { if (!open(FS, $pathname)) { ! return &wwwerror'onrequest(500, $method, 'file', $host, $port, ! $object, *headers, *content); } local($/); undef($/); --- 85,93 ---- { if (!open(FS, $pathname)) { ! return &wwwerror'onrequest($RC_internal_error, $method, ! 'file', $host, $port, $object, *headers, *content, ! "Open failed: $!"); } local($/); undef($/); *************** *** 95,101 **** } &wwwmime'fakehead($tail, $size, $mtime, *headers); ! return 200; } --- 101,107 ---- } &wwwmime'fakehead($tail, $size, $mtime, *headers); ! return $wwwerror'RC_ok; } *************** *** 108,120 **** local($pathname) = @_; local($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, $atime,$mtime,$ctime,$blksize,$blocks); ! local($wstr, $parent, $file, $full); if ($pathname !~ m#/$#) { $pathname .= '/'; } $wstr = <<"EOF"; ! Local Directory $pathname !

Local Directory $pathname

    EOF --- 114,131 ---- local($pathname) = @_; local($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, $atime,$mtime,$ctime,$blksize,$blocks); ! local($wstr, $parent, $htmlname, $pathesc, $file, $full); if ($pathname !~ m#/$#) { $pathname .= '/'; } + $htmlname = $pathname; + $htmlname =~ s/\&/\&/g; + $htmlname =~ s/\/\>/g; + $wstr = <<"EOF"; ! Local Directory $htmlname !

    Local Directory $htmlname

      EOF *************** *** 123,132 **** --- 134,146 ---- return $wstr . "ERROR: Failed to open the directory
    \n"; } + $pathesc = '[\x00-\x20"#%;<&>?\x7F-\xFF]'; # Everything bad except '/' + if ($pathname ne '/') { $parent = $pathname; $parent =~ s#/[^/]+/$#/#; + $parent = &wwwurl'escape($parent, $pathesc); $wstr .= "
  • Parent Directory\n"; } *************** *** 135,144 **** { next if (($file eq '.') || ($file eq '..')); ! $full= $pathname . $file; ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, $atime,$mtime,$ctime,$blksize,$blocks) = stat($full); if (-d _) { $file .= '/'; $full .= '/'; } --- 149,163 ---- { next if (($file eq '.') || ($file eq '..')); ! $full = $pathname . $file; ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, $atime,$mtime,$ctime,$blksize,$blocks) = stat($full); + + $full = &wwwurl'escape($full, $pathesc); + $file =~ s/\&/\&/g; + $file =~ s/\/\>/g; if (-d _) { $file .= '/'; $full .= '/'; } *** ../libwww-perl-0.11/wwwhtml.pl Fri Jun 17 18:35:09 1994 --- wwwhtml.pl Fri Jul 8 01:12:04 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwhtml.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwhtml: A package for parsing pages of HyperText Markup Language (HTML) # for a World-Wide Web spider. *************** *** 11,16 **** --- 12,19 ---- # distribution files). # # 13 Jun 1994 (RTF): Initial version + # 07 Jul 1994 (RTF): Removed buggy attempt to extract comments. + # Updated META parsing to reflect HTML 2.0 proposal. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 56,62 **** local($link, $orig, $elem); $content =~ s/\s+/ /g; # Remove all extra whitespace and newlines - $content =~ s///g; # Remove all SGML comments (I hope) $content =~ s#]*>([^<]+)]*>##i; # Extract the title if ($1) { $headers{'title'} = $1; } --- 59,64 ---- *************** *** 66,73 **** $content =~ s/>[^<]+$/>/; # Remove everything after last element # Isolate all META elements as text ! $content =~ s/]*name\s*=\s*"?([^">]+)[^>]*value\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; ! $content =~ s/]*value\s*=\s*"?([^">]+)[^>]*name\s*=\s*"?([^">]+)[^>]*>/M $2 $1\n/gi; # Isolate all A element HREFs as text $content =~ s/]*href\s*=\s*"?([^">]+)[^>]*>/A $1\n/gi; # Isolate all IMG element SRCs as text --- 68,75 ---- $content =~ s/>[^<]+$/>/; # Remove everything after last element # Isolate all META elements as text ! $content =~ s/]*http-equiv\s*=\s*"?([^">]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; ! $content =~ s/]*name\s*=\s*"?([^">]+)[^>]*content\s*=\s*"?([^">]+)[^>]*>/M $1 $2\n/gi; # Isolate all A element HREFs as text $content =~ s/]*href\s*=\s*"?([^">]+)[^>]*>/A $1\n/gi; # Isolate all IMG element SRCs as text *** ../libwww-perl-0.11/wwwhttp.pl Fri Jun 17 18:35:22 1994 --- wwwhttp.pl Fri Jul 8 01:12:43 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwhttp.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwhttp: A package for sending HTTP requests and handling responses for the # World-Wide Web. This package is designed for use by www.pl *************** *** 12,17 **** --- 13,21 ---- # distribution files). # # 13 Jun 1994 (RTF): Initial version + # 07 Jul 1994 (RTF): Moved require of sys/socket.ph outside of package due to + # a bug in perl4 found by Martijn Koster + # Fixed error handling in case of problems in eval. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 21,26 **** --- 25,31 ---- # See for more info. # =========================================================================== require "wwwerror.pl"; + require "sys/socket.ph"; package wwwhttp; *************** *** 40,52 **** # # Setup the socket parameters for this process # - require "sys/socket.ph"; $SockAddr = 'S n a4 x8'; chop($ThisHost = `hostname`); ($name, $aliases, $Proto) = getprotobyname("tcp"); ($name, $aliases, $addrtype, $len, $ThisAddr) = gethostbyname($ThisHost); ! $ThisSock = pack($SockAddr, &AF_INET, 0, $ThisAddr); # =========================================================================== --- 45,57 ---- # # Setup the socket parameters for this process # $SockAddr = 'S n a4 x8'; chop($ThisHost = `hostname`); + if (!$ThisHost) { die "Can't get hostname of this host, stopped"; } ($name, $aliases, $Proto) = getprotobyname("tcp"); ($name, $aliases, $addrtype, $len, $ThisAddr) = gethostbyname($ThisHost); ! $ThisSock = pack($SockAddr, &main'AF_INET, 0, $ThisAddr); # =========================================================================== *************** *** 70,77 **** if (!$AllowedMethods{$method}) { ! return &wwwerror'onrequest(600, $method, 'http', $host, $port, ! $object, *headers, *content); } $reqstr = "$method $object HTTP/1.0\r\n"; --- 75,83 ---- if (!$AllowedMethods{$method}) { ! return &wwwerror'onrequest($wwwerror'RC_bad_request_client, $method, ! 'http', $host, $port, $object, *headers, *content, ! "Library does not allow that method for HTTP"); } $reqstr = "$method $object HTTP/1.0\r\n"; *************** *** 93,106 **** else { ($fqdn, $aliases, $addrtype, $len, $thataddr) = gethostbyname($host); } ! $that = pack($SockAddr, &AF_INET, $port, $thataddr); ! if (!( socket(FS, &AF_INET, &SOCK_STREAM, $Proto) && bind(FS, $ThisSock) )) { ! return &wwwerror'onrequest(502, $method, 'http', $host, $port, ! $object, *headers, *content); } local($/); --- 99,119 ---- else { ($fqdn, $aliases, $addrtype, $len, $thataddr) = gethostbyname($host); + if (!$fqdn) + { + return &wwwerror'onrequest($wwwerror'RC_connection_failed, $method, + 'http', $host, $port, $object, *headers, *content, + "Cannot find hostname $host"); + } } ! $that = pack($SockAddr, &main'AF_INET, $port, $thataddr); ! if (!( socket(FS, &main'AF_INET, &main'SOCK_STREAM, $Proto) && bind(FS, $ThisSock) )) { ! return &wwwerror'onrequest($wwwerror'RC_connection_failed, $method, ! 'http', $host, $port, $object, *headers, *content, ! "Failed bind to our local socket: $!"); } local($/); *************** *** 107,117 **** $run_it = <<'EOF'; $SIG{'ALRM'} = "wwwhttp'timeout"; alarm($timeout); ! if (!(connect(FS, $that))) ! { ! return &wwwerror'onrequest(502, $method, 'http', $host, $port, ! $object, *headers, *content); ! } select((select(FS), $| = 1)[0]); # Make FS unbuffered print FS $reqstr; if ($AllowedMethods{$method} == 2) { print FS $content; } --- 120,126 ---- $run_it = <<'EOF'; $SIG{'ALRM'} = "wwwhttp'timeout"; alarm($timeout); ! connect(FS, $that) || die "Cannot connect to $host:$port, $! \n"; select((select(FS), $| = 1)[0]); # Make FS unbuffered print FS $reqstr; if ($AllowedMethods{$method} == 2) { print FS $content; } *************** *** 131,137 **** } else # old style server reply { ! $response = 200; # I have no idea if it's good or not undef($/); $content = $_; $_ = ; --- 140,146 ---- } else # old style server reply { ! $response = $wwwerror'RC_ok; # I have no idea if it's good or not undef($/); $content = $_; $_ = ; *************** *** 140,150 **** $SIG{'ALRM'} = "IGNORE"; EOF ! if (eval $run_it) { ! &parseRFC822head($resphead, *headers); } close(FS); return $response; } --- 149,167 ---- $SIG{'ALRM'} = "IGNORE"; EOF ! eval $run_it; ! if ($@) { ! $SIG{'ALRM'} = "IGNORE"; ! close(FS); ! if ($@ =~ /^Time/o) { $response = $wwwerror'RC_timed_out; } ! else { $response = $wwwerror'RC_connection_failed; } ! ! return &wwwerror'onrequest($response, $method, 'http', $host, $port, ! $object, *headers, *content, $@); } close(FS); + &parseRFC822head($resphead, *headers); return $response; } *** ../libwww-perl-0.11/wwwmime.pl Tue Jun 14 06:33:46 1994 --- wwwmime.pl Fri Jul 8 01:13:20 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwmime.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwmime.pl: A package for handling MIME-specific operations for # a World-Wide Web client. *************** *** 9,14 **** --- 10,16 ---- # # 13 Jun 1994 (RTF): Initial version # 14 Jun 1994 (RTF): Changed environment variable to LIBWWW_PERL + # 07 Jul 1994 (RTF): Made calls to load_mimetypes more tolerant # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 29,36 **** 'z', 'x-compress', ); ! &load_mimetypes($ENV{'LIBWWW_PERL'} . "/mime.types"); ! &load_mimetypes($ENV{'HOME'} . "/.mime.types"); # =========================================================================== --- 31,41 ---- 'z', 'x-compress', ); ! $libloc = ($ENV{'LIBWWW_PERL'} || '.'); ! &load_mimetypes("$libloc/mime.types"); ! ! $myhome = ($ENV{'HOME'} || $ENV{'home'} || '.'); ! &load_mimetypes("$myhome/.mime.types"); # =========================================================================== *** ../libwww-perl-0.11/wwwurl.pl Fri Jun 17 18:31:59 1994 --- wwwurl.pl Fri Jul 8 01:14:01 1994 *************** *** 1,3 **** --- 1,4 ---- + # $Id: wwwurl.pl,v 0.12 1994/07/08 08:08:14 fielding Exp $ # --------------------------------------------------------------------------- # wwwurl: A package for parsing and manipulating World-Wide Web # Uniform Resource Locators (URL). *************** *** 9,14 **** --- 10,20 ---- # # 13 Jun 1994 (RTF): Initial version # 17 Jun 1994 (RTF): Fixed double-relative URL handling (e.g. ../../) + # 06 Jul 1994 (RTF): Fixed bug parsing URLs with an empty path and added + # fallback code for undefined associative array entries. + # Replaced complicated unescape loop with a simple + # substitute (from Steven E. Brenner via Brooks Cutter). + # Added escape() routine (w/mods) from Brooks Cutter. # # If you have any suggestions, bug reports, fixes, or enhancements, # send them to Roy Fielding at . *************** *** 57,63 **** # Returns the folowing in order: # # $scheme : The access scheme (converted to lower case); ! # $address: The hostname/IP address (if appropriate); # $port : The TCP port (if appropriate); # $path : The object path; # $query : The post-? search info (http and wais only) --- 63,69 ---- # Returns the folowing in order: # # $scheme : The access scheme (converted to lower case); ! # $address: The login or hostname/IP address (if appropriate); # $port : The TCP port (if appropriate); # $path : The object path; # $query : The post-? search info (http and wais only) *************** *** 73,79 **** local($query) = ''; local($frag) = ''; ! if ($url =~ s#^([\w-]+):##) { $scheme = $1; $scheme =~ tr/A-Z/a-z/; --- 79,85 ---- local($query) = ''; local($frag) = ''; ! if ($url =~ s#^([+\-\w]+):##) { $scheme = $1; $scheme =~ tr/A-Z/a-z/; *************** *** 88,94 **** if ($url =~ m#^//#o) { ! $url =~ s#^//([^/]*)/#/#; $address = $1; if ($address =~ s/:(\d+)$//) { --- 94,100 ---- if ($url =~ m#^//#o) { ! $url =~ s#^//([^/]*)##; $address = $1; if ($address =~ s/:(\d+)$//) { *************** *** 142,157 **** { local($url) = @_; ! local($pos) = $[; ! while (($pos = index($url,'%',$pos)) >= $[) ! { ! substr($url,$pos,3) = pack("C", hex(substr($url,($pos + 1),2))); ! $pos++; ! } return $url; } # =========================================================================== # absolute(): Return the absolute URL given a (possibly relative) URL # and its parent's absolute URL. --- 148,175 ---- { local($url) = @_; ! $url =~ s/%(..)/pack("C",hex($1))/ge; return $url; } + # =========================================================================== + # escape(): Return the passed string after replacing all characters matching + # the passed pattern with their %XX hex escape chars. Note that + # the caller must be sure not to escape reserved URL characters + # (e.g. / in pathnames, ':' between address and port, etc.) and thus + # this routine can only be applied to each URL part separately. E.g. + # + # $escname = &escape($name,'[\x00-\x20"#%/;<>?\x7F-\xFF]'); + # + sub escape + { + local($str, $pat) = @_; + + $str =~ s/($pat)/sprintf("%%%02lx",unpack('C',$1))/ge; + return($str); + } + # =========================================================================== # absolute(): Return the absolute URL given a (possibly relative) URL # and its parent's absolute URL. *************** *** 179,189 **** if (!$scheme) { $scheme = $psch; } else { last RELATED if ($scheme ne $psch); } ! if (!$pport) { $pport = $DefPort{$psch}; } if ($addr || $port) { ! if (!$port) { $port = $DefPort{$scheme}; } last RELATED if ($addr ne $paddr); last RELATED if ($port ne $pport); } --- 197,207 ---- if (!$scheme) { $scheme = $psch; } else { last RELATED if ($scheme ne $psch); } ! if (!$pport) { $pport = ($DefPort{$psch} || ''); } if ($addr || $port) { ! if (!$port) { $port = ($DefPort{$scheme} || ''); } last RELATED if ($addr ne $paddr); last RELATED if ($port ne $pport); } *************** *** 222,228 **** # NOTE: Fanatical spec-followers should reverse the above substitution # because it improperly prefers the tilde character over %7E (:-b) ! if ($port == $DefPort{$scheme}) { $port = ''; } return &compose($scheme, $addr, $port, $path, $query, $frag); } --- 240,246 ---- # NOTE: Fanatical spec-followers should reverse the above substitution # because it improperly prefers the tilde character over %7E (:-b) ! if ($port && $port == $DefPort{$scheme}) { $port = ''; } return &compose($scheme, $addr, $port, $path, $query, $frag); }