w3new v0.4: Creates a What's New list of http: URLs

Brooks Cutter (bcutter@palan.palantir.com)
Mon, 25 Jul 1994 13:26:58 -0400 (EDT)


Included below is information about my w3new package ... you'll find
the latest information at the URL

http://www.stuff.com/cgi-bin/bbcurn?user=bcutter&pkg=w3new

w3new is based on the libwww-perl package, and uses the libraries
www.pl, wwwhttp.pl, wwwurl.pl, wwwdates.pl and wwwbot.pl
--------------------------------------------------------------------

w3new v0.4
**********

Creates a What's New list of http: URLs
=======================================

w3new is a program that extracts a list of URLs either from your Mosaic
hotlist or from an HTML document. It then retrieves the HTTP/1.0 modification
date of each document listed and outputs an HTML file with the URLs sorted by
their last modification time.


Package: w3new 
Author: Brooks Cutter (bcutter@stuff.com) 
Latest version: 0.4 
Last updated: July 22, 1994 
Archive: w3new.tar.gz 
(Includes everything you need to run it, except perl v4.036) 


What it does: 

 1. w3new first extracts the URLs from your hotlist or an HTML document.
 2. w3new then issues an HTTP/1.0 HEAD request for each http: URL and
   stores the Last-Modified time of each URL (if available).
 3. Finally, w3new sorts its output by the last modified time of each
   document and groups the documents by the month in which they were
   modified. Non-http URLs, and URLs for which it can't retrieve the
   Last-Modified time, are listed at the bottom of its HTML output.
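
The three steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code shipped in w3new: the function names (last_modified, group_by_month) are hypothetical, the newest-first ordering is an assumption, and the sample URLs and dates exist only for the demonstration.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_modified(url, timeout=10):
    """HEAD an http: URL and return its Last-Modified time, or None."""
    if not url.startswith("http:"):
        return None  # non-http URLs are reported without a date
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            header = response.headers.get("Last-Modified")
    except (OSError, ValueError):
        return None  # unreachable host, HTTP error, or malformed URL
    if header is None:
        return None
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None  # header present but not a parseable date
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # keep all dates comparable
    return when


def group_by_month(dated_urls):
    """Split (url, datetime-or-None) pairs into month groups and undated URLs.

    Returns (months, undated): months maps a label such as "July 1994" to
    its URLs, newest first (an assumed ordering); undated keeps input order.
    """
    entries = list(dated_urls)
    dated = sorted(
        ((url, when) for url, when in entries if when is not None),
        key=lambda pair: pair[1],
        reverse=True,
    )
    undated = [url for url, when in entries if when is None]
    months = {}
    for url, when in dated:
        months.setdefault(when.strftime("%B %Y"), []).append(url)
    return months, undated


# Demonstration with hard-coded sample dates, so no network access is needed.
sample = [
    ("http://a.example/", datetime(1994, 7, 1, tzinfo=timezone.utc)),
    ("ftp://b.example/", None),
    ("http://c.example/", datetime(1994, 7, 20, tzinfo=timezone.utc)),
    ("http://d.example/", datetime(1994, 6, 5, tzinfo=timezone.utc)),
]
months, undated = group_by_month(sample)
```

In real use, the pairs would come from calling last_modified() on each extracted URL; the undated list corresponds to the entries w3new places at the bottom of its output.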

This program was written because I found myself frequently checking web
pages in my hotlist to see if they had recently been changed. Now I run w3new
from cron nightly, and check its output each morning.

Pointing w3new at a list of URLs
================================

 1. Using your Mosaic hotlist: by default, w3new will read your Mosaic
   hotlist file and extract the document URLs and document titles. It
   looks for your hotlist in your home directory as the file
   ~/.mosaic-hotlist-default

   This can be overridden by setting the environment variable W3NEW_HOTLIST.

   Both the default and the environment variable can be overridden with the
   command line arguments -hotlist_fn or -i, followed by the name of a file
   in Mosaic-hotlist format.
 2. Extracting URLs from an HTML document: if you call w3new with the
   -u argument and an HTML document URL, it will retrieve that document,
   extract all the links in the document, and check the modification time of
   each link.
 3. Extracting URLs from an HTML document and inlining modification times:
   if you call w3new with the -html argument and an HTML document
   URL, it will extract the document links and retrieve the modification
   times, but unlike #2 (-u argument), it will insert the modification
   times into the document rather than producing a sorted list.

   When w3new parses the document, it looks for a tag of the form

   <w3new url="URL">

   For example:

   <w3new url="http://www.host.dom/file.html">

   When it finds such a tag, it retrieves the last modification time of
   the document named by the url= attribute and includes that time in
   the output (if available).
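
The tag substitution described for the -html mode might look something like the sketch below. It is a minimal Python illustration, not w3new's own code: fill_w3new_tags and the lookup callback are hypothetical names, the date string is made-up sample data, and dropping a tag when no date is known is an assumption (the text above does not say what w3new emits in that case).

```python
import re

# Matches tags of the form <w3new url="..."> and captures the URL.
W3NEW_TAG = re.compile(r'<w3new\s+url="([^"]*)"\s*>', re.IGNORECASE)


def fill_w3new_tags(html, lookup):
    """Replace each <w3new url="..."> tag with the text lookup(url) returns.

    lookup(url) should return a date string, or None when no modification
    time is available; in that case the tag is removed (an assumption).
    """
    def substitute(match):
        stamp = lookup(match.group(1))
        return stamp if stamp is not None else ""

    return W3NEW_TAG.sub(substitute, html)


# Sample data standing in for the results of HEAD requests.
known = {"http://www.host.dom/file.html": "Mon, 25 Jul 1994 13:26:58 GMT"}
page = (
    'Updated: <w3new url="http://www.host.dom/file.html"> | '
    '<w3new url="http://www.host.dom/other.html">'
)
result = fill_w3new_tags(page, known.get)
```

Passing the lookup as a callback keeps the text rewriting separate from the network access, so the substitution can be checked without contacting any server.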


For more information on the usage, run w3new with the -? argument 
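
The order in which the hotlist location is chosen (command line argument first, then W3NEW_HOTLIST, then the default file) can be sketched as below. This is a hedged Python illustration of that precedence only; resolve_hotlist and its parameters are hypothetical, and argument parsing itself is left out.

```python
import os

DEFAULT_HOTLIST = "~/.mosaic-hotlist-default"


def resolve_hotlist(cli_value=None, environ=None):
    """Pick the hotlist path: -hotlist_fn / -i value, else env var, else default."""
    environ = os.environ if environ is None else environ
    path = cli_value or environ.get("W3NEW_HOTLIST") or DEFAULT_HOTLIST
    return os.path.expanduser(path)  # turn a leading ~ into the home directory


# Each call passes an explicit environment mapping so the result is predictable.
from_cli = resolve_hotlist("/tmp/hotlist", {"W3NEW_HOTLIST": "/env/hotlist"})
from_env = resolve_hotlist(None, {"W3NEW_HOTLIST": "/env/hotlist"})
from_default = resolve_hotlist(None, {})
```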

The latest information on this program can be found at 
http://www.stuff.com/cgi-bin/bbcurn?user=bcutter&pkg=w3new 

This program was built with software written by others.
Their contributions are greatly appreciated.

libwww-perl v0.20 or later by Roy Fielding (fielding@ics.uci.edu) 
   http://www.ics.uci.edu/WebSoft/libwww-perl/ 
evaluate_parameters (evap) by Stephen O. Lidie (lusol@Lehigh.EDU) 
   "All the implementations of evaluate_parameters are available via
   anonymous FTP from ftp.Lehigh.EDU (128.180.63.4). Look in the 
   pub/evap/evap-2.x directory for the latest compressed tar file." 
perl v4.036 by Larry Wall (lwall@netlabs.com) 
   ftp'able from ftp.uu.net in /systems/gnu as perl-4.036.tar.gz 



Known Bugs
==========

 o none (that I'm aware of...) 

Future work
===========

 o none (probably). I started to write some code that would compute a
   checksum of the document at a URL, which would let you track the
   modification time by when the content actually changed. However, I
   never finished coding and testing the checksum piece, and abandoned it.
   You'll find the code commented out below.
 o Eventually I hope to convert this into a WWW applet and let the
   browser do the work. You'll find more info on WWW Applets at
   http://www.let.rug.nl/~bert/W3A/W3A.html
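
For readers curious about the checksum idea in the first item above, here is one way it could work. This is a speculative Python sketch and not the abandoned code from w3new: record_change, the state dictionary, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import time


def record_change(state, url, body, now=None):
    """Track when a document's content last changed, using a checksum.

    state maps url -> (digest, changed_at). The stored time is updated only
    when the digest of body (bytes) differs from the one seen previously.
    Returns the time at which the current content was first seen.
    """
    now = time.time() if now is None else now
    digest = hashlib.sha256(body).hexdigest()
    previous = state.get(url)
    if previous is None or previous[0] != digest:
        state[url] = (digest, now)
    return state[url][1]


# Demonstration with fixed timestamps instead of the real clock.
state = {}
first = record_change(state, "http://www.host.dom/file.html", b"version 1", now=100)
same = record_change(state, "http://www.host.dom/file.html", b"version 1", now=200)
changed = record_change(state, "http://www.host.dom/file.html", b"version 2", now=300)
```

Unlike a HEAD request, this approach requires fetching the full document on every run, which is the trade-off for detecting changes on servers that do not report a modification time.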

Brooks Cutter 
bcutter@stuff.com