w3new v0.4: Creates a What's New list of http: URL's
Brooks Cutter (bcutter@palan.palantir.com)
Mon, 25 Jul 1994 13:26:58 -0400 (EDT)
Included below is information about my w3new package ... you'll find
the latest information at the URL
http://www.stuff.com/cgi-bin/bbcurn?user=bcutter&pkg=w3new
w3new is based on the libwww-perl package, and uses the libraries
www.pl, wwwhttp.pl, wwwurl.pl, wwwdates.pl and wwwbot.pl.
--------------------------------------------------------------------
w3new v0.4
**********
Creates a What's New list of http: URLs
=======================================
w3new is a program that extracts a list of URLs from either your Mosaic
hotlist or an HTML document. It then retrieves the HTTP/1.0 modification
date of each document listed, and outputs an HTML file with the URLs sorted
by their last modification time.
Package: w3new
Author: Brooks Cutter (bcutter@stuff.com)
Latest version: 0.4
Last updated: July 22, 1994
Archive: w3new.tar.gz
(Includes everything you need to run it, except perl v4.036)
What it does:
1. w3new first extracts the URLs from your hotlist or an HTML document.
2. w3new then does an HTTP/1.0 HEAD on each http: URL and stores the
Last-Modified time of each URL (if available).
3. Finally, w3new sorts its output by the last modified time of each
document and groups the documents by the month they were modified.
Non-http URLs, and URLs for which it can't retrieve the Last-Modified
time, are listed at the bottom of its HTML output.
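The three steps above can be sketched in a few lines of Python. This is not
the author's Perl code, only an illustration of the idea: the function names
fetch_last_modified and group_by_month are made up for this example, and the
newest-first ordering is an assumption, since the text does not say which way
the list is sorted.

```python
import urllib.request
from collections import OrderedDict
from datetime import timezone
from email.utils import parsedate_to_datetime


def fetch_last_modified(url, timeout=10):
    """Return the raw Last-Modified header for an http: URL, or None."""
    if not url.startswith("http:"):
        return None  # only http: URLs are checked
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("Last-Modified")
    except (OSError, ValueError):
        return None  # unreachable host, HTTP error, malformed URL


def group_by_month(last_modified):
    """Sort URLs by modification time and bucket them by month.

    last_modified maps each URL to its raw Last-Modified header value,
    or to None when no date could be retrieved.
    Returns (months, undated); undated URLs belong at the bottom.
    """
    dated, undated = [], []
    for url, header in last_modified.items():
        if header is None:
            undated.append(url)
            continue
        try:
            when = parsedate_to_datetime(header)
        except (TypeError, ValueError):
            undated.append(url)  # header present but unparseable
            continue
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # treat as GMT
        dated.append((when, url))

    dated.sort(reverse=True)  # newest first (an assumption)

    months = OrderedDict()
    for when, url in dated:
        months.setdefault(when.strftime("%B %Y"), []).append(url)
    return months, undated
```

A caller would build the input with something like
{url: fetch_last_modified(url) for url in urls} and then write one HTML
section per month, followed by the undated URLs.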
This program was written because I found myself frequently checking web
pages in my hotlist to see if they had recently been changed. Now I run w3new
from cron nightly, and check its output each morning.
Pointing w3new at a list of URLs
================================
1. Using your Mosaic hotlist
By default, w3new will read your Mosaic hotlist file and extract the
document URLs and document titles. It will look for your hotlist in your
home directory as the file
~/.mosaic-hotlist-default
This can be overridden with the environment variable W3NEW_HOTLIST.
Both the default and the environment variable can be overridden with the
command line arguments -hotlist_fn or -i, followed by the name of a file in
Mosaic-hotlist format.
2. Extracting URLs from an HTML document
If you call w3new with the -u argument and an HTML document URL, it will
retrieve that document, extract all the links in the document, and check
the modification time of each link.
3. Extracting URLs from an HTML document with inline modification times
If you call w3new with the -html argument and an HTML document URL, it
will extract the document links and retrieve the modification times, but
unlike #2 (-u argument), it will write the modification times into the
document itself rather than producing a sorted list.
When w3new parses the document, it will look for a tag of the form
<w3new url="URL">
For example
<w3new url="http://www.host.dom/file.html">
When it finds a tag like the one above, it will retrieve the last
modification time for the document named by the url= attribute, and
include that time in the output (if one is available).
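To make items 2 and 3 concrete, here is a rough Python sketch of both
operations: pulling the links out of a document, and filling in <w3new> tags.
It is an illustration only, not w3new's actual code. The names extract_links
and fill_in_dates are invented here, and two behaviours are assumptions: the
tag is replaced outright by the date text, and a tag whose URL has no known
modification time is left untouched.

```python
import re
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the link targets found in an HTML document (item 2)."""
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links


# Matches tags of the form <w3new url="URL">.
W3NEW_TAG = re.compile(r'<w3new\s+url="([^"]*)"\s*>', re.IGNORECASE)


def fill_in_dates(html, lookup):
    """Replace each <w3new url="..."> tag with a date string (item 3).

    lookup is any callable mapping a URL to its modification time as
    text, or to None when the time is not available.
    """
    def substitute(match):
        date = lookup(match.group(1))
        # Assumption: keep the tag as-is when no date is known.
        return match.group(0) if date is None else date

    return W3NEW_TAG.sub(substitute, html)
```

For instance, fill_in_dates(page, dates.get) with a dictionary of already
retrieved dates rewrites only the tags whose URLs appear in that dictionary.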
For more information on usage, run w3new with the -? argument.
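The hotlist lookup order from item 1 (command line first, then the
W3NEW_HOTLIST environment variable, then the default file) can be expressed
as a small Python function. This is only a sketch of the precedence rule as
described in this document; the function name resolve_hotlist and its
parameters are hypothetical.

```python
import os

DEFAULT_HOTLIST = "~/.mosaic-hotlist-default"


def resolve_hotlist(cli_value=None, environ=None):
    """Pick the hotlist file: command line, then environment, then default.

    cli_value stands for the value given to -hotlist_fn or -i, if any.
    environ defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    path = cli_value or environ.get("W3NEW_HOTLIST") or DEFAULT_HOTLIST
    return os.path.expanduser(path)  # expand the leading ~
```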
The latest information on this program can be found at
http://www.stuff.com/cgi-bin/bbcurn?user=bcutter&pkg=w3new
This program was built with software written by others.
Their contributions are greatly appreciated.
libwww-perl v0.20 or later by Roy Fielding (fielding@ics.uci.edu)
http://www.ics.uci.edu/WebSoft/libwww-perl/
evaluate_parameters (evap) by Stephen O. Lidie (lusol@Lehigh.EDU)
"All the implementations of evaluate_parameters are available via
anonymous FTP from ftp.Lehigh.EDU (128.180.63.4). Look in the
pub/evap/evap-2.x directory for the latest compressed tar file."
perl v4.036 by Larry Wall (lwall@netlabs.com)
ftp'able from ftp.uu.net in /systems/gnu as perl-4.036.tar.gz
Known Bugs
==========
o none (that I'm aware of...)
Future work
===========
o none (probably). I started to write some code that would do a checksum
on a document specified by a URL ... this would allow you to track the
modification time based on when the content changed... however I
never finished coding/testing the checksum piece, and abandoned it.
You'll find the code commented out below.
o Eventually I hope to convert this into a WWW applet and let the
browser do the work. You'll find more info on WWW Applets at
http://www.let.rug.nl/~bert/W3A/W3A.html
Brooks Cutter
bcutter@stuff.com