libwww-perl and Personal Web Pages URL validation

UTK Homepages (homepage@solar.rtd.utk.edu)
Wed, 29 Mar 1995 23:38:16 -0500 (EST)


Hello all,

Recently, I took on the task of creating an auto-updating list of personal
web pages at my university. With much help from our WWW committee, I was
able to come up with a Perl script to do the following:

1. Authenticate form requests using a PH password.
2. Ensure a working server and valid URL.
3. Routinely reverify all URLs listed.

(Any utk.edu people who might be lurking can check the page out at
<URL:http://solar.rtd.utk.edu/utk-homepages/>. The script will allow anyone
not in our PH database to look but not touch.)

At first I was using a severely modified version of url_get 1.7, but came
across libwww-perl while investigating the Hypermail mbox-to-HTML package.
With libwww-perl, I was able to throw out some extremely questionable code and
deal with the expected problems in the main script.

In working with it, I ran across a couple of things I was hoping had been
thought of by the contributors.

1. Regarding www'stat:

Normally, we can safely assume an URL is valid if a 2xx code is returned.
What happens if the server returns a directory listing? This was not
acceptable for our purposes, but one of the major servers on campus issues a
non-standard 2xx phrase: "Sending directory listing." With the current
www'stat routine, there's no way to check for this. 

Fix: Parsed $headers to extract the phrase that follows the HTTP status code,
and returned it as the last return value of www'stat.
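
To make the idea concrete, here is a minimal sketch of the phrase extraction.
It is written in Python rather than Perl purely for illustration, and it is
not the actual change to www'stat. The names parse_status_line and
is_acceptable are hypothetical, and the check for "directory listing" simply
matches the phrase our campus server happens to send.

```python
import re

# Hypothetical helper, not part of libwww-perl: matches a status line such as
# "HTTP/1.0 200 Sending directory listing." and captures the code and phrase.
STATUS_LINE = re.compile(r"^HTTP/\d+\.\d+\s+(\d{3})(?:\s+(.*?))?\s*$")


def parse_status_line(headers):
    """Return (code, phrase) taken from the first line of a raw header block.

    Returns (None, "") when the first line is not an HTTP status line.
    """
    lines = headers.splitlines()
    first = lines[0] if lines else ""
    match = STATUS_LINE.match(first)
    if match is None:
        return None, ""
    return int(match.group(1)), match.group(2) or ""


def is_acceptable(code, phrase):
    """Accept 2xx responses unless the phrase marks a directory listing.

    The substring test is an assumption based on the one server described
    in this post; other servers may word it differently.
    """
    if code is None or not 200 <= code < 300:
        return False
    return "directory listing" not in phrase.lower()
```

With the phrase available to the caller, the main script can reject a 2xx
response whose phrase is "Sending directory listing." while still accepting
an ordinary "200 OK".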

2. Regarding wwwhttp.pl and wwwerror.pl via www'stat:

If we know a hostname isn't being found, shouldn't the error message be
more specific than simply saying "Connection Failed"? The 602
code alone isn't enough to determine whether the connection was
refused or the address was entered incorrectly.

Fix: Added a new code, 604 Hostname Lookup Failed, for failed lookups, and
changed 602's phrase to Connection Refused.
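
The distinction can be sketched roughly as follows. Again this is Python for
illustration only, not the code in wwwhttp.pl or wwwerror.pl. The function
name probe is hypothetical; the two code/phrase pairs are the ones described
above. Note one simplification: any failure at the connect step, including a
timeout, is reported as 602 here.

```python
import socket

# Code/phrase pairs as described in this post.
CONNECTION_REFUSED = (602, "Connection Refused")
LOOKUP_FAILED = (604, "Hostname Lookup Failed")


def probe(host, port=80, timeout=10.0):
    """Try to reach host:port; return None on success or (code, phrase).

    Hypothetical helper: name lookup and connection are done as two
    separate steps so that each failure gets its own code.
    """
    try:
        addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved, so no connection was attempted.
        return LOOKUP_FAILED

    # Use the first address returned by the resolver.
    family, socktype, proto, _canonname, sockaddr = addrinfo[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        # The name resolved but the connection failed (refused, timed out,
        # unreachable); all of these are reported as 602 in this sketch.
        return CONNECTION_REFUSED
    finally:
        sock.close()
    return None
```

A caller checking a submitted URL can then tell the user "check the spelling
of the hostname" on 604 and "the server is not answering" on 602.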

Thanks for the attention... Comments welcome (of course).

Jonathan M. Bell
Personal Web Pages coder for the WWW Task Force
U. of Tennessee-Knoxville