I am looking for ideas on why this program doesn't report problems

Larry W. Virden (lvirden@cas.org)
Thu, 25 Feb 1999 13:44:48 -0500 (EST)


I have appended a program from tchrist.  Its intention is to take a URL,
extract the links, and verify that they exist.

However, I am not getting an error where I would expect one: in the run
below, only the base URL is printed, with no OK or BAD line for the link.
Does anyone on this list see something wrong in the use of LWP::Simple,
URI::URL, or HTML::Parse that would cause this to fail?

$ cat t.html
<A HREF="http://www.notthere.stuff/">there</a>
$ perl -w churl http://myserver/t.html
http://homepage/home/lwv26/t.html:


#!/usr/local/bin/perl -w

# churl - tchrist@perl.com
# v0.1 (prototype)
#
# extract urls and verify validity.
# only looks for FTP, HTTP, and FILE schemata,
# stored in A or IMG tags.
#
# retrieve Perl source from
#   http://www.perl.com/CPAN/src/5.0/latest.tar.gz
# retrieve the LWP library from
#   http://www.perl.com/cgi-bin/cpan_mod?module=LWP

# USAGE: churl URL [URL ...]

require 5.002;

BEGIN { die "usage: $0 URL ...\n" unless @ARGV }

use strict;
use URI::URL;
use HTML::Parse qw(parse_html);
use LWP::Simple qw(get head);

$| = 1;

my($ht_tree, $linkpair, $fqurl, $base, %saw, @urls, %check_this);
foreach ( qw(file ftp http) ) { $check_this{$_}++ }

foreach $base ( @ARGV ) {
    print "$base:\n";
    $ht_tree  = parse_html(get($base)) || die "no doc";
    foreach $linkpair (@{$ht_tree->extract_links(qw<a img>)}) {
        my($link,$elem) = @$linkpair;
        my $url = url($link,$base); # XXX not real base
        unless ($saw{ $fqurl = eval { $url->abs } || $url->as_string }++) {
            print "  $url:  ";
            if ( $check_this{ lc($url->abs->scheme) } ) {
                my @headers = head($fqurl);
                print @headers ? "OK" : "BAD";
            } else {
                print "SKIPPED";
            }
            print "\n";
        }
    }
}
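
One thing that may be worth ruling out first, offered as a sketch rather
than a diagnosis: LWP::Simple's get returns undef when the fetch fails. I am
assuming (I have not verified this against the HTML::Parse source) that
parse_html still hands back a tree object in that case, so the
'|| die "no doc"' test would never fire and the link loop would simply see
an empty tree. The helper name require_content below is hypothetical and is
not part of churl; it only separates the fetch check from the parse.

```perl
#!/usr/local/bin/perl -w
use strict;

# Hypothetical helper (not part of churl): return the fetched document,
# or die naming the URL if the fetch produced undef or an empty string.
sub require_content {
    my ($content, $url) = @_;
    die "could not fetch $url\n"
        unless defined $content && length $content;
    return $content;
}

# Intended use inside the foreach loop, replacing the one-line
# fetch-and-parse (get and parse_html are the calls churl already imports):
#   my $doc  = require_content(get($base), $base);
#   $ht_tree = parse_html($doc);
```

If the script then dies with "could not fetch ...", the problem is in
retrieving the page rather than in extracting or checking the links.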
-- 
Larry W. Virden                 <URL: mailto:lvirden@cas.org>
<URL: http://www.purl.org/NET/lvirden/> <*> O- "No one is what he seems."
Unless explicitly stated to the contrary, nothing in this posting should 
be construed as representing my employer's opinions.