I am looking for ideas on why this program doesn't report problems
Larry W. Virden (lvirden@cas.org)
Thu, 25 Feb 1999 13:44:48 -0500 (EST)
I have appended a program from tchrist. Its intention is to take a URL,
extract links, and verify that they exist.
However, I am not getting an error where I would expect one.
Does anyone on this list see something wrong in the use of LWP::Simple,
URI::URL, or HTML::Parse that would cause this to fail?
$ cat t.html
<A HREF="http://www.notthere.stuff/">there</a>
$ perl -w churl http://myserver/t.html
http://homepage/home/lwv26/t.html:
#!/usr/local/bin/perl -w
# churl - tchrist@perl.com
# v0.1 (prototype)
#
# extract urls and verify validity.
# only looks for FTP, HTTP, and FILE schemata,
# stored in A or IMG tags.
#
# retrieve Perl source from
# http://www.perl.com/CPAN/src/5.0/latest.tar.gz
# retrieve the LWP library from
# http://www.perl.com/cgi-bin/cpan_mod?module=LWP
# USAGE: churl URL [URL ...]
require 5.002;
BEGIN { die "usage: $0 URL ...\n" unless @ARGV }
use strict;
use URI::URL;
use HTML::Parse qw(parse_html);
use LWP::Simple qw(get head);
$| = 1;
my($ht_tree, $linkpair, $fqurl, $base, %saw, @urls, %check_this);
foreach ( qw(file ftp http) ) { $check_this{$_}++ }
foreach $base ( @ARGV ) {
    print "$base:\n";
    $ht_tree = parse_html(get($base)) || die "no doc";
    foreach $linkpair (@{$ht_tree->extract_links(qw<a img>)}) {
        my($link,$elem) = @$linkpair;
        my $url = url($link,$base); # XXX not real base
        unless ($saw{ $fqurl = eval { $url->abs } || $url->as_string }++) {
            print " $url: ";
            if ( $check_this{ lc($url->abs->scheme) } ) {
                my @headers = head($fqurl);
                print @headers ? "OK" : "BAD";
            } else {
                print "SKIPPED";
            }
            print "\n";
        }
    }
}
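
The run above prints only the base URL and no link line at all, neither
OK nor BAD, so one way to narrow this down might be to test the
link-extraction step by itself, with no network access involved. What
follows is a minimal sketch, not a fix for churl: it assumes
HTML::LinkExtor (from the HTML-Parser distribution) is installed, and
extract_urls is a hypothetical helper name that does not appear in the
original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper (not part of churl): return the absolute URLs
# found in the link attributes of A and IMG tags.
# $html is the page source; $base is the URL the page came from.
sub extract_urls {
    my ($html, $base) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            # The parser passes the lowercased tag name, then
            # attribute => value pairs for each link attribute.
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' || $tag eq 'img';
            # With a base given, the values are URI objects;
            # stringify them so they print and compare as text.
            push @found, map { "$_" } values %attr;
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return @found;
}

# Same markup as t.html, supplied inline so nothing is fetched.
my $html = '<A HREF="http://www.notthere.stuff/">there</a>';
my @urls = extract_urls($html, 'http://myserver/t.html');
print scalar(@urls), " link(s) extracted\n";
print "  $_\n" for @urls;
```

If this reports the link, extraction from that markup works in
isolation, and the next things to look at would be what get($base)
actually returns and what head($fqurl) returns for the unreachable
host. If it reports zero links, the parsing step is the place to look.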
--
Larry W. Virden <URL: mailto:lvirden@cas.org>
<URL: http://www.purl.org/NET/lvirden/> <*> O- "No one is what he seems."
Unless explicitly stated to the contrary, nothing in this posting should
be construed as representing my employer's opinions.