Re: extracting XML?

Gisle Aas (gisle@aas.no)
13 Feb 1998 00:30:48 +0100


Otis Gospodnetic <otis@populus.net> writes:

> Can data in an HTML/XML document be extracted with LWP?

I don't know XML, but HTML::Parser should work with most simple
SGML-style markup.

> Things like Author of the document (name/email), Summary, Title, etc.
> 
> How about META tag data - are there built-in methods that can get data out
> of keywords, description, and other META tags, or does one have to write
> one's own parser for those tags?

No.

Regards,
Gisle
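
For the META tag question above, here is a minimal sketch. It reads the
"No" as "no, you need not write your own parser", which is an assumption,
and it assumes the HTML::HeadParser module (shipped in the HTML-Parser
distribution) is installed. The sample document and variable names are
invented for illustration. HTML::HeadParser exposes <title> as the Title
header and <meta name="foo"> as the X-Meta-Foo header.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Sample document, invented for illustration only.
my $html = <<'HTML';
<html><head>
<title>Sample page</title>
<meta name="author" content="A. N. Author">
<meta name="keywords" content="perl, lwp, html">
<meta name="description" content="A short summary.">
</head><body><p>Body text.</p></body></html>
HTML

my $p = HTML::HeadParser->new;
$p->parse($html);    # parsing stops once the <head> section has ended

# <title> is exposed as 'Title'; <meta name="X"> as 'X-Meta-X'.
my $title       = $p->header('Title');
my $author      = $p->header('X-Meta-Author');
my $keywords    = $p->header('X-Meta-Keywords');
my $description = $p->header('X-Meta-Description');

print "Title:       $title\n";
print "Author:      $author\n";
print "Keywords:    $keywords\n";
print "Description: $description\n";
```

Each header() call is made in scalar context on purpose: in list context a
missing header yields an empty list, which would silently shift the
elements of a hash or list built from several calls.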




> Wouldn't it be better if the parser does it for you?

It would be wrong.  Which browsers do something like that?  None of
those around me do.

Regards,
Gisle


p $
+# $Id: http.pm,v 1.39 1998/02/12 22:24:11 aas Exp $
 
 package LWP::Protocol::http;
 
@@ -140,7 +140,7 @@
 		die "short write" unless $n == length($buf);
 		LWP::Debug::conns($buf);
 	    }
-	} else {
+	} elsif (length($$contRef)) {
 	    die "write timeout" if $timeout && !$sel->can_write($timeout);
 	    my $n = $socket->syswrite($$contRef, length($$contRef));
 	    die $! unless defined($n);


w-perl/lib/HTTP/Date.pm,v
retrieving revision 1.28
retrieving revision 1.29
diff -u -u -r1.28 -r1.29
--- Date.pm	1997/12/02 10:58:31	1.28
+++ Date.pm	1998/02/12 23:13:47	1.29
@@ -290,7 +290,7 @@
 
    # Should we compensate for the timezone?
    $tz = $default_zone unless defined $tz;
-   return Time::Local::timelocal($sec, $min, $hr, $day, $mon, $yr)
+   return eval {Time::Local::timelocal($sec, $min, $hr, $day, $mon, $yr)}
      unless defined $tz;
 
    # We can calculate offset for numerical time zones
@@ -299,7 +299,7 @@
        $offset += 60 * $3 if $3;
        $offset *= -1 if $1 && $1 ne '-';
    }
-   Time::Local::timegm($sec, $min, $hr, $day, $mon, $yr) + $offset;
+   eval{Time::Local::timegm($sec, $min, $hr, $day, $mon, $yr) + $offset};
 }
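
The effect of the eval wrapping in the Date.pm patch above can be sketched
in isolation. Time::Local croaks when handed out-of-range fields; with the
call inside an eval block, the caller gets undef back instead of a fatal
error. The helper name safe_timegm is hypothetical and not part of
HTTP::Date. This is only an illustration of the pattern, not the module's
code.

```perl
use strict;
use warnings;
use Time::Local ();

# Hypothetical helper (not part of HTTP::Date): returns epoch seconds,
# or undef when Time::Local rejects the fields instead of dying.
sub safe_timegm {
    my ($sec, $min, $hr, $day, $mon, $yr) = @_;    # $mon is 0-based
    my $t = eval { Time::Local::timegm($sec, $min, $hr, $day, $mon, $yr) };
    return $t;    # undef if timegm croaked; the error text is left in $@
}

my $ok  = safe_timegm(0, 0, 0, 1, 0,  1970);   # 1 Jan 1970 00:00:00 UTC
my $bad = safe_timegm(0, 0, 0, 1, 13, 1998);   # month index 13 is out of range

print defined $ok  ? "valid date:   $ok\n"  : "valid date:   undef\n";
print defined $bad ? "invalid date: $bad\n" : "invalid date: undef\n";
```

A caller then only has to test the result with defined() rather than wrap
every date conversion in its own eval.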