Re: lwp-rget and *.htm files
Gisle Aas (aas@bergen.sn.no)
04 Jul 1997 09:49:55 +0200
Frederic Corne <frederic.corne@erli.fr> writes:
> I am currently on NT with perl win32 from ActiveWare
> I have installed libwww and the fix for it for win32 from Christopher Russo
> <crusso@MIT.EDU>
>
> All is ok.
>
> Only a little thing, (excuse me if this question has already been posted) :
>
> On Windows machines, there are a lot of HTML files named *.htm because of the
> old 8.3 compatibility. But it seems that these files are not considered
> HTML files by the URL module.
The URL module does not care about media types. It just handles naming.
> For example, use the script lwp-rget on a location where some links point to
> some xxx.htm file. The script doesn't get them.
You mean they are renamed as "xxx.html"?
The idea is that the name suffix you find on something outside your
host does not mean anything. It is the media type (returned as the
Content-Type if you get it from an HTTP server) that matters. When we
want to store something on our local disk we try to save it with a
suffix that does not lose the media type info.
For instance, if I ask for 'http://somewhere/foo.html' and I get an
'image/png' document back, then I want to store it as 'foo.png' so
that I get the original media type when I retrieve it from my local
disk.
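A rough sketch of that renaming in plain Perl. The %preferred table and
local_name() are made up for this illustration; this is not how lwp-rget
itself is written:

```perl
use strict;
use warnings;

# Hypothetical table: media type => suffix to use on the local disk.
my %preferred = (
    'text/html' => 'html',
    'image/png' => 'png',
);

# Build a local file name from the last path segment of the URL and the
# media type the server reported.  Unknown types keep the remote name.
sub local_name {
    my ($url, $content_type) = @_;
    my ($name) = $url =~ m{([^/]+)$};     # last path segment
    return unless defined $name;          # URL ends in "/": nothing to name
    $content_type = lc $content_type;
    $content_type =~ s/\s*;.*//;          # drop parameters such as charset
    my $suffix = $preferred{$content_type};
    return $name unless defined $suffix;  # unknown type: keep remote name
    $name =~ s/\.[^.]*$//;                # strip the remote suffix
    return "$name.$suffix";
}

print local_name('http://somewhere/foo.html', 'image/png'), "\n";  # foo.png
print local_name('http://somewhere/xxx.htm',  'text/html'), "\n";  # xxx.html
```

With 'html' as the preferred suffix for text/html, xxx.htm ends up stored
as xxx.html, which is the renaming asked about above.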
When more than one suffix maps to a single media type,
LWP assumes that the first one is the preferred one. If you prefer
.htm as suffix then just swap the suffixes in LWP/media.types.
You could also make your own ~/.media.types (if that file name makes
sense on NT) which contains this:
text/html htm html
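If you would rather not edit any file, the same mapping can probably be
set from code. A small sketch, assuming your LWP::MediaTypes is recent
enough to offer add_type on request (check the documentation for your
version); the file name xxx.htm is just an example:

```perl
use strict;
use warnings;
use LWP::MediaTypes qw(add_type media_suffix guess_media_type);

# Same idea as the line "text/html htm html": the first suffix listed
# becomes the preferred one for the type.
add_type('text/html', qw(htm html));

# In scalar context media_suffix() returns the preferred suffix.
my $suffix = media_suffix('text/html');
print "preferred suffix: $suffix\n";

# Both suffixes still map back to text/html when guessing from a name.
my $type = guess_media_type('xxx.htm');
print "xxx.htm looks like: $type\n";
```

This only affects the running program. The media.types files are left
alone.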
> I don't know if the behaviour is the same on Unix.
It is.
--
Gisle Aas <aas@sn.no>