lwp-rget and *.htm files

Frederic Corne (frederic.corne@erli.fr)
03 Jul 1997 11:18:52 +0200


Hi all,

I am currently on NT with Perl for Win32 from ActiveWare. I have installed
libwww and the Win32 fix for it from Christopher Russo <crusso@MIT.EDU>.

All is ok. 

Just one small thing (excuse me if this question has already been posted):

On Windows machines, there are a lot of HTML files named *.htm because of the
old 8.3 filename compatibility. But it seems that these files are not treated
as HTML files by the URL module.

For example, run the lwp-rget script on a location where some links point to
xxx.htm files. The script doesn't fetch them.

Looking at the lwp-rget code, the incredibly complicated regexp that starts
off "$doc =~ s/(<\s*(img|a|body)..." is probably where the answer lies.

I have searched the modules for a pattern match on .html$ that I could
change to .htm[l.]*$, but found nothing.
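
To make the kind of pattern I have in mind concrete, here is a minimal
sketch. It assumes the decision is made on the file extension alone, which I
have not confirmed in the libwww code. The helper name looks_like_html is
hypothetical and is not part of lwp-rget or libwww. Note that \.html?$ is
stricter than .htm[l.]*$: the leading dot is escaped, and only an optional
single "l" is allowed after "htm".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of lwp-rget or libwww): returns 1 when
# the path ends in .htm or .html, case-insensitively, and 0 otherwise.
sub looks_like_html {
    my ($path) = @_;
    return $path =~ /\.html?$/i ? 1 : 0;
}

# Try it on a few sample file names (made up for illustration).
for my $name (qw(index.html INDEX.HTM page.htm logo.gif notes.htmx)) {
    printf "%-12s %s\n", $name, looks_like_html($name) ? "html" : "other";
}
```

The /i flag is there because file names on Windows often come back in upper
case.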


I don't know whether the behaviour is the same on Unix.

Any ideas?

FC