Re: URL-ification
Tom Christiansen (tchrist@mox.perl.com)
Wed, 10 Jan 1996 00:02:06 -0700
While my initial thought would have been to simply use
\S+?:\S+
for a generic URL, what I had to deal with was generic
text that often included trailing punctuation, generally [,.?].
That's why it became, well, harder.
While
http://host/foo.bar.
or
http://host/foo.bar,
is legal, I wanted it to detect those trailing bits and omit them.
A question I had was whether this was a reasonable sequence:
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any = "${ltrs}${gunk}${punc}";
That is, what is the legit character set for URLs?
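For concreteness, here is a minimal sketch of how those classes might be put to work. The scheme list in $schemes and the find_urls() helper are assumptions made up for illustration, not part of the sequence above. The idea is a non-greedy run of [$any] followed by a lookahead that allows optional trailing [$punc] before either a character outside the set or the end of the string.

```perl
use strict;
use warnings;

# Character classes as given in the message.
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# Assumption: a short, illustrative list of schemes (hypothetical).
my $schemes = '(?:http|ftp|gopher|telnet|wais|file)';

# Hypothetical helper: return each URL-like string found in $text,
# leaving any trailing punctuation out of the captured match.
sub find_urls {
    my ($text) = @_;
    my @found;
    # Non-greedy body, then a lookahead: optional trailing punctuation
    # followed by a non-URL character or the end of the string.
    while ($text =~ m{\b(${schemes}:[$any]+?)(?=[$punc]*(?:[^$any]|\z))}gi) {
        push @found, $1;
    }
    return @found;
}

print "$_\n"
    for find_urls('See http://host/foo.bar. or http://host/foo.bar, for details.');
```

On that sample line it prints http://host/foo.bar twice, with the trailing period and comma left out. Whether the set in $any is the right one is still the open question.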
--tom