Re: URL-ification

Tom Christiansen (tchrist@mox.perl.com)
Wed, 10 Jan 1996 00:02:06 -0700


While my initial thought would have been to simply use

    \S+?:\S+

for a generic URL, what I had to deal with was generic
text that often included trailing punctuation, generally [,.?].
That's why it became, well, harder.

While
    http://host/foo.bar.
or
    http://host/foo.bar,
is a legal URL, I wanted the pattern to detect those trailing bits and
omit them.
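
To make the problem concrete, here is a small sketch showing what the
naive pattern grabs from ordinary prose. The helper name naive_url and
the sample sentences are made up for illustration; only the pattern
itself comes from the text above.

```perl
use strict;
use warnings;

# The naive generic-URL pattern: non-blanks, a colon, more non-blanks.
my $naive = qr/\S+?:\S+/;

# Hypothetical helper: return the first naive match in $text, or undef.
sub naive_url {
    my ($text) = @_;
    my ($hit) = $text =~ /($naive)/;
    return $hit;
}

# The sentence punctuation comes along for the ride.
for my $text ('see http://host/foo.bar.', 'try http://host/foo.bar, then') {
    print naive_url($text), "\n";
}
```

Both matches keep the final "." or ",", which is exactly the part that
should be left out of the link.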

A question I had was whether this was a reasonable set of definitions:

    $ltrs = '\w';
    $gunk = '/#~:.?+=&%@!\-';
    $punc = '.:?\-';
    $any  = "${ltrs}${gunk}${punc}";

That is, what is the legitimate character set for URLs?
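
For what it's worth, here is one way those sets might be put to work.
This is only a sketch under some assumptions: the helper name find_urls
is invented, any run of word characters before a colon is treated as a
scheme (so something like 10:30 would also match), and the lookahead is
just one possible way to leave trailing punctuation out.

```perl
use strict;
use warnings;

my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# Hypothetical helper: return every URL-like string found in $text.
sub find_urls {
    my ($text) = @_;
    my @found;
    while ($text =~ m{
            \b
            ( \w+ : [$any]+? )    # scheme, colon, as few URL characters as possible
            (?= [$punc]*          # optional trailing punctuation, not captured
                (?: [^$any] | \z )    # then a non-URL character or end of text
            )
        }gx)
    {
        push @found, $1;
    }
    return @found;
}

print "$_\n" for find_urls('See http://host/foo.bar. Or http://host/foo.bar, ok?');
```

The minimal match plus the lookahead means a "." in the middle of
foo.bar is kept, while a "." or "?" sitting right before whitespace or
the end of the text is dropped. A trailing "," is dropped simply because
it is not in any of the sets.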

--tom