Re: Email Extractor
Fabrice Scemama (fabrice.scemama@gesnet.net)
Mon, 06 Jul 1998 17:38:50 +0200
AFAIK, no existing module does this.
But you might consider writing two scripts.
The first would retrieve all the URLs you want to search,
using LWP::Simple or LWP::UserAgent, for example.
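
As a rough illustration of that first script, here is a minimal sketch in
Python (standing in for the Perl LWP modules named above, not a translation
of any existing script). The names fetch, extract_links and LinkCollector
are made up for this example, and it assumes the search engine returns
plain HTML with ordinary <a href="..."> links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch(url, timeout=10):
    """Download one page and return it as text (roughly what LWP::Simple's get does)."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to latin-1 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Return the distinct absolute http(s) links found in html, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links
```

Calling extract_links(fetch(results_url), results_url) on a results page
would give the list of URLs to hand to the second script; results_url is
whatever query URL you build for the search engine.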
The second would retrieve the first web page of each URL, and then:
- look for any email address (just grep for the @ and verify that it
  looks like an email address, i.e. at least x@x.nic)
- if there's no email address:
    - look for a possible <frameset> tag somewhere, then launch
      child processes of the script for each frame until an
      email address is found
- if there's still no email address:
    - check whether the URL is something like http://aaa.bbb.ccc/~yyy,
      and if so assume the address is yyy@bbb.ccc
- if there's still no email address and you really want one:
    - assume the address is webmaster@bbb.ccc (NOT recommended,
      since you might not want to send mail to webmaster@aol.com...)
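
The cascade above could be sketched roughly as follows, again in Python
rather than Perl. Everything here is an assumption made for illustration:
the function names, the deliberately loose regular expression, the use of
in-process recursion instead of child processes, and the "last two host
labels" rule for the domain (which is wrong for hosts such as
www.example.co.uk). The webmaster@ fallback is left out, since it is not
recommended anyway. The fetch argument is any function that takes a URL
and returns the page text.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Loose check: something@host.tld, i.e. "at least x@x.nic".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(html):
    """Return the distinct email-like strings in html, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return the absolute URLs of the frames declared in html."""
    collector = FrameCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


def guess_from_url(url):
    """Guess yyy@bbb.ccc from http://aaa.bbb.ccc/~yyy; return None otherwise."""
    parts = urlparse(url)
    user = re.match(r"/~([A-Za-z0-9._-]+)", parts.path)
    labels = (parts.hostname or "").split(".")
    if not user or len(labels) < 2:
        return None
    return user.group(1) + "@" + ".".join(labels[-2:])  # naive: last two labels


def search_page(url, fetch, depth):
    """Look for an address on the page, then inside its frames (depth-limited)."""
    html = fetch(url) or ""
    emails = find_emails(html)
    if emails:
        return emails[0]
    if depth > 0:
        for frame in frame_urls(html, url):
            found = search_page(frame, fetch, depth - 1)
            if found:
                return found
    return None


def find_contact(url, fetch, max_depth=2):
    """Try the page and its frames first, then fall back to the ~user guess."""
    return search_page(url, fetch, max_depth) or guess_from_url(url)
```

The depth limit is there so that frames pointing at each other cannot
make the search loop forever.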
I don't know what you plan to do with this, but be aware that
spam brings nothing but trouble. (A client asked me to write the
second script; of course, he ran into exactly the trouble I had
predicted, and... his company failed anyway.)
Regards
Fabrice Scemama (Paris)
At 05:31 PM 7/6/98 +0500, Surat Singh Bhati wrote:
>Hi Listmembers,
> I want to develop an email extractor.
>I want to issue a query to a search engine (say AltaVista) for
>a search string, and get the email addresses listed in all
>the resulting URLs and sites.
>
>I have three problems:
>(1) Search the web for a search string.
>(2) Go deep into each result.
>(3) Extract the email addresses from these results.
>
> Is there any pre-written routine available for the above task?
>
>TIA
>-Surat Singh
>
>=====================================================
>= Surat Singh Bhati (surat@indiamart.com) =
>= InterMESH Systems =
>= (Internet Solution Providers) =
>= Voice : +(91) 11-205-1719, 220-0349 =
>= Fax : +(91) 11-241-2591 =
>= Web Site: http://www.IndiaMART.com =
>= http://travel.indiamart.com =
>= http://apparel.indiamart.com =
>=====================================================
>