Re: Email Extractor

Fabrice Scemama (fabrice.scemama@gesnet.net)
Mon, 06 Jul 1998 17:38:50 +0200


AFAIK, no module exists for this.
But you might consider writing two scripts.

The first would retrieve all the URLs you want to search,
using LWP::Simple or LWP::UserAgent for example.
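
As a rough illustration of what the first script has to do once it holds
a result page, here is a minimal sketch. It is written in Python rather
than Perl, and the names LinkCollector and collect_urls are made up for
this example. It assumes the page HTML has already been downloaded
(LWP::Simple's get() would do that in Perl) and only shows how the
result URLs could be pulled out of it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from the <a href> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they came from.
        url = urljoin(self.base_url, href)
        # Skip mailto:, ftp:, javascript: and so on, and drop duplicates.
        if urlparse(url).scheme in ("http", "https") and url not in self.urls:
            self.urls.append(url)


def collect_urls(html, base_url):
    """Return the list of distinct http(s) URLs linked from `html`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.urls
```

A real search engine result page also links to its own navigation and
ads, so the list would still need filtering by host before use.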

The second would retrieve the first web page of
each URL, and then:
- look for any email address (just grep for the @ and verify that
  it looks like an email address, i.e. at least x@x.nic)
- if there's no email address:
  - look for a possible <frameset> tag somewhere, then launch
    child processes of the script for each frame until an
    email address is found
  - if there's still no email address:
    - check whether the URL looks like http://aaa.bbb.ccc/~yyy,
      and if so assume the address is yyy@bbb.ccc
    - if there's still no address and you really want one:
      - assume the address is webmaster@bbb.ccc (NOT recommended,
        since you might not want to send mail to webmaster@aol.com...)
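
The fallback chain above can be sketched roughly as follows, again in
Python and under several assumptions. The function names, the regular
expression and the rule "drop the first host label to get the domain"
are all illustrative guesses, not a tested recipe. Frames are followed
one after another in the same process instead of in child processes,
and the page download is passed in as a `fetch` function so the logic
can be tried without touching the network.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Loose check: something@host with at least one dot in the host part.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def emails_in(html):
    """Return every substring of `html` that looks like an email address."""
    return EMAIL_RE.findall(html)


def scan(url, fetch, depth=2):
    """Look for an address on the page, then in its frames (depth-limited)."""
    html = fetch(url)
    found = emails_in(html)
    if found:
        return found[0]
    if depth > 0:
        frames = FrameCollector()
        frames.feed(html)
        for src in frames.sources:
            hit = scan(urljoin(url, src), fetch, depth - 1)
            if hit:
                return hit
    return None


def guess_from_url(url, allow_webmaster=False):
    """Guess yyy@bbb.ccc from http://aaa.bbb.ccc/~yyy, else maybe webmaster@."""
    parts = urlparse(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive assumption: the first label (www, members, ...) is not part
    # of the mail domain when the host has more than two labels.
    domain = ".".join(labels[1:]) if len(labels) > 2 else host
    match = re.match(r"/~([A-Za-z0-9._-]+)", parts.path)
    if match and domain:
        return f"{match.group(1)}@{domain}"
    if allow_webmaster and domain:
        return f"webmaster@{domain}"  # last resort, off by default
    return None


def find_email(url, fetch, allow_webmaster=False):
    """Prefer an address actually found on the site; guess only as a fallback."""
    return scan(url, fetch) or guess_from_url(url, allow_webmaster)
```

For a live run, `fetch` could be something like
`lambda u: urllib.request.urlopen(u).read().decode(errors="replace")`.
Guessed addresses are only guesses and may not exist at all.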

I don't know what you plan to do with this, but you know,
spam brings you nothing but trouble (a client asked me for the
second script and, of course, he ran into the trouble I had
predicted... and his company failed anyway).

Regards
Fabrice Scemama (Paris)


At 05:31 PM 7/6/98 +0500, Surat Singh Bhati wrote:
>Hi Listmembers,
>	I want to develop an email extractor.
>I want to issue a query to a search engine (say AltaVista) for
>a search string, and want to get the email addresses listed in all
>the resulting URLs and sites.
>
>I have three problems:
>(1) Search the web for a search string.
>(2) Go deep into each result.
>(3) Extract email addresses from these results.
>
>	Is there any pre-written routine available for the above task?
>
>TIA
>-Surat Singh
>
>=====================================================
>=       Surat Singh Bhati (surat@indiamart.com)	    =
>=       InterMESH Systems                           =
>=       (Internet Solution Providers)               =  
>=       Voice   : +(91) 11-205-1719, 220-0349       =
>=       Fax     : +(91) 11-241-2591                 =
>=       Web Site: http://www.IndiaMART.com          =
>=                 http://travel.indiamart.com       =   
>=                 http://apparel.indiamart.com      =   
>=====================================================
>