links
Angelique Augereau (augereau@skew2.kellogg.nwu.edu)
Wed, 30 Sep 1998 14:31:11 -0500 (CDT)
Hi:
I am trying to extract data from the US Patent database (patents.uspto.gov).
I am using the LWP::UserAgent module to retrieve the results of a search.
I then parse the resulting HTML for links, which I follow to download
data on the patents. The problem I'm having is that some of the links look
like the following (if I do "view page source"):
http://patents.uspto.gov/cgi-bin/ifill4?INDEX+0+27061+1+50+166+OF+4
However, when I try to extract this link in my script, the 27061 is always
a different number, as is the 166, and the OF changes to a B. So when I try
to follow the link I get an error message.
I've tried retrieving the links by just searching the HTML code, and
also by using extract_links, and I have the same problem with both.
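
For reference, the approach described above can be sketched roughly as
follows. This is a minimal illustration rather than the exact script: it
assumes that extract_links is the HTML::TreeBuilder method of that name,
the sub names (fetch_html, links_from_html) are made up for the example,
and the base URL and sample markup are hypothetical stand-ins for a real
search-results page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::TreeBuilder;
use URI;

# Fetch a page and return its HTML, or die with the HTTP status line.
sub fetch_html {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->request(HTTP::Request->new(GET => $url));
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->content;
}

# Return every <a href> in $html as an absolute URL string,
# resolved against $base.
sub links_from_html {
    my ($html, $base) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;
    my @links;
    for my $item (@{ $tree->extract_links('a') }) {
        my ($link) = @$item;    # first field is the link as written in the page
        push @links, URI->new_abs($link, $base)->as_string;
    }
    $tree->delete;              # free the parse tree
    return @links;
}

# Hypothetical sample markup and base URL (assumptions for illustration).
my $base = 'http://patents.uspto.gov/cgi-bin/search';
my $html = '<html><body>'
         . '<a href="/cgi-bin/ifill4?INDEX+0+27061+1+50+166+OF+4">next</a>'
         . '</body></html>';

# With a URL on the command line, fetch that page instead of the sample.
if (@ARGV) {
    $base = $ARGV[0];
    $html = fetch_html($base);
}

print "$_\n" for links_from_html($html, $base);
```

Run with no arguments, it prints the links found in the sample markup;
run with a URL, it prints the links found in the page that LWP retrieves,
which can then be compared against what "view page source" shows.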
Any thoughts?
Thanx, Angelique.