Re: $ua->simple_request($request)

Gisle Aas (gisle@aas.no)
25 Mar 1999 22:44:14 +0100


Ibon Aizpurua <epiola@jet.es> writes:

> In swishspider (from the SWISH-E search engine) there is:
> $response=$ua->simple_request($request)
>
> This fetches the entire page. That is fine there, because the
> purpose is to index every page.
> But I am trying to fetch only a few lines, because first
> I want to know whether the page is written in a certain language
> (in my case Euskara, the language of the Basque Country), and only
> if it is, fetch the entire page. I want to do this because I
> don't want to spend time fetching pages that I won't index.
> I tried the following:
> $response=$ua->simple_request($request,'file_to_examine',size);
> But with this, 'file_to_examine' contains the entire page
> and not only the size I specified.
> Can you help me?

You need to use the callback version of $ua->simple_request().  The
callback then has to collect text until it can tell the language, and
it can abort the request by calling die() once it knows the page is
not wanted.

Something like this (untested):

  my $ok_it_is_euskara;
  $ua->simple_request($request,
                      sub {
                         my($chunk, $res) = @_;
                         if ($ok_it_is_euskara) {
                              # language already confirmed: just save the data
                              print FILE $chunk;
                              return;
                         }

                         # still undecided: buffer the data in the response
                         $res->add_content($chunk);
                         die "Not text" unless $res->content_type =~ m:^text/:;

                         my $c = $res->content_ref;
                         if (looks_like_euskara($c)) {
                             $ok_it_is_euskara++;
                             open(FILE, ">filename") || die "Can't open file: $!";
                             print FILE $$c;   # write what was buffered so far
                         } elsif (length($$c) > 512) {
                             die "Wrong language";   # aborts the request
                         }
                      });
  if ($ok_it_is_euskara) {
      close(FILE);
      # do something about it
  }

  sub looks_like_euskara { my $textref = shift; .... }
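
If you also want to know afterwards why a request was cut short: when
the callback dies, LWP stores the die message in the "X-Died" header of
the response that simple_request() returns.  A minimal sketch (also
untested against a live server); the helper name abort_reason is made
up for this example, and it assumes the message ends in the usual
" at FILE line N." suffix that die appends:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): return the reason the callback
# aborted the request, or undef if the callback never died.
# $res is the response object returned by $ua->simple_request(...).
sub abort_reason {
    my ($res) = @_;
    my $died = $res->header('X-Died');    # set by LWP when the callback dies
    return undef unless defined $died;
    # Strip the " at FILE line N." suffix added by die (assumes no
    # whitespace in the file name).
    $died =~ s/ at \S+ line \d+\.?\s*\z//;
    return $died;
}
```

With the code above you would capture the return value, as in
my $res = $ua->simple_request($request, sub { ... }), and then
abort_reason($res) should give "Not text", "Wrong language", or undef
for a page that was fetched completely.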


Regards,
Gisle