Re: Bug report: using HTTP::Request::Common to send a POST request to a CGI.pm-based script
Lincoln Stein (lstein@cshl.org)
Mon, 21 Jun 1999 07:36:32 -0400 (EDT)
The CGI/1.1 draft recommends using the semicolon instead of the
ampersand as a delimiter, and in fact the HTML validator at the
W3C complains about the ampersand. After many (about 50) requests to
support the semicolon character, I added it to CGI.pm. I'm sorry if
this broke HTTP::Request::Common.
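To illustrate the change described above, here is a minimal sketch of a parser that treats both '&' and ';' as name/value pair delimiters, and of why an unescaped ';' inside a value is cut off there. The helpers url_decode and parse_query are hypothetical and are not CGI.pm's actual parse_params; the query strings are illustrative, not captured from a real request.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): decode one
# application/x-www-form-urlencoded component, where '+' means
# a space and %XX is a hex-escaped byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical parser: split on either '&' or ';', mirroring the
# behaviour of treating both characters as pair delimiters.
sub parse_query {
    my ($query) = @_;
    my @pairs;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @pairs, [ url_decode($name), url_decode($value) ];
    }
    return @pairs;
}

# An unescaped ';' inside the value is taken as a delimiter...
my @bad = parse_query('TEXT=This+is+one+clause;+this+is+another.');
print "$bad[0][1]\n";    # This is one clause

# ...while the escaped form %3B survives intact.
my @good = parse_query('TEXT=This+is+one+clause%3B+this+is+another.');
print "$good[0][1]\n";   # This is one clause; this is another.
```

The first call reproduces the truncation in the bug report below; the second shows that the value survives once the sender escapes ';'.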
Lincoln
Gisle Aas writes:
> "Schwartz, Todd" <todd.schwartz@intel.com> writes:
>
> > CGI module version: 2.46
> > LWP version: 5.42
> > URI::URL version: 5.01
> >
> > Problem description: I am using HTTP::Request::Common to build a POST
> > request with name/value pairs. One of the value strings is a text block
> > that contains, among other things, a semicolon. When the request is sent to
> > a CGI.pm-based script, the value string containing the semicolon is
> > truncated at the position of the semicolon. I have attached two short
> > scripts that duplicate this problem (see below).
> >
> > The cause: CGI treats the semicolon as a name/value pair delimiter (see
> > parse_params in CGI), but when the request is built, a semicolon appearing
> > in one of the value fields does not get escaped (see query_form in package
> > URI::_query). In my opinion, RFC 2396 requires semicolon characters
> > appearing in message content, including URI-encoded name/value pairs, to
> > be escaped.
>
> I guess you say this because ";" is a "reserved"
> character, but as I read RFC 2396, that does not necessarily mean it
> has to be escaped inside the HTTP query component. Quote:
>
> Characters in the "reserved" set are not reserved in all contexts.
> The set of characters actually reserved within any given URI
> component is defined by that component. In general, a character is
> reserved if the semantics of the URI changes if the character is
> replaced with its escaped US-ASCII encoding.
>
> I have never seen anything resembling an official specification of
> how an 'application/x-www-form-urlencoded' string is to be encoded.
> Does anybody have a reference?
>
> Some experiments with my current Netscape suggest that it
> encodes every character matching /[^\w.*-]/. I would not object to
> changing URI.pm if people rely on this behaviour.
>
> Regards,
> Gisle
>
>
> > I am not sure whether the semicolon is really a valid name/value pair
> > delimiter; this is why I included Lincoln in this posting.
> >
> > This problem does not occur with CGI version 2.42, which uses only the
> > ampersand as a separator. I have not tried this with LWP 5.43, but I don't
> > see anything in the code that would change this behavior.
> >
> > Thanks,
> > Todd
> >
> > #!C:/Perl/bin/perl.exe
> > # This is the request script
> > use strict;
> > use warnings;
> > use HTTP::Request::Common;
> > use LWP::UserAgent;
> > my $ua = LWP::UserAgent->new;
> > my $cgi_uri = "http://localhost/cgi-bin/testcgi.pl";
> > # POST sends TEXT as an application/x-www-form-urlencoded body
> > my $response = $ua->request(POST $cgi_uri,
> >     [TEXT => "This is one clause; this is another."]);
> > print $response->as_string;
> >
> > #!C:/Perl/bin/perl.exe
> > # This is the CGI script (testcgi.pl)
> > use strict;
> > use warnings;
> > use CGI qw/:all/;
> > my $query = CGI->new;
> > my $text = $query->param('TEXT');
> > print header('text/plain');
> > print "TEXT=$text\n";
> >
> > Expected output: TEXT=This is one clause; this is another.
> > Actual output: TEXT=This is one clause
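On the sending side, a minimal sketch of an encoder that follows the rule Gisle reports observing in Netscape, escaping every byte that matches /[^\w.*-]/, shows how a ';' in a value would be sent as %3B and so could not be mistaken for a delimiter. The helpers form_escape and form_encode are hypothetical, are not URI.pm's query_form, and assume byte strings (no wide characters).

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every byte matching
# /[^\w.*-]/ (the set reportedly escaped by Netscape).
# Assumes a byte string; wide characters are not handled.
sub form_escape {
    my ($s) = @_;
    $s =~ s/([^\w.*-])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: build an application/x-www-form-urlencoded
# body from a flat name/value list, joining pairs with '&'. Any ';'
# or '&' inside a name or value is escaped by form_escape.
sub form_encode {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @parts, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @parts;
}

print form_encode(TEXT => 'This is one clause; this is another.'), "\n";
# TEXT=This%20is%20one%20clause%3B%20this%20is%20another.
```

Because the semicolon arrives as %3B, a parser that splits on both '&' and ';' sees a single TEXT pair and recovers the full value after decoding.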
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================