Re: Bug report: using HTTP::Request::Common to send a POST request to a CGI.pm-based script

Lincoln Stein (lstein@cshl.org)
Mon, 21 Jun 1999 07:36:32 -0400 (EDT)


The CGI/1.1 draft recommends using the semicolon instead of the
ampersand as a delimiter, and in fact the W3C's HTML validator
complains about the ampersand.  After many (about 50) requests to
support the semicolon character, I added it to CGI.pm.  I'm sorry if
this broke HTTP::Request::Common.
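The sketch below shows why an unescaped semicolon truncates a value once ';' is accepted as a delimiter. It is a minimal Perl sketch, not CGI.pm's parse_params. parse_pairs is a hypothetical helper that splits on either & or ; and only then decodes '+' and %XX escapes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT CGI.pm's actual parse_params: split an
# application/x-www-form-urlencoded string on either '&' or ';',
# then decode each name and value.
sub parse_pairs {
    my ($query) = @_;
    my %param;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # decode %XX escapes
        }
        # Keep the first value seen for a repeated name.
        $param{$name} = $value unless exists $param{$name};
    }
    return \%param;
}

# An unescaped ';' inside a value is taken as a pair delimiter...
my $raw = parse_pairs('TEXT=This+is+one+clause;+this+is+another.');
print "TEXT=$raw->{TEXT}\n";      # TEXT=This is one clause

# ...whereas %3B is decoded only after splitting, so it survives.
my $escaped = parse_pairs('TEXT=This+is+one+clause%3B+this+is+another.');
print "TEXT=$escaped->{TEXT}\n";  # TEXT=This is one clause; this is another.
```

Because decoding happens after the split, only a literal ';' is a delimiter. An encoder that sends ';' as %3B avoids the truncation.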

Lincoln

Gisle Aas writes:
 > "Schwartz, Todd" <todd.schwartz@intel.com> writes:
 > 
 > > CGI module version: 2.46
 > > LWP version: 5.42
 > > URI::URL version: 5.01
 > > 
 > > Problem description: I am using HTTP::Request::Common to build a POST
 > > request with name/value pairs.  One of the value strings is a text block
 > > that contains, among other things, a semicolon.  When the request is sent to
 > > a CGI.pm-based script, the value string containing the semicolon is
 > > truncated at the position of the semicolon.  I have attached two short
 > > scripts that reproduce this problem (see below).
 > >  
 > > The cause: CGI treats the semicolon as a name/value pair delimiter (see
 > > parse_params in CGI), but when the request is built, a semicolon appearing
 > > in one of the value fields does not get escaped (see query_form in package
 > > URI::_query).  In my opinion, RFC 2396 requires semicolon characters
 > > appearing in message content (including URI-encoded name/value pairs) to
 > > be escaped.
 > 
 > I guess you say this based on the fact that ";" is a "reserved"
 > character, but as I read RFC 2396 that does not necessarily mean it
 > has to be escaped inside the HTTP query component.  Quote:
 > 
 >    Characters in the "reserved" set are not reserved in all contexts.
 >    The set of characters actually reserved within any given URI
 >    component is defined by that component. In general, a character is
 >    reserved if the semantics of the URI changes if the character is
 >    replaced with its escaped US-ASCII encoding.
 > 
 > I have never seen anything looking like an official specification on
 > how an 'application/x-www-form-urlencoded' string is to be encoded.
 > Does anybody have a reference to something?
 > 
 > Some experiments with my current Netscape seem to indicate that it
 > encodes every character matching /[^\w.*-]/.  I will not object to
 > changing URI.pm if people rely on this behavior.
 > 
 > Regards,
 > Gisle
 > 
 > 
 > > I am not sure whether the semicolon is really a valid name/value pair
 > > delimiter; this is why I included Lincoln in this posting.
 > > 
 > > This problem does not occur with CGI version 2.42, which uses only the
 > > ampersand as a separator.  I have not tried this with LWP 5.43, but I
 > > don't see anything in the code that would change this behavior.
 > > 
 > > Thanks,
 > > Todd
 > > 
 > > #!C:/Perl/bin/perl.exe
 > > # This is the request script
 > > use strict;
 > > use warnings;
 > > use HTTP::Request::Common;
 > > use LWP::UserAgent;
 > > my $ua = LWP::UserAgent->new;
 > > my $cgi_uri = "http://localhost/cgi-bin/testcgi.pl";
 > > # POST builds an application/x-www-form-urlencoded body from the pairs
 > > my $response = $ua->request(POST $cgi_uri,
 > >     [TEXT => "This is one clause; this is another."]);
 > > print $response->as_string;
 > > 
 > > #!C:/Perl/bin/perl.exe
 > > # This is the CGI script (testcgi.pl)
 > > use strict;
 > > use warnings;
 > > use CGI qw/:all/;
 > > my $query = CGI->new;
 > > my $text = $query->param('TEXT');
 > > print header('text/plain');
 > > print "TEXT=$text\n";
 > > 
 > > Expected output: TEXT=This is one clause; this is another.
 > > Actual output: TEXT=This is one clause
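The request side can be made safe against either parser by having the encoder escape ';' too. The following is a minimal Perl sketch. It assumes Gisle's description of Netscape (escape everything matching /[^\w.*-]/) and sends spaces as '+'. form_escape and form_urlencode are hypothetical helpers, not functions from URI.pm or HTTP::Request::Common.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape one name or value, keeping only the
# characters Gisle reports Netscape leaves alone ([\w.*-], written
# out here as its ASCII equivalent). Spaces become '+'; every other
# byte, including ';' and '&', becomes %XX. Expects byte strings.
sub form_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.*\- ])/sprintf('%%%02X', ord $1)/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Hypothetical helper: join name/value pairs into a request body with '&'.
sub form_urlencode {
    my @pairs = @_;
    my @out;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @out, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @out;
}

print form_urlencode(TEXT => 'This is one clause; this is another.'), "\n";
# TEXT=This+is+one+clause%3B+this+is+another.
```

A body encoded this way parses identically under an &-only parser (as in CGI 2.42) and under an &-or-; parser (as in CGI 2.46), because neither delimiter can appear unescaped inside a value.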

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
========================================================================