HTTP POSTs, XML and problems

Opinion
Aug 25, 2004

* A really geeky topic

Today I have a really geeky topic for you: a problem that you may run into with applications that send XML data as POST requests but do so incorrectly.

This is an important topic, and if you are trying to do a client/server interfacing job, what I'm going to tell you may save you a few hours of head scratching.

Now, in case you have forgotten, the payload of a POST request follows the header. It looks something like this:

 POST /test.asp HTTP/1.1
 Host: www.somehost.com
 User-Agent: Mozilla/4.0
 Content-Length: 24
 Content-Type: application/x-www-form-urlencoded

 userid=bob&password=fish

The payload is everything after the blank line that ends the header. With a Content-Type of application/x-www-form-urlencoded, the payload should be “variable=value” pairs separated by ampersands. The Web server expects this format because the POST interface was originally designed around the data stream a browser generates when a form is submitted, and a form's data consists of fields and their values.
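A quick way to sanity-check a payload like this is to build it with a library encoder and take the Content-Length from the encoded bytes rather than counting by hand. Here is a minimal Python sketch using only the standard library's urllib.parse.urlencode; the host and path are simply copied from the example request, and Python is an illustrative choice, not part of the original setup. For userid=bob&password=fish the body is 24 bytes.

```python
from urllib.parse import urlencode

# Field names and values from the example request.
fields = {"userid": "bob", "password": "fish"}

# urlencode joins name=value pairs with "&" and percent-encodes
# anything that is not safe inside a form value.
body = urlencode(fields)
body_bytes = body.encode("ascii")

# Header lines end in CRLF; an empty line separates header and payload.
request = (
    "POST /test.asp HTTP/1.1\r\n"
    "Host: www.somehost.com\r\n"
    "User-Agent: Mozilla/4.0\r\n"
    f"Content-Length: {len(body_bytes)}\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    + body
)

print(body)             # userid=bob&password=fish
print(len(body_bytes))  # 24
```

Computing the length from the encoded bytes matters once values contain escapes, because each escaped character becomes three bytes on the wire.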

So, when you create a script (ASP, JSP, VBScript, etc.) to handle a POST request, the Web server parses the data on receipt from the client and then passes it to your script.

Consider what this looks like in Active Server Pages (ASP) code. When you retrieve form data, you access the browser's POST request through the Request object's Form collection; in VBScript it looks like this:

 mydata = Request.Form("userid")

From the POST request I detailed above, this code would set mydata equal to “bob” (quotes not included). You would get the same result using:

 mydata = Request.Form(1)
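Outside ASP, the same lookups can be imitated with Python's urllib.parse.parse_qsl, which splits the payload on "&" and each pair on its first "=". The helper form_value below is hypothetical: it only mimics the two Request.Form calls shown above (by name, and by 1-based position) and is not a port of how ASP actually implements the Form collection.

```python
from urllib.parse import parse_qsl

def form_value(body: str, key):
    """Hypothetical helper mimicking Request.Form(name) and Request.Form(n).

    An int key is treated as a 1-based position, as in the ASP example;
    a str key returns the first value posted under that name, or "" if absent.
    """
    pairs = parse_qsl(body, keep_blank_values=True)
    if isinstance(key, int):
        return pairs[key - 1][1]
    for name, value in pairs:
        if name == key:
            return value
    return ""

body = "userid=bob&password=fish"
print(form_value(body, "userid"))  # bob
print(form_value(body, 1))         # bob
```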

If your application sends an XML data stream to a Web server, it may well look something like the following:

   A A A
   B B B
   C C C

   D
   E
   F

This is the format of a three-column, two-row table generated by an application I was working with. When that data is received by a Web server and read through the Form collection using the code above, the contents of mydata will be:

 "SomeStuff">AAABBB
 CCCD
 EF

What has happened is that the Web server's built-in parsing of POST requests assumed that the characters before the first equal sign were a variable name and that everything after it was that variable's value.
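The same split is easy to reproduce with Python's urllib.parse.parse_qsl. The XML below is a hypothetical stand-in (the tag names table, row and col are invented for illustration); what matters is that its first "=" sits inside an attribute, so the "variable name" becomes the start of the opening tag and the "value" is the rest of the document.

```python
from urllib.parse import parse_qsl

# Hypothetical XML posted raw: no field name, no URL encoding.
xml_body = '<table name="SomeStuff"><row><col>A A A</col></row></table>'

pairs = parse_qsl(xml_body)
name, value = pairs[0]
print(repr(name))  # '<table name'
print(value)       # "SomeStuff"><row><col>A A A</col></row></table>
```

Note that parse_qsl keeps the raw spaces in "A A A", whereas the server described here dropped them, so parsers differ in the details. In every case, though, an "&" anywhere in the XML would split it into further bogus fields.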

But wait! There’s more!

You might have noticed something missing! You're right if you spotted the missing white space: not the white space that was between the XML tags (that is not relevant) but the spaces in the data inside the tags. This is because it is the sender's responsibility to “URL encode” all unsafe and reserved characters in the data, per RFC 1738, the “Uniform Resource Locators (URL)” specification, before sending them to the server. In the case of white space, the Web server I was dealing with simply dropped it. If other unencoded characters were present, such as ampersands and equal signs, they would cause the server to make further assumptions and complicate matters even more!
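One way for a sender to avoid both problems is to send the whole XML document as a single, fully encoded form value. The Python sketch below assumes the receiving script reads a field named xmldata (a hypothetical name) and uses hypothetical tag names. urlencode turns spaces into "+" and every "<", "=", "&" and quote into a %XX escape, so the server's form parser hands back the document unchanged.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical two-row table; the tag names are invented for illustration.
xml_doc = (
    '<table name="SomeStuff">'
    "<row><col>A A A</col><col>B B B</col><col>C C C</col></row>"
    "<row><col>D</col><col>E</col><col>F</col></row>"
    "</table>"
)

# Sender: wrap the document in a named field and encode it completely.
body = urlencode({"xmldata": xml_doc})

# Receiver: the equivalent of Request.Form("xmldata") gets the exact document back.
received = parse_qs(body)["xmldata"][0]
print(received == xml_doc)  # True
```

Another common approach is to post the XML with an XML Content-Type and have the script read the raw request body instead of going through the Form collection at all.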

A significant part of the problem I faced was due to the application I was working with, which, it turns out, performs URL encoding incompletely. Consider the following characters:

$&+,/:;=?@ "'<>#%{}|\^~`

If these characters were URL encoded per RFC 1738, the sequence sent to the server should be:

%24&%2B%2C/:;=?@%20%22'%3C%3E#%25%7B%7D%7C%5C%5E%7E%60
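Both encoded forms can be checked with a small encoder. In the Python sketch below, percent_encode is a hypothetical helper, not a library call: it escapes every byte except ASCII letters, digits and an explicit keep list. The sample string is the character list above, including the "<", ">" and "\" that the encoded sequence shows as %3C, %3E and %5C. Keeping "&/:;=?@'#" reproduces the expected sequence; an empty keep list also escapes "&", "=", "/" and the rest, which is what a value inside form data needs, since a literal "&" or "=" there is read as a separator. (RFC 1738 also lists "#" as unsafe, so a strict encoder escapes it as %23.)

```python
import string

# This sketch never escapes ASCII letters or digits.
ALNUM = frozenset(string.ascii_letters + string.digits)

def percent_encode(text: str, keep: str = "") -> str:
    """Hypothetical helper: %XX-escape every UTF-8 byte of text except
    ASCII letters, digits and the ASCII characters listed in keep."""
    allowed = ALNUM | frozenset(keep)
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x80 and ch in allowed:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)

# The character list, with "<", ">" and "\" included.
SAMPLE = "$&+,/:;=?@ \"'<>#%{}|\\^~`"

# Leaving "&/:;=?@'#" alone reproduces the expected sequence:
# %24&%2B%2C/:;=?@%20%22'%3C%3E#%25%7B%7D%7C%5C%5E%7E%60
print(percent_encode(SAMPLE, keep="&/:;=?@'#"))
# Escaping everything, as a form value needs:
# %24%26%2B%2C%2F%3A%3B%3D%3F%40%20%22%27%3C%3E%23%25%7B%7D%7C%5C%5E%7E%60
print(percent_encode(SAMPLE))
```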

But the application generated the following:

$&+,/:;=?@ "'<>#%{}|\^~`
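When you suspect a sender of incomplete encoding, a quick check on a captured value can save some of those hours. The Python sketch below is a hypothetical diagnostic: it removes well-formed %XX escapes and reports any remaining character that is not an ASCII letter or digit, a deliberately conservative assumption. It is meant for a single form value, not a whole body, whose "&" and "=" separators are legitimate. A literal "+" is flagged too, because in form data it is decoded as a space.

```python
import re
import string

# Conservative assumption: only ASCII letters and digits may appear literally.
SAFE = frozenset(string.ascii_letters + string.digits)
# A well-formed escape is "%" followed by exactly two hex digits.
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def unencoded_chars(value: str) -> list:
    """Return the distinct characters left unescaped in a single form value."""
    stripped = ESCAPE.sub("", value)  # drop valid %XX escapes first
    return sorted({ch for ch in stripped if ch not in SAFE})

# What the application actually sent for the test characters.
sent = "$&+,/:;=?@ \"'<>#%{}|\\^~`"
print(len(unencoded_chars(sent)))    # 24
print(unencoded_chars("A%20A%20A"))  # []
```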

Between the application POSTing XML data that didn't conform to the format the Web server expects and its incomplete URL encoding, I lost about six hours before I figured out what was going on. I just hope you read this before you suffer the same fate.

After all that, I need a drink. Cheers.

Mark Gibbs

Mark Gibbs is an author, journalist, and man of mystery. His writing for Network World is widely considered to be vastly underpaid. For more than 30 years, Gibbs has consulted, lectured, and authored numerous articles and books about networking, information technology, and the social and political issues surrounding them. His complete bio can be found at http://gibbs.com/mgbio
