Taming the XML beast

XML is here to stay, having become the standard markup syntax for most new Web services protocols, formats and interfaces, such as Simple Object Access Protocol (SOAP). The flood of XML on networks will continue to grow, whether or not IT professionals are prepared.

Traditionally, XML's biggest disadvantage has been its bloated, text-based encoding, which requires that you send considerably more bits than in non-XML binary data transfers. Companies can't address XML's bandwidth consumption issues effectively without universal standards that describe how this content can be encoded in binary formats. Fortunately, the industry has made notable progress in this area.
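
One concrete source of that overhead is easy to measure. Binary data placed inside an XML text node is normally base64-encoded, which turns every 3 bytes into 4 characters. The sketch below is only an illustration: the function name base64_overhead and the 3,000-byte payload are assumptions made for the example, not figures from any particular deployment.

```python
import base64


def base64_overhead(num_bytes):
    """Return (raw size, base64 size) for a payload of num_bytes zero bytes.

    The payload content is irrelevant to the size calculation; zero bytes
    simply stand in for an image or a digital signature.
    """
    payload = bytes(num_bytes)
    encoded = base64.b64encode(payload)  # the form binary takes inside XML text
    return len(payload), len(encoded)


raw_size, text_size = base64_overhead(3000)
print(f"raw: {raw_size} bytes, base64: {text_size} bytes")
# prints "raw: 3000 bytes, base64: 4000 bytes"
```

That one-third increase comes before counting the element tags that wrap the data.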

One approach is to rely on various industry specifications that use an XML-based SOAP message as a manifest for describing binary data files carried alongside it in the surrounding HTTP message. SOAP with Attachments (SwA) and Microsoft's Direct Internet Message Encapsulation (DIME) transmit opaque, non-textual data - such as images and digital signatures - along with an XML document. But they don't support binary encoding of all content within XML documents.

Neither SwA nor DIME has achieved broad adoption within the industry. Recognizing the critical need for a consensus standard for compact XML encodings, the World Wide Web Consortium (W3C) has developed new candidate recommendations for binary encoding of XML within SOAP 1.2 payloads: SOAP Message Transmission Optimization Mechanism (MTOM) and XML-binary Optimized Packaging (XOP). The W3C's XML Binary Characterization Working Group has released the first public working draft of its "XML Binary Characterization Properties" document, describing properties desirable for MTOM, XOP or any other serialization of the XML data model.

MTOM and XOP have much broader vendor support than any predecessor specification for XML-to-binary serialization. They describe how to produce optimized binary encodings of XML content within SOAP 1.2 payloads, and they preserve one of XML's great strengths: the transparency of the tagged, logical data structure that a particular document implements.

For any given XML document, MTOM and XOP preserve the transparency of its logical structure by encoding that structure in a text-based "XML Information Set" manifest, while allowing any of the document's contents to be serialized to any binary encoding. In particular, these specifications support binary encoding of XML content as Multipurpose Internet Mail Extensions (MIME) Multipart/Related body parts and encapsulation of those parts - along with the associated XML Information Set manifest - within SOAP 1.2 envelopes. The specifications also describe how to encapsulate binary-encoded XML body parts directly within HTTP messages (in cases where SOAP doesn't enter the equation), thereby reducing the size of XML files for transmission, storage or both.
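
The packaging idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a conforming MTOM or XOP implementation. The helper names extract_binary and build_package, the boundary string, the example.org content IDs and the doc/photo element names are all hypothetical; only the xop:Include element and its namespace come from the XOP specification. The sketch replaces base64 text in the document with a reference and carries the raw bytes in a separate MIME part.

```python
import base64
import xml.etree.ElementTree as ET

XOP_NS = "http://www.w3.org/2004/08/xop/include"  # namespace of xop:Include
ET.register_namespace("xop", XOP_NS)


def extract_binary(root, tag, content_id):
    """Swap the base64 text of child element `tag` for an xop:Include
    reference and return the decoded bytes that were removed."""
    elem = root.find(tag)
    raw = base64.b64decode(elem.text)
    elem.text = None
    ET.SubElement(elem, "{%s}Include" % XOP_NS, {"href": "cid:" + content_id})
    return raw


def build_package(root, parts, boundary="example-boundary"):
    """Serialize the XML manifest plus the extracted parts as a
    multipart/related body (hand-built; headers are illustrative)."""
    sep = b"--" + boundary.encode("ascii")
    body = [
        sep,
        b'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"',
        b"Content-Transfer-Encoding: 8bit",
        b"Content-ID: <root@example.org>",
        b"",
        ET.tostring(root, encoding="utf-8"),
    ]
    for cid, data in parts.items():
        body += [
            sep,
            b"Content-Type: application/octet-stream",
            b"Content-Transfer-Encoding: binary",
            b"Content-ID: <" + cid.encode("ascii") + b">",
            b"",
            data,  # raw bytes, no base64 expansion
        ]
    body.append(sep + b"--")  # closing boundary
    return b"\r\n".join(body)


# Hypothetical document carrying 3,072 bytes of binary data inline as base64.
payload = bytes(range(256)) * 12
inline_xml = b"<doc><photo>" + base64.b64encode(payload) + b"</photo></doc>"

root = ET.fromstring(inline_xml)
raw = extract_binary(root, "photo", "photo@example.org")
package = build_package(root, {"photo@example.org": raw})
print(len(inline_xml), len(package))
```

The XML part that remains is still ordinary, readable markup, so the document's logical structure stays visible while the bulky content travels as raw bytes. For small payloads the MIME headers can outweigh the savings, so the benefit depends on how much binary data a document carries.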

One limitation of MTOM and XOP is that they can be used only to define hop-specific encoding contracts between adjacent nodes within an XML/SOAP-message-handling transmission path. The specifications don't describe how to define global XML-encoding optimization policies that apply across any arbitrary number of XML/SOAP-handling intermediary nodes - an important requirement.

It's important to note that MTOM and XOP aren't yet ratified W3C standards and that few commercial implementations exist. Companies that want to base their XML-optimization strategy on these specifications might have to wait a few years before they are implemented broadly in commercial application platforms, middleware environments and development tools.

But the industry momentum behind MTOM and XOP is strong. By the end of this decade, IT professionals everywhere will rely on these efficient encoding schemes to tame the XML beast.