by Sanjeev Sawai, special to Network World

Voice XML 2.1 boosts functionality

May 08, 2006

Language used for developing interactive voice response.

VoiceXML is quickly becoming the standard language for developing interactive voice response and speech-enabled self-service applications. Applications that were previously deployed only on the Web can now easily be made available by phone, giving customers a consistent, convenient way to interact with retailers, banks and utility providers over either the Web or the telephone.

The latest version, VoiceXML 2.1, takes a significant step toward improving the responsiveness and adaptability of speech-enabled applications. Those qualities can be the difference between customers who are happy with a company’s speech-enabled self-service options and customers who take their business elsewhere.

The improvements proposed in VoiceXML 2.1, which is under consideration by the World Wide Web Consortium (W3C), demonstrate that VoiceXML is establishing itself as a stable, mature standard that deals with long-term issues, such as the mechanics of application development.

Version 2.1 adds two elements and enhances a half-dozen others. The new elements, <data> and <foreach>, are tools for creating sophisticated application functionality without unnecessary complexity.

The <data> element enhances VoiceXML applications’ data import capabilities. It allows XML data to be fetched, without transitioning to a new VoiceXML document, and exposed as an XML document object model (DOM) within the VoiceXML document that is executing. The application can then navigate that DOM to access the data. The result is more efficient data access and less-complex application design.
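To make the idea concrete, here is a minimal sketch in Python rather than VoiceXML, assuming a hypothetical account-balance payload in place of whatever a real <data src="..."> fetch would return. It uses the standard library’s xml.dom.minidom, which implements the same W3C DOM calls (getElementsByTagName, documentElement, firstChild) that a VoiceXML script would use on the variable <data> creates. The function names read_account and text_of are illustrative only.

```python
from xml.dom.minidom import parseString

# Hypothetical payload standing in for what a <data src="..."> fetch might return.
ACCOUNT_XML = """<?xml version="1.0"?>
<account id="1234">
  <balance currency="USD">512.40</balance>
  <lastPayment>2006-04-28</lastPayment>
</account>"""


def text_of(doc, tag):
    """Return the text of the first element named `tag`, or None if absent."""
    nodes = doc.getElementsByTagName(tag)
    if nodes.length == 0:
        return None
    first = nodes.item(0).firstChild
    return first.data.strip() if first is not None else ""


def read_account(xml_text):
    # Parse the fetched XML once into a DOM, then query it locally,
    # instead of moving to a new server-generated document for each value.
    doc = parseString(xml_text)
    root = doc.documentElement
    balance = doc.getElementsByTagName("balance").item(0)
    return {
        "id": root.getAttribute("id"),
        "balance": text_of(doc, "balance"),
        "currency": balance.getAttribute("currency"),
        "last_payment": text_of(doc, "lastPayment"),
    }
```

The design point mirrors the article’s: one fetch yields a structured object the dialog can query repeatedly, so the data source can return plain XML rather than a complete VoiceXML page.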

The <foreach> element provides the basic iterative construct found in virtually all programming languages. It steps through an array, binding each element in turn to an item variable, and can execute other elements inside the loop. It can be used to walk through data fetched by the <data> element, or to build prompts and concatenate strings. Together with <data>, <foreach> is the primary construct for procedural programming in VoiceXML, which yields more logically structured applications that are easier to maintain and upgrade.
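The prompt-building pattern can be sketched in Python as follows. The sketch assumes a hypothetical statement payload of the kind <data> might fetch. The for loop plays the role of <foreach array="..." item="...">: each array element is bound to a loop variable and the body runs once per element, appending to a prompt string. The names build_prompt and child_text are illustrative, not part of any VoiceXML API.

```python
from xml.dom.minidom import parseString

# Hypothetical payload standing in for data fetched with <data>.
STATEMENT_XML = """<statement>
  <txn><payee>City Water</payee><amount>41.20</amount></txn>
  <txn><payee>Metro Power</payee><amount>88.05</amount></txn>
</statement>"""


def child_text(element, tag):
    """Text of the first descendant element named `tag` under `element`."""
    return element.getElementsByTagName(tag)[0].firstChild.data


def build_prompt(xml_text):
    doc = parseString(xml_text)
    txns = doc.getElementsByTagName("txn")  # the "array" being iterated
    parts = ["You have %d recent payments." % len(txns)]
    # Loop body runs once per array element, like the content of <foreach>,
    # with `txn` acting as the item variable.
    for txn in txns:
        parts.append("%s dollars to %s." % (child_text(txn, "amount"),
                                            child_text(txn, "payee")))
    # Concatenate the pieces into a single prompt string.
    return " ".join(parts)
```

For the sample payload, build_prompt(STATEMENT_XML) produces "You have 2 recent payments. 41.20 dollars to City Water. 88.05 dollars to Metro Power." The prompt length follows the data, which is the point of iterating rather than hard-coding one prompt per case.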

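VoiceXML 2.1 also adds attributes and capabilities to existing elements. The srcexpr attribute of the <grammar> and <script> elements, for example, lets an application compute the URI of a grammar or script at run time instead of hard-coding it.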
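In VoiceXML 2.1, srcexpr takes an expression that is evaluated at run time to produce the URI to load. The following Python sketch shows the kind of logic such an expression might encode, assuming a hypothetical grammar server at example.com, hypothetical language codes and a .grxml naming scheme. None of these names come from the specification.

```python
# Hypothetical base URL and language set; a real deployment would differ.
GRAMMAR_BASE = "http://example.com/grammars"
SUPPORTED_LANGUAGES = ("en-us", "es-us")


def grammar_uri(language, menu):
    """Compute a grammar URI at run time, as a srcexpr expression would."""
    lang = language.lower()
    if lang not in SUPPORTED_LANGUAGES:
        lang = "en-us"  # fall back to a default grammar set
    return "%s/%s/%s.grxml" % (GRAMMAR_BASE, lang, menu)
```

For example, grammar_uri("es-US", "billing") yields http://example.com/grammars/es-us/billing.grxml, while an unsupported language falls back to the English grammar. With a static src attribute, each such variant would need its own hard-coded element.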

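In VoiceXML 2.1, the <mark> element has been enhanced so that an application can learn where a prompt was interrupted: a “marktime” value reports how many milliseconds had elapsed since the last mark when the caller barged in. That lets callers speak and be recognized without waiting for the entire prompt to play, while the application still knows how much of the prompt they heard. It can also be used to enforce the playing of certain prompts for timed audio, such as advertisements.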
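As a sketch of the advertisement case, the Python below assumes a prompt built from segments, each preceded by a mark, with hypothetical playback durations. Given the last mark reached and the elapsed time the browser reports at barge-in, it decides whether a required segment played in full, so the application could replay it if not. The Segment class and segment_completed function are hypothetical helpers, not VoiceXML constructs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    mark: str         # name of the mark placed just before this segment
    duration_ms: int  # hypothetical playback length of the segment


def segment_completed(segments, markname, marktime, required):
    """Return True if the `required` segment played in full before barge-in.

    markname/marktime model the values reported at barge-in: the last mark
    reached and the milliseconds elapsed after it when the caller spoke.
    """
    names = [s.mark for s in segments]
    req = names.index(required)
    last = names.index(markname)
    if last > req:
        return True   # playback had already moved past the required segment
    if last < req:
        return False  # the required segment never started
    return marktime >= segments[req].duration_ms
```

With segments intro (2,000 ms), ad (5,000 ms) and menu (4,000 ms), a barge-in 3,000 ms after the "ad" mark means the ad was cut short. A barge-in anywhere after the "menu" mark means it finished.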

The <disconnect> element terminates the association between a VoiceXML browser and a call. It has been enhanced with a namelist attribute that can return call-related variables to the platform, in a platform-specific way, when the dialog ends. This improves integration with CCXML and with the platform’s call control capabilities. Also in the call control area, the <transfer> element has been enhanced with a type attribute that allows bridging of two calls, or blind or supervised (consultation) transfer. The supervised transfer option gives finer control over what happens to a call when the called party does not answer or is busy.
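The practical difference between the three transfer types can be summarized with a deliberately simplified Python model. It assumes only three far-end outcomes (connected, busy, no answer) and ignores caller hang-ups, network errors and time limits. The TransferType enum values match the type attribute values bridge, blind and consultation. The function itself is a hypothetical illustration of the behavior, not browser code.

```python
from enum import Enum


class TransferType(Enum):
    BRIDGE = "bridge"
    BLIND = "blind"
    CONSULTATION = "consultation"


def caller_stays_in_dialog(kind, far_end):
    """Simplified model: does the caller remain in the VoiceXML dialog?

    far_end is one of "connected", "busy" or "noanswer".
    """
    if kind is TransferType.BLIND:
        # Control is handed off immediately; busy/no answer is not reported back.
        return False
    if kind is TransferType.CONSULTATION:
        # The call attempt is monitored; only a failed attempt keeps the caller here.
        return far_end in ("busy", "noanswer")
    # Bridge: the caller returns to the dialog when the attempt fails
    # or when the bridged call ends.
    return True
```

The consultation branch is where the "finer control" comes from: on busy or no answer the application keeps the caller and can offer voice mail or a callback, whereas a blind transfer has already let go.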

What’s next

The W3C has yet to approve VoiceXML 2.1 as a Recommendation, the stage at which a standard is considered final, and as of press time it had given no indication of when that might happen. The next major upgrade, VoiceXML 3.0, is scheduled to reach Candidate Recommendation status in December, with final Recommendation status expected in June 2007. That specification will most likely include features such as multimodal markup, support for speaker verification and call control capabilities within VoiceXML, and should reinforce VoiceXML’s role as the primary standard for deploying full-featured voice-enabled applications over the Web.

Sawai is vice president of research and development at Envox Worldwide. He can be reached at