
New speech technologies making noise

By , Network World
March 22, 2004 12:05 AM ET

Network World - A key standard for building speech-based telephony applications, VoiceXML 2.0, received a final nod of approval from the World Wide Web Consortium last week.




The standard's official graduation comes just days before Microsoft is expected to formally launch its Speech Server products - which adhere to a competing standards effort - at the SpeechTEK conference this week in San Francisco.

The W3C advanced VoiceXML 2.0, along with the supporting Speech Recognition Grammar Specification (SRGS), to final "recommendation" status, effectively making them Web standards. These are the most mature of a handful of specifications in the W3C's evolving Speech Interface Framework.

The Speech Interface Framework aims to define a set of standards for building applications that let people interact with Web-based services over a telephone. The applications use a variety of voice-based interfaces that range from keypads and spoken commands to music and synthetic speech. Within the framework, VoiceXML controls how a voice application interacts with a user. Developers use SRGS to describe the words and phrases that end users are expected to give in response to spoken prompts.
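To make that division of labor concrete, here is a minimal sketch, not taken from the specification or from any vendor's product. It embeds a small VoiceXML 2.0 document in which a field holds a spoken prompt and an inline SRGS-style grammar lists the expected answers, then uses Python's standard XML parser to pull out both. The form name, field name, prompt wording and drink choices are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: the field's prompt is what the caller hears,
# and the inline grammar lists the phrases the caller may say back.
DIALOG = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee, tea or milk?</prompt>
      <grammar version="1.0" root="drink" mode="voice" xml:lang="en-US">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>milk</item>
          </one-of>
        </rule>
      </grammar>
    </field>
  </form>
</vxml>
"""


def summarize(vxml_text):
    """Return {field name: {"prompt": ..., "phrases": [...]}} for a dialog."""
    root = ET.fromstring(vxml_text)
    ns = {"v": VXML_NS}
    fields = {}
    for field in root.iterfind(".//v:field", ns):
        prompt = field.find("v:prompt", ns)
        # Every <item> under the field's grammar is one expected phrase.
        phrases = [item.text.strip() for item in field.iterfind(".//v:item", ns)]
        fields[field.get("name")] = {
            "prompt": prompt.text.strip(),
            "phrases": phrases,
        }
    return fields


if __name__ == "__main__":
    for name, info in summarize(DIALOG).items():
        print(name, "|", info["prompt"], "|", info["phrases"])
```

In a real deployment a voice browser, not a Python script, would interpret the document; the parser here only shows which part of the markup is the dialog (VoiceXML) and which part is the list of expected responses (SRGS).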

Other elements of the framework include Speech Synthesis Markup Language (SSML), which is used for creating spoken prompts; Voice Browser Call Control (CCXML), which provides telephony call-control support for VoiceXML and other dialog systems; and Semantic Interpretation for Speech Recognition, which defines links between grammar rules and application semantics so that an application recognizes that two spoken variations of the same element, such as "Coke" and "Coca-Cola," should be treated as the same response.
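The "Coke" and "Coca-Cola" case can be sketched in a few lines of Python. This is only an illustration of the idea behind semantic interpretation, not the W3C specification's own tag syntax: the table of phrases, the canonical values and the `interpret` helper are all hypothetical.

```python
# Hypothetical table linking recognized phrases to one application value.
# In the W3C framework this link would be expressed inside the grammar;
# a plain dictionary stands in for it here.
SEMANTIC_VALUES = {
    "coke": "coca-cola",
    "coca-cola": "coca-cola",
    "coca cola": "coca-cola",
    "pepsi": "pepsi",
}


def interpret(utterance):
    """Map a recognized phrase to its canonical value, or None if unknown."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(utterance.lower().split())
    return SEMANTIC_VALUES.get(key)


if __name__ == "__main__":
    # Both spoken variations resolve to the same response.
    print(interpret("Coke"), interpret("Coca-Cola"))
```

The application then acts on the canonical value alone, so adding another spoken variation means adding one table entry rather than changing application logic.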

VoiceXML is already broadly adopted. It has become a standard scripting language for making Web content accessible via voice and phone - letting users make selections and provide information by talking instead of touching numbers on a keypad.

"VoiceXML allows users to create a description of a dialog between computer and user that can output text, graphics, synthesized speech, digitized audio - and also provide a means to recognize inputs from all these sources," says Ron Schmelzer, a senior analyst at ZapThink. "What makes VoiceXML cool is that you can specify an interface for application functionality that is not Web-based, but specify it in a way that allows Web developers to control how these voice-based application interfaces work."

Scores of vendors have deployed VoiceXML 2.0-compliant applications, products and services, including HP, IBM, Lucent, Motorola and Nuance. 

How they compare
Competition between two standards, VoiceXML and SALT, adds drama to a growing movement to speech-enable Web content.

Maturity
  VoiceXML: Established. Version 2.0 is an official W3C standard.
  SALT: Early stages. Version 1.0 SALT specification is under consideration within the W3C.

Adoption
  VoiceXML: Significant. Scores of vendors have released VoiceXML-compliant products.
  SALT: Growing. Speech technology vendors will add support for SALT as Microsoft’s new Speech Server platform gains users.

Likely audience
  VoiceXML: Java 2 Enterprise Edition developers.
  SALT: Visual Studio .Net developers.

Meanwhile, Microsoft is making waves with its Speech Server 2004 speech-recognition platform. Bill Gates, Microsoft's chairman and chief software architect, is scheduled to formally launch the Standard and Enterprise editions at the SpeechTEK conference.
