Network World

VoiceXML lets you talk to computers

By James Larson, Network World, 03/22/2004
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Many interactive voice response (IVR) applications let users listen to computers and respond by pressing the buttons on touch-tone phones. However, callers often get lost traversing long, time-consuming sequences of menus. It's also difficult for callers to juggle listening with searching for the right buttons to press on the small keypads of their cell phones. What's needed are IVR user interfaces that let users listen and speak to computers.

VoiceXML 2.0 is a markup language for building speech interfaces - the voice equivalent of HTML. A voice browser is like a Web browser - it interprets VoiceXML 2.0 scripts to present spoken information to users and accept spoken requests from them.

The World Wide Web Consortium last week made VoiceXML 2.0 a full recommendation, which is commonly understood to be a Web standard. The standard adds a speech-recognition grammar format - for words and phrases that users can speak in response to prompts - that was not included in the previous version.

Call components

Because telephones, including many cell phones, don't have the computation capability to host a voice browser, the voice browser resides on the network in a speech server. The speech server may be located in a corporate data center or off-site at a hosting provider.

Users dial a speech server, which downloads VoiceXML 2.0 scripts, grammars and audio files from an application server.

The voice browser interprets the VoiceXML 2.0 script by presenting users with a voice message, such as:

System: "Welcome to Ajax. Do you want to speak with sales, accounting or repairs?"

The voice message could be prerecorded voice or text that is routed through a text-to-speech synthesizer.
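As a rough illustration, the Ajax greeting might be written as a VoiceXML 2.0 menu like the one embedded below. The vxml, menu, prompt and choice elements and the dtmf and next attributes belong to the VoiceXML 2.0 vocabulary; the menu id, the key assignments and the "#sales"-style targets are assumptions made for this sketch, and the dialogs those targets would point to are left out. The Python around it only parses the script with the standard library to show what a voice browser would read from it.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML 2.0 script for the Ajax greeting.
# The ids, DTMF digits and "#..." targets are illustrative assumptions;
# the target dialogs are omitted to keep the sketch short.
VXML = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Welcome to Ajax. Do you want to speak with sales, accounting or repairs?</prompt>
    <choice dtmf="1" next="#sales">sales</choice>
    <choice dtmf="2" next="#accounting">accounting</choice>
    <choice dtmf="3" next="#repairs">repairs</choice>
  </menu>
</vxml>
"""

NS = {"v": "http://www.w3.org/2001/vxml"}


def describe_menu(vxml_text):
    """Return the prompt text and a list of (phrase, dtmf, next) choices."""
    root = ET.fromstring(vxml_text.encode("utf-8"))
    menu = root.find("v:menu", NS)
    prompt = menu.find("v:prompt", NS).text.strip()
    choices = [
        (choice.text.strip(), choice.get("dtmf"), choice.get("next"))
        for choice in menu.findall("v:choice", NS)
    ]
    return prompt, choices


if __name__ == "__main__":
    prompt, choices = describe_menu(VXML)
    print("System:", prompt)
    for phrase, digit, target in choices:
        print(f"  say '{phrase}' or press {digit} -> {target}")
```

Each choice names both the phrase a caller can say and the key a caller can press, so one declaration covers spoken and keypad input.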

The voice browser invokes an automatic speech recognizer (ASR), which uses the grammar to recognize the words users speak:

User: "Repairs."

The ASR recognizes the user's spoken response. In this case the grammar consists of only three words: "sales," "accounting" and "repairs." Because it has to distinguish among only the few phrases the grammar allows, this type of grammar-driven ASR performs more accurately than dictation ASRs, which attempt to recognize most of the words in English or whatever language a user is speaking.
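The effect of a small grammar can be pictured with a toy matcher. This is only an analogy: a real recognizer scores audio, not text, and the function name, the 0.6 cutoff and the use of Python's difflib are assumptions chosen for the sketch. It shows how a slightly garbled input still resolves to the right entry when only three words are possible, and how anything far from the grammar is rejected.

```python
import difflib

# The three-word grammar from the Ajax example.
GRAMMAR = ["sales", "accounting", "repairs"]


def match_in_grammar(heard, grammar=GRAMMAR, cutoff=0.6):
    """Return the grammar word closest to `heard`, or None for a no-match.

    Text similarity stands in for acoustic scoring here; the cutoff
    value is an arbitrary choice for illustration.
    """
    candidates = difflib.get_close_matches(
        heard.strip().lower(), grammar, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    for heard in ("Repairs.", "repares", "weather"):
        print(repr(heard), "->", match_in_grammar(heard))
```

A dictation-style recognizer would have to weigh "repares" against a vocabulary of many thousands of words, which leaves far more room for a wrong guess.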

Sometimes, users might respond by pressing keys on the phone instead, which sends dual-tone multifrequency (DTMF) signals. DTMF is useful in noisy environments or when the user wants to reply confidentially.
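The "dual-tone" part of the name refers to how each key is signaled: pressing a key sends two tones at once, one for the key's row and one for its column. The sketch below lists the standard frequencies for a 12-key phone keypad; the function name and the table layout are choices made for this example.

```python
# Standard DTMF frequencies in hertz for a 12-key telephone keypad.
ROW_HZ = (697, 770, 852, 941)      # one low tone per keypad row
COL_HZ = (1209, 1336, 1477)        # one high tone per keypad column
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_tones(key):
    """Return the (low, high) tone pair sent when `key` is pressed."""
    if len(key) != 1:
        raise ValueError("expected a single keypad character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a keypad key: {key!r}")


if __name__ == "__main__":
    for key in "3#":
        low, high = dtmf_tones(key)
        print(f"key {key}: {low} Hz + {high} Hz")
```

Because the speech server only has to detect which pair of fixed tones is present, keypad input stays reliable even when background noise would confuse a speech recognizer.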

The voice browser continues processing the VoiceXML 2.0 script, perhaps performing additional conversational turns, invoking an application-specific function or accessing information in a database.

With VoiceXML 2.0, developers can create speech-enabled applications by specifying high-level menus and forms rather than writing procedural program code. This frees up more time for developers to test the application's usability and refine its design.
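One way to picture that division of labor: the application author supplies only a description of the menu, and a generic loop, standing in here for the voice browser, handles the turn-taking. The sketch below is a simplified analogy in Python, not VoiceXML itself; the MENU structure, the max_tries setting and the run_menu function are all hypothetical names invented for the example.

```python
# Declarative description of the Ajax menu: data only, no control flow.
MENU = {
    "prompt": "Welcome to Ajax. Do you want to speak with sales, "
              "accounting or repairs?",
    "choices": {"sales": "1", "accounting": "2", "repairs": "3"},
    "max_tries": 3,
}


def run_menu(menu, caller_inputs):
    """Generic turn-taking loop, playing the role of the voice browser.

    `caller_inputs` is a list of already-recognized words or key presses.
    Returns (selected choice or None, number of turns used).
    """
    by_digit = {digit: word for word, digit in menu["choices"].items()}
    for turn, heard in enumerate(caller_inputs, start=1):
        if turn > menu["max_tries"]:
            break
        token = heard.strip().lower().rstrip(".")
        if token in menu["choices"]:      # spoken choice
            return token, turn
        if token in by_digit:             # keypad choice
            return by_digit[token], turn
    return None, min(len(caller_inputs), menu["max_tries"])


if __name__ == "__main__":
    print(run_menu(MENU, ["um", "3"]))
    print(run_menu(MENU, ["Repairs."]))
```

Changing the wording of the prompt or adding a fourth department means editing the description, not the loop, which is the property the menus-and-forms approach is after.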

Giving voice to new apps

Developers use VoiceXML 2.0 to provide a telephone-user interface for many types of applications and information, including time-sensitive data, business data and personal information. These applications let users access enterprise data wherever they are and whenever they need it by simply dialing in from any phone, identifying themselves and asking for the desired information. Customers also can use these systems to access data such as order status and catalog, delivery and account information.
