Someday soon, you might find yourself behind the steering wheel of your car and you’ll want to dial your phone, find out where the nearest Starbucks is, change the music you are listening to and adjust your heat or air conditioning. And you will be able to do all of this with nothing but the power of your own voice.
IBM and a number of its corporate partners gathered Tuesday at the company’s New York City offices to discuss the current state of speech recognition technology and its future. A forum and a series of product demonstrations focused largely on technology allowing drivers to control gadgets without taking their eyes off the road or their hands off the steering wheel. While some of the products are available already, many new technologies and improvements to existing ones are expected in the coming years.
“Speech can be revolutionary,” said Brian Garr, program director of IBM’s enterprise speech solutions. Garr noted that it has been 10 years since IBM introduced its first speech recognition product.
“We can actually change the paradigm of the way people think about completing transactions, the way people think about interacting with computers or not even caring that they’re interacting with computers,” he said.
A product planning manager at Pioneer Electronics discussed the company’s AVIC navigation system and upgrades planned for a new version to be released in late March.
In the new version, a driver can announce that he or she wants to find the nearest Starbucks and the navigation system will locate the closest one and speak the directions.
“If you want to find a Starbucks coffee or McDonald’s, you can simply say ‘vicinity search Starbucks coffee’ and it will take you to your nearest Starbucks,” said Ted Cardenas of Pioneer. “You can simply say ‘go to JFK’ and that will take you to the JFK airport. It really simplifies the operation of the unit.”
Improvements in technology are allowing Pioneer and other systems to understand synonyms. A driver can say “go,” “search” or “destination” and the system will realize the words all mean essentially the same thing, Cardenas said.
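The synonym handling Cardenas describes can be pictured as a lookup that maps several spoken words onto one command. The following is a minimal sketch under stated assumptions, not Pioneer’s or IBM’s actual implementation: the three trigger words come from his example, while the intent name `NAVIGATE`, the table and the function name are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical synonym table. The three spoken words are the ones cited
# in the article; the canonical intent name "NAVIGATE" is an assumption.
COMMAND_SYNONYMS = {
    "go": "NAVIGATE",
    "search": "NAVIGATE",
    "destination": "NAVIGATE",
}


def normalize_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Map the first word of a recognized utterance to a canonical intent.

    Returns (intent, rest_of_utterance), or None if the first word is
    not a known command word.
    """
    words = utterance.lower().split()
    if not words:
        return None
    intent = COMMAND_SYNONYMS.get(words[0])
    if intent is None:
        return None
    return intent, " ".join(words[1:])
```

In this sketch, “go JFK,” “search JFK” and “destination JFK” all resolve to the same intent, so the rest of the system only has to handle one command. A real recognizer would work on a grammar rather than on the first word alone.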
Drivers can use the Pioneer system to control music sources as well. You might say “FM radio” or “iPod” to select a music source, and then ask for a certain artist or song you want to listen to.
“Using … the voice recognition engine we’re able to keep the driver’s eyes on the road and their hands on the wheel by avoiding the interaction both visually as well as by touch with the unit itself installed in the dash,” Cardenas said.
Pioneer and several other companies present today use IBM ViaVoice, a speech recognition software product.
All Media Guide, which provides music information for online retailers and audio tools such as Windows Media Player and iTunes, is planning to enter the speech recognition market with a product capable of recognizing varying names for an artist. You might call Bruce Springsteen “the Boss,” for example, or not remember which name Puff Daddy is going by these days, but you’d still be able to get the song you want.
“We want someone to be in front of their media player … or in their car and say ‘play me the Godfather of Soul, Mr. Please Please Please, Mr. Dynamite, the hardest working man in show business’ and have it come back as James Brown,” said Zac Johnson, product manager at All Media Guide.
Johnson said the company will probably have the product in automobiles by the end of this year. All Media Guide officials also hope to allow customers to search for music by reciting lyrics, in cases when they can’t remember a song title.
Johnson Controls, which sells a voice-activated phone system for automobiles, is planning a more ambitious project that would combine phone service, navigation, music playing and temperature control all into one voice recognition package. Drivers would switch between devices by pressing buttons on the steering wheel but otherwise would control them by voice. This “mobile device gateway” probably won’t be on the market until 2009 or 2010, said Mark Zeinstra, product director for infotainment at Johnson Controls.
While speech recognition technology could prevent drivers from taking their eyes off the road, Zeinstra wants to make sure his company isn’t requiring people to issue voice commands too often.
“We’re trying not to go real overboard on this because sometimes it’s a little bit easier to just tap the button rather than do everything by voice,” he said. “We’re trying to do what’s appropriate in a vehicle.”
Having all this voice-controlled technology in a car raises one obvious question: can these products differentiate between conversation and commands? It’s easy to imagine one of these devices performing an action after misinterpreting the noises made by a carful of screaming kids.
Requiring drivers to touch a button before issuing a voice command is one way to solve the problem. Researchers are also building grammar and language modules around activities to prevent problems, said David Nahamoo, chief technology officer for speech technology at IBM.
IBM is working on another approach: pointing a camera at the driver to determine whether he or she is speaking. If someone in the back seat or passenger seat is talking, the voice recognition functions would not activate because the camera doesn’t see the driver speaking.
“If a lot of other things are going on (in the car) it’s an automatic way of filtering everything out,” Nahamoo said. “If the mouth of the driver is not moving that means … no action should take place.”
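The two filtering approaches described above, a push-to-talk button and a camera watching the driver’s mouth, both amount to a gate placed in front of the recognizer. The following is a rough sketch of that decision logic, not IBM’s method: the `AudioFrame` fields and the function name are hypothetical, and the sketch assumes the speech and lip-movement signals are already available as simple flags.

```python
from dataclasses import dataclass


@dataclass
class AudioFrame:
    """Hypothetical per-moment sensor summary (all fields are assumptions)."""
    speech_detected: bool      # the microphone picked up speech
    driver_mouth_moving: bool  # the camera sees the driver's mouth moving
    button_pressed: bool       # the push-to-talk button is held down


def should_accept_command(frame: AudioFrame, require_button: bool = False) -> bool:
    """Decide whether detected speech should be treated as a command."""
    if not frame.speech_detected:
        return False
    if require_button:
        # Push-to-talk approach: only the button press matters.
        return frame.button_pressed
    # Camera approach: ignore speech unless the driver appears to be talking.
    return frame.driver_mouth_moving
```

With the camera gate, chatter from the back seat arrives as speech with no driver mouth movement, so the function returns `False` and no action is taken.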