Siri (and its various competing products and platforms) is, frankly, amazing. Those of us who watched Knight Rider back in the '80s can finally do what David Hasselhoff (or his mustache) did when talking to his car, KITT. Suddenly we're able to actually tell our digital assistant to do things. And it does them.
But once you get past the initial excitement of these tools, a sad reality rears its head: they don't actually work that well. Sure, they're amazing when we speak just right and do so in a quiet room. But in the thick of it, during the busy flow of our day-to-day lives, I have yet to experience a voice recognition solution that is anywhere near flawless.
Which is why I was interested in news from VocalZoom, a so-called human-to-machine communication (HMC) vendor. The Israel-based startup is grappling with the problem of delivering natural, personalized and effective voice-controlled user experiences. What sets it apart, however, is the way it's going about this. VocalZoom is delivering sensors that it says give accurate and reliable voice control regardless of the environment in which they're used.
VocalZoom has just signed an agreement to integrate its technology with iFLYTEK's. iFLYTEK might be a name that you've never heard of (I certainly hadn't), but it is the Chinese version of Siri and controls over 80% of the speech technology market in China. The agreement will integrate VocalZoom’s optical HMC sensor with iFLYTEK’s Voice Cloud intelligent speech technology platform, and initial tests have shown a better than 50% performance improvement on average. Importantly, the greatest improvement comes in the environments with the highest ambient noise levels, such as driving an automobile with the windows rolled down.
The issue that VocalZoom is trying to resolve is the quality gap caused by the fact that voice recognition relies solely on acoustic microphones. No matter how great the software attached to those microphones is, today's solutions can't isolate the voice well enough to deliver high levels of control, especially in noisy environments. The VocalZoom sensor overcomes this problem by gathering additional data generated during speech, as facial skin vibrates around the mouth, lips, cheeks, and throat. By integrating the VocalZoom optical HMC sensor into a voice-control solution and focusing it on these areas, facial vibrations can be acquired, measured, and converted to an isolated, near-perfect reference signal with which the system can operate, regardless of noise levels.
In other words, VocalZoom can build a "picture" of which sounds are related to the voice and which are not. From there, it is a relatively simple task to strip out the extraneous noise and leave the pure voice part of the audio input to be analyzed by the voice recognition software. Better together.
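To make that idea a little more concrete, here is a deliberately crude sketch of how a noise-immune reference signal could be used to decide which parts of a microphone recording to keep. To be clear, this is not VocalZoom's or iFLYTEK's algorithm, which the companies haven't detailed here; the function names, frame size, threshold, and synthetic signals are all my own assumptions, and the "sensor" is idealized as a perfectly clean copy of the voice.

```python
import math
import random


def frame_energy(samples, start, size):
    """Mean squared amplitude of the frame beginning at `start`."""
    frame = samples[start:start + size]
    return sum(x * x for x in frame) / len(frame)


def gate_with_reference(mic, reference, frame_size=160, threshold=0.01):
    """Keep microphone frames only where the reference shows speech.

    `reference` stands in for a hypothetical vibration-sensor signal that
    is assumed to be unaffected by ambient noise. Frames where its energy
    is at or below `threshold` are treated as noise-only and silenced.
    """
    if len(mic) != len(reference):
        raise ValueError("mic and reference must have the same length")
    cleaned = []
    for start in range(0, len(mic), frame_size):
        frame = mic[start:start + frame_size]
        if frame_energy(reference, start, frame_size) > threshold:
            cleaned.extend(frame)                # speech detected: keep audio
        else:
            cleaned.extend([0.0] * len(frame))   # no speech: mute the frame
    return cleaned


def make_demo_signals(length=4800, speech_start=1600, speech_end=3200,
                      sample_rate=16000, seed=0):
    """Build a synthetic noisy microphone signal and an idealized reference."""
    rng = random.Random(seed)
    # A 200 Hz tone burst plays the role of the voice in this toy example.
    voice = [
        0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
        if speech_start <= n < speech_end else 0.0
        for n in range(length)
    ]
    # The microphone hears the voice plus constant background noise.
    mic = [v + rng.uniform(-0.3, 0.3) for v in voice]
    # The idealized sensor "sees" only the voice, never the noise.
    reference = list(voice)
    return mic, reference


if __name__ == "__main__":
    mic, reference = make_demo_signals()
    cleaned = gate_with_reference(mic, reference)
    kept = sum(1 for x in cleaned if x != 0.0)
    print(f"kept {kept} of {len(mic)} samples")
```

Note what this toy version doesn't do: it only mutes the stretches where nobody is talking, and the noise that overlaps the speech itself is left untouched. A real system would presumably use the reference in a far more sophisticated way, but the principle of leaning on a signal that noise can't contaminate is the same.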
According to the companies, VocalZoom and iFLYTEK have tested a solution combining both companies’ technologies inside a moving automobile, across a number of scenarios with varying levels and combinations of music, ambient car noise, and other interference. Performance improved to an almost perfect score in environments where high noise levels had previously made adequate voice-control performance virtually impossible.
Imagine reruns of Knight Rider where The Hoff could talk to KITT not only with the windows up, but at 60 miles an hour on the freeway and with the wind blowing through his hair. Now that's science fiction, and it might soon be science fact!
This article is published as part of the IDG Contributor Network.