Microsoft researchers say they have created a speech recognition system that transcribes conversational speech as accurately as professional human transcriptionists do.
In a paper published this week, the Microsoft Artificial Intelligence and Research group said its speech recognition system had attained “human parity” and made fewer errors than professional human transcriptionists.
“The error rate of professional transcriptionists is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion where friends and family members have open-ended conversations. In both cases, our automated system establishes a new state-of-the-art, and edges past the human benchmark. This marks the first time that human parity has been reported for conversational speech,” the researchers wrote in their paper. Switchboard is a standard corpus of recorded telephone conversations and their transcripts that is widely used to benchmark speech recognition systems.
The 5.9% error rate is about equal to that of people who were asked to transcribe the same conversations, and it is the lowest ever recorded on the industry-standard Switchboard speech recognition task, Microsoft wrote on its website.
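The article does not spell out how these error rates are computed. Assuming they are word error rates (the conventional metric for Switchboard and CallHome), the figure is the number of word substitutions, deletions and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that calculation with a word-level edit distance; the function name `word_error_rate` is hypothetical, and official benchmark scoring also applies text normalization rules that this simplified version skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Simplified sketch: splits on whitespace and applies no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words: about 16.7%
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Read this way, a 5.9% rate means roughly six word errors for every 100 words in the reference transcript.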
The milestone comes after decades of research in speech recognition, beginning in the early 1970s with DARPA, Microsoft wrote. Over time, most major technology companies and many research organizations, including BBN, Google, Microsoft, Hewlett Packard and IBM, have developed speech recognition technologies.
According to Microsoft: “The milestone will have broad implications for consumer and business products that can be significantly augmented by speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription and personal digital assistants such as Cortana.”
According to Microsoft Principal Researcher Geoffrey Zweig, researchers are working on ways to make sure that speech recognition works well in more real-life settings. “That includes places where there is a lot of background noise, such as at a party or while driving on the highway. They’ll also focus on better ways to help the technology assign names to individual speakers when multiple people are talking, and on making sure that it works well with a wide variety of voices, regardless of age, accent or ability.”
“In the longer term, researchers will focus on ways to teach computers not just to transcribe the acoustic signals that come out of people’s mouths, but instead to understand the words they are saying. That would give the technology the ability to answer questions or take action based on what they are told,” Microsoft stated.