Chapter 5 Simply Speaking: The Integration of Speech Recognition with Voice and Unified Communications

Addison-Wesley Professional

Speech vision statement by XD Huang, General Manager, Microsoft Research Communications Innovation Center

Microsoft’s initial investment in speech recognition technology aimed to let individuals customize how they interact with an operating system, performing tasks by voice to save keystrokes and time. Over many years, much work went into multilanguage support, into making end-user speech training easier so that recognition improved, and then into programming templates that enable developers to integrate speech into their applications.

Fortunately, there existed a man, professor, and mentor with a vision to take speech recognition even further by adding speech into not-so-obvious places such as telephone devices, telephony interactive voice response (IVR) systems, and other cool applications. This man, this professor, this mentor is Xuedong Huang. XD, as he is known, is now the General Manager of the prestigious Microsoft Research Communications Innovation Center in Redmond, Washington. His latest project, Microsoft Response Point, provides speech recognition in small business phone systems.

Within this chapter, we will explore the innovations within speech technology throughout Microsoft’s Voice and Unified Communications products, highlighting the vision of XD Huang.

The Vision of Speech within Microsoft’s VoIP Products and Services

Before I met XD in November 2005, I did not know what to expect. I was nervous about meeting him because I had heard of XD through my experience working with the Unified Communications Group, known at that time as the Real-time Communications group or RTC team. The opportunity to work at Microsoft Research was one of the coolest things that had happened to me in my previous 13 years of working with Microsoft. At my first meetings with the Communications Innovation Center (CIC) in Redmond, I was to provide feedback and become familiar with a new communications product that would serve as a small business IP Phone System, then called by its code name, “Edinburgh,” and now known as Microsoft Response Point. The prototype I was privy to then was a black box that looked like a cable box for television, paired with the ugliest phone I had ever seen. Surrounding this technology were experts from various areas of Microsoft including JJ Cadiz, Bob Taniguchi, Robert Brown, Li Jiang, Regi John, Jayman Dalhal, and XD Huang, among others. When I met the team, I immediately knew that the next several years were going to be interesting. XD, one of the most remarkable technologists I have met, provided an overview of the vision for the CIC and Edinburgh, and it was up to me to build a readiness and training plan for partners to get this product off the ground. To provide a better understanding of his vision of this product and others, XD gave me a vision document, titled “Voice Communications 3.0,” which had been published internally and reviewed by Bill Gates himself. Bill has been heavily involved with the team since the beginning, although I have never met him personally.

Given XD’s extensive experience in the areas of speech and VoIP, I think it is important to capture XD’s vision of the future of speech services and how they will provide further innovation in the future of voice and unified communications technologies. The following are XD’s direct comments:

“My vision on voice communications is that voice is the most natural way for people to communicate, which will enable us to bring a wide range of devices to the web 3.0 era. VOIP helps this cause and provides a means for us to bring people and devices together.”

—XD Huang, General Manager, Communications Innovation Center, Microsoft Corporation

Microsoft Voice and Speech Integration

Building on XD’s vision, Microsoft integrated speech recognition technology tightly into many of its enterprise and small business products. This integration began with a separately sold product for enterprise customers called Microsoft Speech Server. With Microsoft Speech Server, customers could develop and integrate speech capabilities within their line-of-business applications to provide IVR for phone systems, accessibility solutions for people with disabilities, or faster operation of applications through voice commands. This server, managed by XD’s team, gave developers the opportunity to build on an open platform to integrate speech in ways previously not thought possible.

As of 2007, Microsoft discontinued Speech Server as a separate server product and decided to open the platform internally so that its speech technology could be enabled in several Microsoft products. We now see the speech engine technology within our PCs and mobile devices, but from a communications perspective, Microsoft provided speech integration within two voice and unified communications products: Microsoft Office Communications Server 2007, now in its R2 release, and Microsoft Response Point, currently in version 1, Service Pack 2. The following is an overview of how speech configuration and development are enabled within these two Microsoft Voice and Unified Communications products.

Microsoft Office Communications Server Speech Integration

With the announcement of “Live Server,” the beta name of Office Communications Server 2007, came the end of Microsoft Speech Server as a separate server product licensed and sold by Microsoft. Speech Server’s new role is to make its engine available through a series of SDKs and APIs. These serve other Microsoft products in the Microsoft Unified Communications platform, including both Microsoft Exchange Server and Microsoft Office Communications Server, and they let developers create custom speech recognition formulas and processes to handle requests via voice. Within Office Communications Server (OCS), the Speech Server toolset enables developers to create custom IVR prompts and processes to handle call flow and automated call distribution (ACD).

Figure 5.1 is an example of a sample IVR process that can be configured using Microsoft Office Visio to customize the way incoming calls are handled for a company.

Figure 5.1 Automated Call Distribution example (Microsoft Visio)

After the process is completed and saved, OCS uses the template to process incoming calls. In this example, if an external caller needs to reach the IT help desk, a call distribution process moves the caller to this template, where the caller reaches another set of voice/speech-enabled menus that lead to the appropriate party within the organization. Enabling this capability helps organizations provide better service to their customers and partners, and it improves efficiency by ensuring that calls are routed to a contact with the right skill set or job role to handle the request.
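The call flow described above can be pictured as a small menu tree. The following is only a minimal sketch of that idea in Python, not the way OCS or Speech Server implements it: the MenuNode class, the route_call function, the phrases, and the queue names are all hypothetical, and the sketch assumes that speech recognition has already turned the caller’s audio into text.

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One step in a hypothetical IVR call flow (illustrative only)."""
    prompt: str
    # Maps a recognized phrase either to another MenuNode or to a queue name (str).
    choices: dict = field(default_factory=dict)


def route_call(root, utterances):
    """Walk the menu tree with already-recognized phrases and return a queue name."""
    node = root
    for phrase in utterances:
        target = node.choices.get(phrase.strip().lower())
        if target is None:
            return "operator"   # unrecognized input falls back to a person
        if isinstance(target, str):
            return target       # leaf reached: this is the destination queue
        node = target           # descend into the next menu
    return "operator"           # caller stopped before reaching a queue


# Hypothetical menus: the main menu hands help-desk callers to a second menu.
help_desk = MenuNode(
    prompt="Say 'password reset' or 'hardware'.",
    choices={"password reset": "it-accounts-queue", "hardware": "it-hardware-queue"},
)
main_menu = MenuNode(
    prompt="Say 'sales' or 'help desk'.",
    choices={"sales": "sales-queue", "help desk": help_desk},
)

print(route_call(main_menu, ["Help Desk", "hardware"]))  # it-hardware-queue
```

Keeping the flow as data rather than as nested conditionals mirrors the idea of drawing the process as a diagram first and letting the server interpret it afterward.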

By far, the most powerful speech-integrated service within the Microsoft Unified Communications platform is the user’s ability to call into the Microsoft Exchange Server, leveraging the Unified Messaging and Outlook Voice Access components. Through this feature, calendar, e-mail, contacts, and so on are made available to users via voice, and through speech recognition and voice prompts, the service can also change them on your behalf. For example, my organization has deployed the Microsoft Exchange Server Unified Messaging solution, and we have the Outlook Voice Access service running as well. When I am mobile, I can call into my account and the Exchange Server can tell me what meetings and calls I have remaining for the day. If there is a reason to do so, and I have done this before, I can tell the service to clear my calendar for the day and the server will not only remove my meetings for that date, but also inform the participants of each meeting that I had to cancel. The same process works if I am running late to a meeting, and with the traffic in Houston, Texas, that happens a lot. In this situation, I can tell the service I’m running 15 minutes late and the service will adjust the meeting and/or simply inform the participants of the meeting that I’m running late. Hmm, what did we do here? We just eliminated the need for a personal assistant! Figure 5.2 depicts how this server-based architecture works.

Figure 5.2 Exchange Voice Access architecture

Microsoft Office Communications Server speech recognition also works with all supported language packs, enabling enterprise voice and communications in any supported language. In addition, users do not need to train the system on their voice and how they communicate, which saves time and gives customers a quick on-ramp to services such as voice-dialing, custom call routing plans, and so on.

Microsoft Response Point and Speech Recognition

Taking speech to an entirely new level, the Microsoft Communications Innovation Center decided to take “speech to the street” by enabling speech recognition within the phone devices themselves. Microsoft Response Point, as mentioned earlier in the book and in XD’s vision document, is an innovative IP Phone System for small businesses and now branch offices. Each certified OEM Response Point phone device has a blue Response Point button that, when pressed, lets the person using the phone tell the phone what they want it to do, whether placing a call, transferring a call, and so on.

Microsoft Response Point also has a speech recognition-enabled automated attendant that allows external callers to reach groups, locations, and individuals via voice. This great feature comes at no additional cost to the customer and provides the customer with a virtual built-in operator. Using Response Point Administrator, you can customize the way these calls are answered and handled. You can also customize the welcome greeting and improve efficiency by offering answers to frequently asked questions that external callers might have, such as the company’s hours, fax number, and location.

Configuring ACD and the Automated Attendant service within Microsoft Response Point is easy. Figure 5.3 is a screen shot of the Microsoft Response Point Administrator management console that allows you to customize these settings.

Figure 5.3 Microsoft Response Point Administrator call distribution settings

Figure 5.4 is an example of how to use Microsoft Response Point Administrator to customize answers to frequently asked questions from external callers.

Figure 5.4 Microsoft Response Point Automated Receptionist Properties

As mentioned earlier, for end users, each Microsoft Response Point phone is enabled with a blue Response Point button, also known as the “Magic Blue Button,” as shown in Figure 5.5. There is even a Web site called as well! This button, when pressed, dials the Response Point Base Unit device and then enables the user or caller to say the name or extension of the person he is trying to reach.

Figure 5.5 Microsoft Response Point Magic Blue Button (on each OEM phone device)

The Response Point button can also be used during a call to park a call, retrieve a call, and transfer a call. For a list of voice commands, the user can say “What can I say,” and the system responds with available voice commands. The system does not need to be trained, and after you import your contacts, it can voice-dial anyone in your Microsoft Office Outlook or Windows contact list. The result is that you never drop a transferred call, never have to look for a number again, and have a simpler phone experience overall.
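To make the idea of a fixed set of spoken commands more concrete, here is a minimal Python sketch of a command dispatcher. It is an illustration under stated assumptions, not Response Point’s actual design: the handle_utterance function, the command wording, the action names, and the sample contact list are all hypothetical, and the input is assumed to be text that a speech recognizer has already produced.

```python
# Hypothetical command list, returned when the user asks "What can I say?"
COMMANDS = ("call <name>", "transfer to <name>", "park", "retrieve", "what can i say")


def handle_utterance(text, contacts):
    """Map a recognized phrase to an (action, detail) pair; all names are illustrative."""
    phrase = text.strip().lower().rstrip("?")
    if phrase == "what can i say":
        return ("list_commands", COMMANDS)
    if phrase in ("park", "retrieve"):
        return (phrase, None)
    # Commands that take a contact name, looked up in the imported contact list.
    for prefix, action in (("call ", "dial"), ("transfer to ", "transfer")):
        if phrase.startswith(prefix):
            name = phrase[len(prefix):]
            number = contacts.get(name)
            if number is None:
                return ("unknown_contact", name)
            return (action, number)
    return ("not_understood", text)


# Hypothetical imported contacts, keyed by lowercase name.
contacts = {"alex smith": "104", "front desk": "100"}

print(handle_utterance("Call Alex Smith", contacts))       # ('dial', '104')
print(handle_utterance("Transfer to front desk", contacts))  # ('transfer', '100')
```

A closed command set like this is one reason such a system can work without per-user voice training: the recognizer only has to choose among a small number of known phrases and names.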

Providing such a user-friendly interface, one that lets administrators quickly customize voice-enabled commands for an IP Phone system, is just one of the latest innovations to come out of Redmond. Offering a voice-dialing button on a phone device puts Microsoft ahead of the competition. Simple, but innovative nonetheless!

The Future of Speech with Voice-Enabled Applications

We are already seeing how speech recognition is changing the way people communicate. This is all made possible by software and solid speech recognition engines that make voice training a thing of the past and make configuring how voice commands are handled seamless.

In the future, based on my work with Microsoft Research and providing competitive analysis and research through my firm, the next big wave of speech recognition in voice communications products will primarily be fourfold:
