Voice is becoming a pervasive way to manage and interact with everyday tech devices, moving from initial adoption in phones and smart speakers toward smartwatches, cars, laptops, home appliances and much more.

Cloud platforms take most of the praise for enabling voice assistant services such as Amazon Alexa, Google Assistant or Microsoft Cortana – neglecting due credit to the increasing role that edge computing plays in enabling voice interfaces. A substantial amount of processing and analysis occurs on the devices themselves to let users interface with them simply by talking.

Keyword detection

Voice-enabled devices are not constantly recording audio and sending it to the cloud to determine whether someone is giving them an instruction. That would not only be a privacy concern, but also a waste of energy, computing and network resources. Sending every word to the cloud and back would also introduce latency and slow the responsiveness of the system. Today's voice interfaces typically use keyword or "wake-word" detection, dedicating a small portion of edge computing resources (i.e. computing done on the device itself, or "at the edge") to process microphone signals while the rest of the system remains idle. This power-efficient approach is particularly important for extending usage time in portable, battery-operated devices such as smartphones and wearables.

When the always-on processing core handling keyword detection, usually a digital signal processor (DSP), finds a match with the expected word (e.g. "Alexa"), it wakes up the rest of the system to support functions requiring more computing power, such as audio capture, compression and transmission, language processing and voice tracking.

Separating signal from noise

After a keyword is detected, the device starts listening actively.
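The always-on flow described above can be sketched in a few lines. This is a toy illustration, not a real wake-word engine: actual devices run a trained acoustic model on a low-power DSP, while here a simple frame-energy threshold stands in for that classifier. All function names and thresholds are illustrative assumptions.

```python
# Toy sketch of an always-on wake-word gate. A real device scores
# acoustic features against a trained model of the wake word; here a
# short-term energy threshold is a stand-in for that classifier.
# Names and thresholds are illustrative assumptions, not a real API.

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def looks_like_wake_word(frame, threshold=0.01):
    """Placeholder for the DSP's keyword classifier: any frame with
    enough energy 'matches'. A real system would score the frame
    against an acoustic model of the expected word (e.g. "Alexa")."""
    return frame_energy(frame) > threshold

def always_on_loop(frames):
    """Do minimal per-frame work until a match wakes the main system."""
    for i, frame in enumerate(frames):
        if looks_like_wake_word(frame):
            return i  # index of the frame that wakes the rest of the SoC
    return None  # no match: the rest of the system stays idle

# Two quiet frames, then a louder one that trips the detector.
frames = [[0.0] * 160, [0.001] * 160, [0.5] * 160]
print(always_on_loop(frames))  # → 2
```

The point of the structure is that only `frame_energy`-level work runs continuously; everything after the `return` (capture, compression, language processing) is powered up on demand.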
At this point, the system's ability to accurately interpret voice commands largely depends on how "clean" the voice reception is – which can be a challenge in a noisy environment such as a street, a party, or a family room where kids are watching a movie or multiple people are talking at once.

A number of edge computing technologies help separate the user's voice from surrounding sounds. For instance, beamforming techniques process audio from multiple microphones in the device to focus listening in the direction the user is speaking from – like a virtual directional microphone. If the user moves around, voice tracking algorithms running on the device can adjust the balance among the microphone signals so the focus follows the voice source.

Advanced voice-enabled devices also process inputs from the microphone array to suppress environmental noise from the user's speech, similar to the way noise-cancelling headphones operate. Smart speakers also use on-device echo cancellation technology to allow for "barge-in" capabilities – suppressing music and other speaker sounds from the microphone signals so the smart speaker can receive voice commands even while playing music loudly. So you can have great sound in a smart speaker, but it is still listening should you want to change songs or order a pizza mid-track.

On-device artificial intelligence

Increasing edge computing capabilities in voice-enabled gadgets also support innovative features using on-device artificial intelligence (AI). Offline commands, for example, allow on-device language processing and execution of basic voice instructions when no internet connection is available. This feature is already widely available in smartphones, helping users set alarms and reminders even if the device is in airplane mode or out of coverage.
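The simplest form of the beamforming idea mentioned above is delay-and-sum: align the channels for the assumed direction of the talker and average them, so on-axis sound reinforces while off-axis sound tends to cancel. The sketch below assumes a fixed two-microphone array and a known inter-microphone delay; a real device estimates that delay continuously (which is what the voice tracking step adjusts).

```python
# Simplified delay-and-sum beamformer for two microphones. The sound
# from the talker reaches mic2 `delay_samples` later than mic1, so we
# advance mic2 by that amount before averaging: the talker's signal
# adds coherently, sounds from other directions do not. The fixed
# delay is an illustrative assumption; real devices track it adaptively.

def delay_and_sum(mic1, mic2, delay_samples):
    """Time-align mic2 to mic1, then average the two channels."""
    out = []
    for n in range(len(mic1)):
        k = n + delay_samples  # where mic1's sample at n appears in mic2
        m2 = mic2[k] if k < len(mic2) else 0.0
        out.append(0.5 * (mic1[n] + m2))
    return out

# A 'voice' impulse reaches mic1 at n=3 and mic2 two samples later.
mic1 = [0, 0, 0, 1.0, 0, 0, 0, 0]
mic2 = [0, 0, 0, 0, 0, 1.0, 0, 0]
print(delay_and_sum(mic1, mic2, 2))  # impulse reinforced at n=3
```

A signal arriving from a different direction would have a different inter-microphone delay, so its two copies would land at different output samples and be attenuated by the averaging rather than reinforced.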
Offline commands are particularly valuable in smart home settings – letting users turn lights on and off, change the thermostat temperature or disable the home security alarm even during an internet service outage.

Devices with advanced edge computing power can also perform voice biometrics for user authentication. This capability prevents unauthorized users from making purchases or changing key settings with voice commands – so children don't keep adding items to the shopping list, and burglars can't disable the house alarm by shouting at the smart speaker while the owners are on vacation.

On-device AI can also support audio classification for uses beyond voice commands. A home security device can be trained to detect the sound of glass breaking and trigger an alarm, or a smart baby monitor can detect when a baby is crying and notify the parents. Coupled with cameras, sound allows machine learning to put more context around people or events. As AI capabilities increase, we will see other very interesting applications.
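Conceptually, offline command handling comes down to keeping a small table of phrases the device can act on locally and deferring everything else to the cloud when connectivity returns. The sketch below is a deliberately minimal illustration under that assumption; the phrases and action names are made up, not any real assistant's grammar.

```python
# Toy sketch of offline command handling: a small on-device table maps
# locally recognized phrases straight to local actions, so basic smart
# home instructions keep working during an internet outage. Phrases
# and action strings are illustrative assumptions.

LOCAL_ACTIONS = {
    "turn on the lights": "lights:on",
    "turn off the lights": "lights:off",
    "set thermostat to 20": "thermostat:20",
    "disable the alarm": "alarm:off",
}

def handle_offline(transcript):
    """Return a local action for a known phrase; otherwise defer."""
    phrase = transcript.strip().lower()
    return LOCAL_ACTIONS.get(phrase, "defer-to-cloud")

print(handle_offline("Turn ON the lights"))  # → lights:on
print(handle_offline("order a pizza"))       # → defer-to-cloud
```

A real implementation would of course run an on-device speech model to produce the transcript and a grammar rather than exact string matches, but the split is the same: a bounded local vocabulary executes immediately, everything else needs the cloud.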
In the case of home security, for example, the ability to analyze events on the device itself reduces the amount of data sent to the cloud to just critical alerts, increasing the speed and convenience of the whole system.

The demand for superior edge processing power in voice-enabled devices is driving adoption of heterogeneous computing architectures – integrating diverse engines such as CPUs, GPUs and DSPs into a single system-on-chip that assigns each workload to the most efficient compute engine, improving performance, power efficiency and cost-effectiveness across the wide array of devices that are embracing a voice interface.

While most of our interactions today are with phones and popular smart speakers from Amazon or Google, as edge computing and AI become more powerful and prevalent we will see many more form factors, with voice interfaces added to virtually anything – whether a router, an appliance or a lamp. They will still rely on the power of the cloud ecosystem behind them, but the devices themselves will also be a lot smarter and able to conduct many operations locally – making them more responsive and convenient while saving time and reducing the amount of data transferred.