How the internet of sound eliminates billions of IoT sensors

Hackathon winners help visually impaired people navigate by using sounds instead of sensors, cleverly avoiding the billions of truck rolls that retrofitting the world with IoT sensors would require.


The Internet of Things’ dirty little secret is the cost of deployment. For example, adding a low-cost motion sensor and radio to a traffic light to count passing vehicles before it leaves the factory is easy and inexpensive. The incremental cost of deployment is near zero, especially if low-power wide-area network (LPWAN) coverage or other low-cost communications coverage is available. But retrofitting the traffic light with sensors and radios will cost the municipality a public works truck roll, a crew, an electrician and a police traffic detail. Retrofits are expensive.

Retrofitting the world for IoT is both a data science task and a sensor engineering task: study the problem and find the simplest way to acquire the data. The data scientist must determine the minimum data resolution that will produce the desired result: How many sensors per unit of area and how many readings per time interval are needed to solve the problem? The engineer must trade off sensor costs, radio costs and network availability, as well as power consumption against available power sources.
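The sizing questions above can be made concrete with a little arithmetic. The Python sketch below is illustrative only: `DeploymentPlan` and its fields are hypothetical, and the sample inputs (a 10,000 m² area, 400 m² per sensor, one reading per minute, a $20 node and a $1,500 per-node retrofit installation) are made-up numbers chosen to show how installation, not hardware, dominates the cost of a retrofit.

```python
import math
from dataclasses import dataclass


@dataclass
class DeploymentPlan:
    """Hypothetical inputs for sizing a sensor deployment."""
    area_m2: float                 # area that must be observed
    coverage_m2_per_sensor: float  # area a single sensor can observe
    reading_interval_s: float      # seconds between readings from one sensor
    unit_cost: float               # sensor plus radio, per node
    install_cost: float            # per-node installation (near zero at the factory)


def sensors_needed(plan: DeploymentPlan) -> int:
    # Spatial resolution: round up so no part of the area is left uncovered.
    return math.ceil(plan.area_m2 / plan.coverage_m2_per_sensor)


def readings_per_day(plan: DeploymentPlan) -> int:
    # Temporal resolution: readings per sensor in 24 hours, times the sensor count.
    per_sensor = int(86_400 // plan.reading_interval_s)
    return sensors_needed(plan) * per_sensor


def total_cost(plan: DeploymentPlan) -> float:
    # Hardware plus installation for every node.
    return sensors_needed(plan) * (plan.unit_cost + plan.install_cost)


# Made-up example: identical hardware, installed at the factory vs. retrofitted.
factory = DeploymentPlan(10_000, 400, 60, 20.0, 0.0)
retrofit = DeploymentPlan(10_000, 400, 60, 20.0, 1_500.0)
print(sensors_needed(factory))                    # 25
print(readings_per_day(factory))                  # 36000
print(total_cost(factory), total_cost(retrofit))  # 500.0 38000.0
```

Halving the reading interval doubles the data volume without changing the node count, while every extra node in a retrofit carries the full installation cost; that asymmetry is why the minimum spatial resolution matters most.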

A winning team at the MIT Media Lab-sponsored Reality Virtually hackathon, Getting Around, avoided the cost of sensor deployment entirely. The team used background sounds to give people who are visually impaired, or both visually and hearing impaired, accurate navigation: the app recognizes sounds and reports the locations of places such as cafes, post offices and ATMs to the user, cleverly avoiding the need to retrofit large areas with Bluetooth beacons. The ARound app uses Google ARCore, fusing sound, camera and geo-location data. There is no deployment cost because nearly everyone has a smartphone. Visually impaired team member Sunish Gupta defined and tested the user requirements.

The three modes of the ARound app:

Discovery mode: As the user walks, GPS data combined with the ARCore camera’s machine learning-based object recognition technology senses the surroundings and records the sounds of objects and locations on the user’s path. The sounds are geofenced, meaning each sound is tagged to a location.
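A geofenced sound can be modeled as a recorded clip plus a center point and a radius. This is a minimal sketch assuming a circular geofence and the haversine great-circle formula; the `GeofencedSound` class, its fields, the file path and the coordinates are hypothetical and are not taken from the ARound code.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class GeofencedSound:
    """A recorded clip tagged to a circular area (hypothetical schema)."""
    label: str        # what the object recognizer saw when the clip was recorded
    audio_path: str   # hypothetical path to the recorded clip
    lat: float        # center of the geofence
    lon: float
    radius_m: float   # geofence radius in meters

    def contains(self, lat: float, lon: float) -> bool:
        # Inside the geofence if the great-circle distance is within the radius.
        return haversine_m(self.lat, self.lon, lat, lon) <= self.radius_m


# Hypothetical clip recorded while walking past a cafe.
cafe = GeofencedSound("cafe", "recordings/cafe.wav", 42.3601, -71.0942, 25.0)
print(cafe.contains(42.3601, -71.0942))  # True: at the center
print(cafe.contains(42.3611, -71.0942))  # False: about 111 m north
```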

Breadcrumb mode: As the user walks a path, ARound creates a trail of sounds (air conditioners, doors opening and closing, café conversations, etc.), making it possible to retrace the path by following the audio breadcrumbs. After a geofenced sound has been acquired, it plays as the user approaches the object or place of interest and grows louder as the user gets closer. The sounds are anchored to real-world locations, and playback accurately tracks the user’s position.
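Breadcrumb playback needs two pieces: a rule that maps distance to volume, and a pointer that advances along the trail as each crumb is reached. The sketch below assumes positions have already been converted to a local x/y frame in meters, a hypothetical linear fade over 30 m and a 3 m "reached" threshold; the names and values are illustrative assumptions, not ARound’s actual behavior.

```python
import math


def distance_m(a, b):
    """Straight-line distance between two (x, y) points in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def breadcrumb_gain(user_xy, crumb_xy, audible_radius_m=30.0):
    """Hypothetical linear fade: silent at the audible radius, full volume at the crumb."""
    d = distance_m(user_xy, crumb_xy)
    return max(0.0, 1.0 - d / audible_radius_m)


class BreadcrumbTrail:
    """Plays back a recorded trail of (label, (x, y)) crumbs in reverse order."""

    def __init__(self, crumbs, reached_m=3.0):
        self.crumbs = list(reversed(crumbs))  # retracing walks the trail backwards
        self.reached_m = reached_m
        self.index = 0

    def update(self, user_xy):
        """Skip crumbs already reached; return (label, gain) for the next one, or None."""
        while self.index < len(self.crumbs):
            label, xy = self.crumbs[self.index]
            if distance_m(user_xy, xy) <= self.reached_m:
                self.index += 1  # crumb reached, move on to the next one
            else:
                return label, breadcrumb_gain(user_xy, xy)
        return None  # back at the start of the recorded path


trail = BreadcrumbTrail([
    ("door", (0.0, 0.0)),
    ("air conditioner", (20.0, 0.0)),
    ("cafe", (40.0, 0.0)),
])
print(trail.update((40.0, 0.0)))  # cafe reached; next target is the air conditioner
print(trail.update((25.0, 0.0)))  # closer to the air conditioner, so louder
print(trail.update((21.0, 0.0)))  # air conditioner reached; next target is the door
print(trail.update((0.5, 0.0)))   # None: back at the first crumb
```

A linear fade is easy to reason about; audio engines more commonly use inverse-distance attenuation, which could be swapped into `breadcrumb_gain` without touching the trail logic.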

ARCore spatializes the sound in three dimensions: a sound presents differently depending on the listener’s position and orientation relative to it. The ARCore 3D sound machine learning model authentically recreates the sound to match the user’s position. Hearing-impaired people could “hear” the sounds as vibrations from a haptic device, such as the Moment, or even from a mobile phone.
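Full 3D spatialization of the kind described above is beyond a short example, but the core idea of direction-dependent playback can be shown with simple stereo panning. This sketch swaps in a much cruder technique (constant-power panning, not head-related transfer functions and not any ARCore API) and assumes a local planar frame with the user’s compass heading known; all function names are hypothetical.

```python
import math


def relative_azimuth_deg(user_xy, heading_deg, source_xy):
    """Direction of a sound source relative to where the user faces.

    Assumes a local frame with x = east and y = north (meters), and a compass
    heading measured clockwise from north. Result is in [-180, 180):
    0 = straight ahead, +90 = to the right, -90 = to the left.
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from north
    return (bearing - heading_deg + 180.0) % 360.0 - 180.0


def constant_power_pan(azimuth_deg):
    """Map an azimuth to (left_gain, right_gain) with left**2 + right**2 == 1."""
    # Stereo cannot encode front vs. back, so sources behind the listener are
    # clamped to hard left or hard right.
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)  # 0 = hard left, pi/2 = hard right
    return math.cos(theta), math.sin(theta)


# User at the origin facing north; a cafe 10 m to the east should sit hard right.
az = relative_azimuth_deg((0.0, 0.0), 0.0, (10.0, 0.0))
left, right = constant_power_pan(az)
print(round(az), round(left, 3), round(right, 3))  # 90 0.0 1.0
```

Because stereo panning cannot tell front from back, real spatial audio engines filter each ear’s signal with head-related transfer functions instead. The same left/right gains could, as an assumption, drive two vibration motors for the haptic case.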

Share mode: Sounds from a library of audio files could be linked to locations using a web application, similar to posting photos to a location on Google Maps. This creates a sound pin on the map that is triggered when the user approaches the location. Sound pins could also be used to set a meeting place with a visually impaired person: dropping an audio file at the meeting location makes it accessible to that person. There is also a social multiplier, similar to the photos users add to Google Maps: a repository of geolocations and sounds could create a high-resolution landscape of sounds for navigation and other purposes.
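Sound pins amount to a proximity query: given the user’s position, which pins’ trigger radii contain it? For a large crowdsourced repository, bucketing pins into a coarse latitude/longitude grid avoids checking every pin on every position update. This is a minimal sketch under stated assumptions: the `SoundPin` and `SoundPinMap` names, the 0.001-degree cell size and the audio paths are hypothetical, and the 3x3 neighborhood search is only complete when each trigger radius is smaller than a grid cell’s narrowest side.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0
CELL_DEG = 0.001  # hypothetical grid cell: ~111 m of latitude, ~82 m of longitude at 42° N


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class SoundPin:
    pin_id: str
    audio_path: str          # hypothetical location of the uploaded audio file
    lat: float
    lon: float
    trigger_radius_m: float  # must be smaller than a grid cell's narrowest side


class SoundPinMap:
    """Sound pins bucketed by grid cell for fast "what is near me?" queries."""

    def __init__(self):
        self._cells = defaultdict(list)

    @staticmethod
    def _cell(lat, lon):
        return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

    def add(self, pin):
        self._cells[self._cell(pin.lat, pin.lon)].append(pin)

    def triggered(self, lat, lon):
        """Pins whose trigger radius contains (lat, lon), checking the 3x3 neighboring cells."""
        ci, cj = self._cell(lat, lon)
        hits = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for pin in self._cells.get((ci + di, cj + dj), []):
                    if haversine_m(pin.lat, pin.lon, lat, lon) <= pin.trigger_radius_m:
                        hits.append(pin)
        return hits


pins = SoundPinMap()
pins.add(SoundPin("meet-here", "audio/meeting-chime.mp3", 42.3601, -71.0942, 20.0))
print([p.pin_id for p in pins.triggered(42.36011, -71.0942)])  # ['meet-here']
print([p.pin_id for p in pins.triggered(42.3620, -71.0942)])   # []
```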

The ARound development team wants to turn this minimum viable product (MVP) into a full-scale platform for different types of users: not just visually impaired people, for navigation, but also unimpaired people, for fun everyday experiences built on the acquired sound landscape. The platform would use Google ARCore and/or Apple ARKit, both machine learning-based development tools, to provide augmented sound for spatial direction. The team (Anandana Kapur, Dhara Bhavsar, Sunish Gupta, Vedant Sarant and Maxym Nesmashny) would like the project to gain the support of open source contributors. Contributors interested in assistive technologies can find the project on GitHub.

Future development ideas discussed with the developers include adding a machine learning model to identify and categorize sounds. Just as machine learning can categorize people and objects in images, a model could identify and categorize the objects and locations that sounds represent. Crowdsourcing geofenced sounds is another idea: people who want to contribute could run a background app on their phones to collect them, much as the 5 million users of the SETI@home project contribute spare compute cycles to the search for extraterrestrial intelligence.
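As a toy illustration of sound categorization, the sketch below extracts two classic hand-crafted audio features (RMS energy and zero-crossing rate) and assigns a clip to the label with the nearest mean feature vector. It is a swapped-in stand-in, not the model the developers discussed: a real system would learn features from many labeled recordings, and the synthetic sine tones and the labels `hum` and `hiss` are hypothetical placeholders for recordings such as an air conditioner or a hissing espresso machine.

```python
import math


def tone(freq_hz, amp, n=800, rate_hz=8000):
    """Synthetic stand-in for a recorded clip: a pure sine tone."""
    return [amp * math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]


def features(samples):
    """Two classic hand-crafted audio features: RMS energy and zero-crossing rate."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (n - 1))


class NearestCentroid:
    """Label a clip by the closest mean feature vector among the training labels."""

    def fit(self, clips, labels):
        sums, counts = {}, {}
        for clip, label in zip(clips, labels):
            f = features(clip)
            acc = sums.setdefault(label, [0.0] * len(f))
            for i, v in enumerate(f):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        # Mean feature vector per label, stored as tuples for math.dist.
        self.centroids = {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}
        return self

    def predict(self, clip):
        f = features(clip)
        return min(self.centroids, key=lambda lab: math.dist(f, self.centroids[lab]))


# Hypothetical training set: low tones stand in for "hum", high tones for "hiss".
model = NearestCentroid().fit(
    [tone(50, 0.5), tone(70, 0.4), tone(3000, 0.2), tone(3200, 0.25)],
    ["hum", "hum", "hiss", "hiss"],
)
print(model.predict(tone(60, 0.45)))    # hum
print(model.predict(tone(2500, 0.25)))  # hiss
```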

The fields of IoT, augmented reality and machine learning are inextricably linked and are likely to become subfields of a single discipline. I considered them discrete fields until I attended the AR in Action Summit last January. It was a firehose of TED conference-style short talks by practitioners and researchers, plus multidisciplinary panels from fields as far-ranging as surgery and architecture.

Before the Summit, I thought of IoT as ubiquitous sensor networks, AR as holographic headsets like the HoloLens, and machine learning as natural language processing and image recognition. By the end of the conference, it was clear that these are closely related tools. Any machine that provides context augments reality. It could be a holographic headset that projects 3D virtual objects into reality, but it could also be a haptic band that reacts to the environment, or an app like ARound that uses IoT, machine learning and AR.

The definition of IoT should become both narrower and broader. One day, the internet of everything will come, but only after every product is designed with sensors that feed the vision of derived context and understanding of every point in the physical world. Reaching that vision by retrofitting would take billions of truck rolls, which makes it economically infeasible. During this retrofit stage of IoT, clever approaches that minimize or entirely eliminate the need to retrofit the world with sensors will provide the derived context and understanding of most of the physical world.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.