Google AudioSet aims to make sounds, from roars to boings, searchable

Date: 2017-03-30

Google researchers hope AudioSet could be "a starting point for a comprehensive vocabulary of sound events"


Google researchers have released a collection of 2 million-plus labeled audio snippets designed to spark innovation in the area of sound search.

Earlier this month the company published a paper titled "AudioSet: An ontology and human-labeled dataset for audio events." Google hopes the work will combine with image recognition to strengthen search and identification capabilities across a wide variety of machine learning applications, including automated video captions that include sound effects. Google began work on the project last year.

Google drew on its YouTube business to collect 2 million ten-second excerpts (totaling 5.8 thousand hours of audio), labeled with more than 500 sound categories, to create AudioSet. Categories start at high levels such as Human Sounds and Music, and then get more specific, such as Whistling and Music Genre.
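
As a rough illustration of how a category hierarchy like this might be represented, here is a minimal Python sketch. Only the four category names come from the description above; the nested-dictionary layout, the pairing of each specific category with a parent, and the `paths` helper are assumptions made for illustration, not AudioSet's actual file format.

```python
# Hypothetical, tiny slice of a sound-category hierarchy.
# The four names are from the article; the structure and the
# parent/child pairings are assumed for illustration only.
ONTOLOGY = {
    "Human Sounds": {"Whistling": {}},
    "Music": {"Music Genre": {}},
}


def paths(tree, prefix=()):
    """Yield each category as a tuple running from the top level down to it."""
    for name, children in tree.items():
        current = prefix + (name,)
        yield current
        yield from paths(children, current)


if __name__ == "__main__":
    for path in paths(ONTOLOGY):
        print(" > ".join(path))
```

Walking the tree this way lists broad categories before the narrower ones beneath them, which mirrors the general-to-specific ordering the article describes.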

Dan Ellis, Google research scientist, explains in a blog post that "We decided to use 10 second sound snippets as our unit; anything shorter becomes very difficult to identify in isolation. We collected candidate snippets for each of our classes by taking random excerpts from YouTube videos whose metadata indicated they might contain the sound in question (“Dogs Barking for 10 Hours”). Each snippet was presented to a human labeler with a small set of category names to be confirmed (“Do you hear a Bark?”). Subsequently, we proposed snippets whose content was similar to examples that had already been manually verified to contain the class, thereby finding examples that were not discoverable from the metadata."
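
The first two steps Ellis describes, nominating snippets from video metadata and then having a person confirm them, could be sketched roughly as follows. This is a toy illustration under stated assumptions, not Google's pipeline: the `Snippet` class, the `nominate` and `confirm` functions, and the simple keyword match are all hypothetical, and the random excerpt selection and the later similarity-based proposal step are left out.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    video_title: str  # metadata used to nominate the snippet
    start_s: int      # hypothetical start offset of the 10-second excerpt


def nominate(snippets, keyword):
    """Pick candidates whose video title mentions the keyword (assumed heuristic)."""
    return [s for s in snippets if keyword.lower() in s.video_title.lower()]


def confirm(candidates, labeler):
    """Keep only the candidates a human labeler confirms, e.g. "Do you hear a Bark?"."""
    return [s for s in candidates if labeler(s)]


if __name__ == "__main__":
    clips = [
        Snippet("Dogs Barking for 10 Hours", 120),
        Snippet("Piano recital", 30),
    ]
    candidates = nominate(clips, "barking")
    # Stand-in for a human judgment; a real labeler would listen to the clip.
    verified = confirm(candidates, labeler=lambda s: True)
    print(verified)
```

Keeping the metadata filter separate from the confirmation step reflects the point in the quote that metadata only suggests a sound might be present; the label counts only once a person has confirmed it.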

Ellis adds: "By releasing AudioSet, we hope to provide a common, realistic-scale evaluation task for audio event detection, as well as a starting point for a comprehensive vocabulary of sound events."


Bob Brown is a news editor for Network World, blogs about network research, and works most closely with our staff's wireless/mobile reporters. Email me at bbrown@nww.com with story tips or comments on this post. No need to follow up on PR pitches via email or phone (I read my emails and will be in touch if interested, thanks).
