A system created by Google researchers automatically wrote the caption on the picture above this post.
Normally, I get paid to perform that function, at least on this blog.
Maybe this isn’t such great technology (he writes jokingly).
From a post on the Google Research Blog:
People can summarize a complex scene in a few words without thinking twice. It’s much more difficult for computers. But we’ve just gotten a bit closer -- we’ve developed a machine-learning system that can automatically produce captions … to accurately describe images the first time it sees them. This kind of system could eventually help visually impaired people understand pictures, provide alternate text for images in parts of the world where mobile connections are slow, and make it easier for everyone to search on Google for images.
Recent research has greatly improved object detection, classification, and labeling. But accurately describing a complex scene requires a deeper representation of what’s going on in the scene, capturing how the various objects relate to one another and translating it all into natural-sounding language.
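The architecture the Google researchers describe (and detail in their paper) pairs a vision model that compresses the image into a feature vector with a recurrent language model that emits words conditioned on that vector. The toy sketch below illustrates only that data flow, with random weights, a made-up six-word vocabulary, and a single linear map standing in for the CNN; it is not the actual Google system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny vocabulary for illustration only.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
V, D, H = len(VOCAB), 8, 16  # vocab size, image-feature dim, hidden dim

# Stand-in "CNN": a single linear map from raw pixels to a D-dim feature vector.
W_img = rng.normal(size=(D, 32 * 32))

# Simple RNN decoder parameters (randomly initialized, untrained).
W_h = rng.normal(size=(H, H)) * 0.1   # hidden-to-hidden
W_x = rng.normal(size=(H, V)) * 0.1   # word-to-hidden
W_i = rng.normal(size=(H, D)) * 0.1   # image-feature-to-hidden
W_out = rng.normal(size=(V, H)) * 0.1 # hidden-to-vocabulary scores

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def caption(image, max_len=10):
    """Greedy decoding: encode the image, then emit one word at a time."""
    feat = W_img @ image.ravel()          # "encode" the image
    h = np.tanh(W_i @ feat)               # seed hidden state with image features
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h = np.tanh(W_h @ h + W_x @ one_hot(word))
        word = int(np.argmax(W_out @ h))  # pick the highest-scoring next word
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

words = caption(rng.normal(size=(32, 32)))
print(words)
```

With random weights the output is gibberish, of course; the point is the shape of the pipeline, in which training would tune every weight matrix so that the decoded word sequence matches human-written captions.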
The blog post and this research paper (posted to arXiv, the preprint server hosted by Cornell University) get into the details of how the system works.
This collection of examples demonstrates the limitations of this caption-writing technology and, one would hope, the continued necessity of caption-writing editors. Note that the results range from spot-on in the left-hand column to way off the mark in the right-hand column.
Despite the obvious threat to the value of my skill set, it's still pretty cool stuff.
However, Chris Messina, who previously worked at Google -- and is credited with creating the hashtag -- felt compelled to interject an AI-related concern in the comments: “Yeah, so computers can talk (Siri), and they can see (self-driving cars) — now they can describe full scenes or complex objects and people. There’s nothing to be worried about here as long as they don’t start lying to us about how many pizzas were actually in the oven.”