Researchers at the University of Maryland last week said they developed a system that lets robots watch a series of “how to” cooking videos on YouTube and then use the tools shown in those videos to perform tasks.
“Based on what was shown on a video, robots were able to recognize, grab and manipulate the correct kitchen utensil or object and perform the demonstrated task with high accuracy—without additional human input or programming,” the researchers said.
Another significant innovation to come out of the study is the robots’ ability to accumulate and share knowledge with others. Current sensor systems typically view the world anew in each moment, without the ability to apply prior knowledge.
“This system allows robots to continuously build on previous learning—such as types of objects and grasps associated with them—which could have a huge impact on teaching and training,” said Reza Ghanadan, a program manager in the Defense Advanced Research Projects Agency’s Defense Sciences Office. DARPA’s Mathematics of Sensing, Exploitation and Execution (MSEE) program funded the University of Maryland’s research.
DARPA says the goal of the MSEE program is to “develop high-impact methods for scalable autonomous systems capable of understanding scenes and events for learning, planning, and execution of complex tasks.”
“Instead of the long and expensive process of programming code to teach robots to do tasks, this research opens the potential for robots to learn much faster, at much lower cost and, to the extent they are authorized to do so, share that knowledge with other robots. This learning-based approach is a significant step towards developing technologies that could have benefits in areas such as military repair and logistics,” Ghanadan stated.
Robots can learn to recognize objects and patterns fairly well, but interpreting and acting on visual input is much more difficult.
“The MSEE program initially focused on sensing, which involves perception and understanding of what’s happening in a visual scene, not simply recognizing and identifying objects,” said Ghanadan. “We’ve now taken the next step to execution, where a robot processes visual cues through a manipulation action-grammar module and translates them into actions.”
From a paper the University of Maryland researchers published on their work:
“This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network-based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by ‘watching’ unconstrained videos with high accuracy.”
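The two-level design the abstract describes—CNN-based recognition of grasp type and object at the lower level, feeding a probabilistic manipulation action grammar at the higher level—can be sketched in miniature. In this toy version, the recognition modules are stand-ins that return label distributions for a frame, and the “grammar” is a table of rule probabilities that composes a (grasp, object) pair into an atomic action. All labels, probabilities, and rules here are illustrative assumptions, not the researchers’ actual models.

```python
from itertools import product

# Lower level: stand-ins for the two CNN recognition modules.
# Each returns a distribution over labels for one video frame.
def classify_grasp(frame):
    return {"power-grasp": 0.7, "precision-grasp": 0.3}

def recognize_object(frame):
    return {"knife": 0.6, "cucumber": 0.3, "bowl": 0.1}

# Higher level: a toy probabilistic grammar mapping (grasp, object)
# pairs to atomic manipulation actions, with rule probabilities.
GRAMMAR = {
    ("power-grasp", "knife"): ("cut", 0.8),
    ("precision-grasp", "knife"): ("cut", 0.5),
    ("power-grasp", "cucumber"): ("hold", 0.6),
    ("precision-grasp", "cucumber"): ("hold", 0.7),
    ("power-grasp", "bowl"): ("pour", 0.4),
}

def parse_frame(frame):
    """Return the visual sentence (action, grasp, object) whose joint
    probability (grasp * object * rule) is highest, with its score."""
    grasps, objects = classify_grasp(frame), recognize_object(frame)
    best = None
    for (g, pg), (o, po) in product(grasps.items(), objects.items()):
        action, pr = GRAMMAR.get((g, o), (None, 0.0))
        score = pg * po * pr
        if action and (best is None or score > best[1]):
            best = ((action, g, o), score)
    return best

sentence, score = parse_frame("frame-0")
print(sentence)  # ('cut', 'power-grasp', 'knife')
```

The point of the probabilistic parse is that neither recognition module has to be certain: the grammar combines noisy evidence from both modules and picks the most plausible action, which is how the full system can tolerate the unconstrained, messy footage found on the web.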