Stefanie Tellex, assistant professor of computer science at Brown University, is tackling a thorny robotics problem: robotic grasping. She has built a machine learning model that lets robots learn to manipulate objects on their own and, in the process, produce much-needed sample data that other researchers can use to train robots to pick up objects, she explained at MIT Technology Review's EmTech conference.
Video Credit: MIT Technology Review
Opening her energetic and passionate talk, Tellex stated the problem:
“Most robots cannot pick up most objects most of the time. If you go to a robotics lab and put an object in front of a robot that it has not seen before, that robot will almost always not be able to pick up that object.”
It's a hard problem because a robot has to understand the task and the object from sensor information alone. The controller of the robot arm needs answers to several important questions: What is the object's shape? Where is it? How should the arm and gripper move into position? And where is the right place to grip the object to pick it up?
Researchers have programmed robots to pick up specific objects using hard-coded routines that work only with a small set of objects. No one has yet developed a general solution that can control a robot's grasp through the full cycle of picking up and moving any unfamiliar object.
The machine learning model that Tellex built lets her lab robot learn from its mistakes, through trial and error, until it successfully grasps the object. This trial-and-error learning produces a data set of images of the object, records of how the gripper interacted with it, and the outcomes of both failed and successful attempts.
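As a rough illustration of how trial-and-error practice leaves a data set behind as a byproduct, the sketch below tries random grasp points until one succeeds and logs every attempt, failures included. It is not Tellex's model: `try_grasp` is a hypothetical stand-in for a real robot (it simply succeeds in a made-up "good" region), and the one-dimensional grasp position and the record fields are assumptions chosen for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class GraspAttempt:
    """One logged attempt: which object, where the gripper closed, and the outcome."""
    object_id: str
    grasp_x: float  # hypothetical 1-D grasp position along the object, 0.0 to 1.0
    success: bool


def try_grasp(grasp_x: float) -> bool:
    """Hypothetical stand-in for a physical grasp: succeeds only near the object's middle."""
    return 0.4 <= grasp_x <= 0.6


def practice_grasping(object_id: str, max_attempts: int = 50, seed: int = 0) -> list[GraspAttempt]:
    """Try random grasp points until one works, logging every failure and the final success."""
    rng = random.Random(seed)
    log: list[GraspAttempt] = []
    for _ in range(max_attempts):
        x = rng.random()
        ok = try_grasp(x)
        log.append(GraspAttempt(object_id, x, ok))
        if ok:
            break  # stop once the object has been picked up
    return log


attempts = practice_grasping("mug-01")
print(f"{len(attempts)} attempts, last succeeded: {attempts[-1].success}")
```

The failed attempts are kept on purpose: examples of where not to grasp are as useful for training as the successes.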
The robot's practice leaves behind a data set that can serve as machine learning sample data for training other robots to pick up similar objects. But the Baxter robot in Tellex's lab cannot produce enough sample data fast enough to have an impact. To solve this, Tellex launched the Million Object Challenge, which invites the labs that own the 400 Baxter research robots around the world to run her machine learning model on their idle robots and collectively build a data set covering a million objects.
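A quick back-of-the-envelope calculation shows why pooling robots matters. Assuming, purely for illustration, that the million objects were split evenly across the 400 Baxter robots, each robot would need to scan:

```latex
\frac{1{,}000{,}000 \text{ objects}}{400 \text{ robots}} = 2{,}500 \text{ objects per robot}
```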
Why sample training data is important
Other machine learning applications, such as self-driving cars and image understanding, have progressed rapidly in recent years because of the wealth of sample data available to train their models. Robots lag behind because there is so little sample data with which to train the neural networks that guide robotic grasping.
The most widely understood example of training machine learning models on labeled data is image recognition. Image recognition accuracy improves every year, as measured by the results of the annual ImageNet competition, where the top entries now score well above 90 percent. The improvement can be attributed to the shift to deep learning and to very large sample data sets. Millions of labeled images are used in academic research, and counting the images held by Google, Facebook and other online repositories, there are billions.
How neural networks work
A neural network is a computing system made up of many simple, highly interconnected processing elements that process information through their dynamic response to external inputs. It is trained for a specific task by processing large amounts of labeled data: an image of a bird is labeled bird, an image of a car is labeled car, and so on. To train a network to recognize objects in photos, a very large sample of images is reduced to pixels and processed with general-purpose machine learning frameworks such as Torch or TensorFlow.
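To make "reduced to pixels" concrete, the sketch below flattens a tiny grayscale image into a list of pixel values scaled to the range 0 to 1 and pairs it with its label. The 2 x 2 image and the 0 to 255 intensity scale are illustrative assumptions; real pipelines built on Torch or TensorFlow typically do something similar with their own tensor types.

```python
def to_training_example(image: list[list[int]], label: str) -> tuple[list[float], str]:
    """Flatten a grayscale image (rows of 0-255 intensities) into scaled pixel values."""
    pixels = [value / 255.0 for row in image for value in row]
    return pixels, label


# A hypothetical 2 x 2 "image" of a car, stored as pixel intensities.
car_image = [[0, 128],
             [255, 64]]
example = to_training_example(car_image, "car")
print(example)
```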
In this case, the input layer receives the pixels of each labeled image, and the output layer produces the label describing the image, such as car or not car. Between them, one or more hidden layers of processing elements, commonly referred to as neurons, compute intermediate values. Each connection between neurons carries a numeric weight, and a generic learning algorithm adjusts these weights so that images of cars become associated with the label car.
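The layer structure can be sketched in a few lines of plain Python. This forward pass takes a pixel vector, computes hidden-layer values from weighted sums, and outputs a car-versus-not-car score between 0 and 1. The weights are arbitrary made-up numbers and the sigmoid activation is one common choice among several; in a trained network the weights would be learned from data.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(pixels, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> one hidden layer -> a single output neuron."""
    # Each hidden neuron takes a weighted sum of all input pixels, then applies sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, pixels)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron does the same over the hidden values, giving a score for "car".
    score = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
    return hidden, score


# Hypothetical weights for 4 input pixels and 2 hidden neurons.
hidden_weights = [[0.5, -0.2, 0.8, 0.1],
                  [-0.3, 0.9, 0.4, -0.6]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = -0.2

hidden, score = forward([0.0, 0.5, 1.0, 0.25], hidden_weights, hidden_biases,
                        output_weights, output_bias)
print(f"hidden values: {hidden}, P(car) = {score:.3f}")
```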
During training, the network predicts a label for each sample, and each prediction is compared with the true label. The errors are fed back through the network to adjust the weights, and this cycle repeats until the prediction error stops improving. The model's accuracy is then tested on data it did not see during training.
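The feedback loop can be sketched as per-example gradient descent with backpropagation on a tiny network. Everything here is a toy assumption: two input features instead of image pixels, four hand-made labeled examples, three hidden neurons, cross-entropy as the error measure, and a fixed learning rate. It shows the mechanics (predict, compare with the true label, push the error back to adjust the weights, repeat), not how any production model is trained.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: two features per example; label 1 = "car", 0 = "not car".
DATA = [([1.0, 0.9], 1), ([0.9, 1.0], 1), ([0.1, 0.0], 0), ([0.0, 0.2], 0)]
N_INPUTS, N_HIDDEN = 2, 3


def train(epochs: int = 2000, lr: float = 0.5, seed: int = 0):
    """Per-example gradient descent with backpropagation; returns weights and loss history."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in DATA:
            # Forward pass: hidden values, then predicted probability of "car".
            h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j]) for j in range(N_HIDDEN)]
            p = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            # Compare the prediction with the true label (cross-entropy, clamped to avoid log(0)).
            p_safe = min(max(p, 1e-12), 1 - 1e-12)
            total += -math.log(p_safe) if y == 1 else -math.log(1 - p_safe)
            # Backward pass: error signal at the output, then at each hidden neuron.
            d_out = p - y
            d_hidden = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(N_HIDDEN)]
            # Nudge every weight and bias against its gradient.
            for j in range(N_HIDDEN):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hidden[j]
                for i in range(N_INPUTS):
                    w1[j][i] -= lr * d_hidden[j] * x[i]
            b2 -= lr * d_out
        losses.append(total / len(DATA))
    return (w1, b1, w2, b2), losses


params, losses = train()
print(f"average loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

Frameworks such as Torch and TensorFlow compute these gradients automatically; writing them out by hand just makes the feedback step visible.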
Self-driving cars can learn to steer the same way. The sample data set is a recording of 3D video of the road ahead, together with the steering angles captured while a person drives a specially equipped car. This log becomes the input to the machine learning system, which trains the model to predict the correct steering angle for each video frame. The process is repeated until the model steers the vehicle accurately.
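A drastically simplified version of learning steering from a driving log: assume each video frame has already been reduced to a single number, the car's offset from the lane center, and that the log pairs it with the steering angle the human driver used. Gradient descent then fits a line from offset to angle, iterating until the predictions match the log. Real systems learn from the raw video with deep networks; the single feature, the synthetic log values, and the linear model are all illustrative assumptions.

```python
def fit_steering(log, lr: float = 0.1, steps: int = 500):
    """Fit angle ~ w * offset + b to (offset, angle) pairs by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(log)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in log) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in log) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical driving log: (lane offset in meters, steering angle in radians).
# The synthetic "driver" always steers with angle = -0.5 * offset + 0.1.
driving_log = [(x, -0.5 * x + 0.1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = fit_steering(driving_log)
print(f"learned rule: angle = {w:.3f} * offset + {b:.3f}")
```

Because this synthetic driver follows an exact linear rule, the fit recovers it (w close to -0.5, b close to 0.1); real driving logs are noisy, which is one reason they need so many miles of data.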
The repository of sample images for training image recognition systems is enormous, and self-driving cars also have large sample data sets. Google has driven its self-driving cars more than a million miles, and Tesla closer to a hundred million miles, creating enormous amounts of sample data labeled with steering angles. No equivalent data set exists for training robots to pick up objects.
The Million Object Challenge
Tellex's Million Object Challenge, if successful, will produce sample data for further academic research and could one day make robotic grasping as accurate as image recognition. She also described the challenges that remain, such as standardizing the data and applying it to different types of grippers and robotic hands. A household robot like Rosie from The Jetsons, however, is still far in the future.