Google AI expert explains the challenge of debugging machine-learning systems

Peter Norvig: 'The methodology for scaling [machine learning verification] up to a whole industry is still in progress.'

Google Director of Research and renowned artificial intelligence (AI) expert Peter Norvig presented an entirely different side of AI and machine learning at the EmTech Digital conference. He compared traditional software programming to machine learning to highlight the new challenges of debugging systems programmed with machine learning and verifying that they do what they are designed to do.

Traditional software programming uses Boolean-based logic that can be tested to confirm that the software does what it was designed to do, using tools and methodologies established over the last few decades.

In contrast, machine learning is a black-box programming method in which computers program themselves with data, producing probabilistic logic that does not fit the true-or-false tests used to verify systems programmed with traditional Boolean logic.

Norvig summed up the status of machine learning verification compared to traditional programming:

“The problem here is the methodology for scaling this [machine learning verification] up to a whole industry is still in progress. We have been doing this for a while; we have some clues for how to make it work, but we don’t have the decades of experience that we have in developing and verifying regular software.”

Why use machine learning if debugging it is so difficult? Despite the current limits to its verification, machine learning has the advantage of development speed. Complex solutions to certain types of problems, such as voice recognition or image classification, can be built one, two or even three times faster than with traditional programming methods. For example, Nvidia engineers programmed an autonomous car prototype with about 100 hours of training data. The productivity of applied machine learning is so compelling that developers have to use it, which calls for new methods of verification.

The starting point is to separate the risks and dangers inherent in the problem from those introduced by the technology used to solve it. Only the risks and dangers inherent in the technology solution can be controlled.

One condition, called non-stationarity, affects both traditional programming and machine learning. Non-stationarity means that conditions change over time, and systems designed to work under the original conditions become less effective. In traditional programming, a new release can be developed, then tested and verified using proven processes before it ships.

This verification has been lost with the shift to machine learning because machine learning doesn’t fit traditional programming’s step-by-step process: develop, test and release. A machine-learning system continuously produces and acquires data, which reprograms the system and makes the step-by-step approach impractical.
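One way to picture monitoring for non-stationarity is a rolling check of recent accuracy against a baseline measured at release time. The sketch below is illustrative only and is not a method described by Norvig; the class name `DriftMonitor`, the window size and the `max_drop` threshold are all assumptions chosen for the example.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical helper: flags when recent accuracy falls below a baseline."""

    def __init__(self, baseline_accuracy, window=100, max_drop=0.05):
        self.baseline_accuracy = baseline_accuracy  # accuracy measured at release
        self.max_drop = max_drop                    # tolerated drop before flagging
        self.outcomes = deque(maxlen=window)        # most recent correct/incorrect results

    def record(self, was_correct):
        """Store whether the latest prediction turned out to be correct."""
        self.outcomes.append(bool(was_correct))

    def recent_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drifted(self):
        """True only when the window is full and accuracy dropped too far."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.baseline_accuracy - self.recent_accuracy() > self.max_drop


monitor = DriftMonitor(baseline_accuracy=0.9, window=10, max_drop=0.05)
for was_correct in [True] * 9 + [False]:
    monitor.record(was_correct)
stable = monitor.drifted()

# Conditions change: the same model now gets only 7 of 10 right.
for was_correct in [True] * 7 + [False] * 3:
    monitor.record(was_correct)
degraded = monitor.drifted()
```

A check like this only signals that something has changed. It does not say why, and it assumes the correct answer for each prediction eventually becomes available.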

Machine learning test assertions

Norvig explained his idea for managing the machine learning verification problem. Instead of traditional test suite assertions that respond with true, false or equal, machine learning test assertions should respond with assessments, such as “the results of today’s experiment were 90 percent good and consistent with tests run yesterday.”
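As a rough illustration of an assertion that returns an assessment rather than a single true or false, the sketch below scores a run and compares it with the previous run. It is an assumption-laden example, not Norvig's implementation: the names `Assessment` and `assess_run`, the 90 percent threshold and the 2-point tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    accuracy: float           # fraction of today's predictions that were correct
    previous_accuracy: float  # the same metric from the earlier run
    meets_threshold: bool     # accuracy is at least the required minimum
    consistent: bool          # change from the earlier run is within tolerance


def assess_run(predictions, labels, previous_accuracy,
               min_accuracy=0.90, tolerance=0.02):
    """Score a run and compare it with the previous run (hypothetical helper)."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    return Assessment(
        accuracy=accuracy,
        previous_accuracy=previous_accuracy,
        meets_threshold=accuracy >= min_accuracy,
        consistent=abs(accuracy - previous_accuracy) <= tolerance,
    )


labels = ["cat"] * 10
predictions = ["cat"] * 9 + ["dog"]  # one mistake out of ten
result = assess_run(predictions, labels, previous_accuracy=0.91)
print(f"{result.accuracy:.0%} good, consistent with yesterday: {result.consistent}")
```

The report keeps the measured value alongside the pass/fail flags, so a reader of the test log sees how good the run was, not just whether it cleared the bar.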

Compounding the verification problem, the truth used to verify the output of a machine-learning system may be unknown or distorted by human perception. Imagine a machine-learning system designed to determine whether a dress is gold and white or blue and black, a question that humans recently couldn’t agree on.

Norvig explained, “For some problems, we just don’t know what the truth is. So, how do you train a machine-learning algorithm on data for which there are no set results?”

The recourse in these situations, according to Norvig, is an unbiased method of determining the answer, such as a panel of judges.

Machine-learning systems trained on data produced by humans with inherent human biases will duplicate the bias in the models. A method of measuring what these systems do compared to what they were designed to do is needed to identify and remove bias. 

Traditional software is modular, which makes it possible to isolate the inputs and outputs of each module and identify which one has the bug. In machine learning, though, the system has been programmed with data. Any bug will be replicated throughout the system. Changing one thing changes everything. There are techniques for detecting that there is an error, and there are methods for retraining machine-learning systems, but there isn’t a way to fix just one isolated problem.

To paraphrase Norvig, a better set of tools is needed. The entire tool set needs to be updated to move forward.
