IBM Watson taps university smarts ahead of Jeopardy! battle

IBM Watson collaborates with MIT, Carnegie Mellon, University of Texas, University of Southern California and others to bolster question answering ability

If you have seen any of the video of its preliminary bouts on Jeopardy!, you know that IBM's Watson computer is pretty amazing. One of the main reasons, it turns out, is that IBM enlisted the intelligence of eight top universities to make sure Watson has superb question answering ability.


Technology research from the schools, the Massachusetts Institute of Technology (MIT), University of Texas at Austin, University of Southern California (USC), Rensselaer Polytechnic Institute (RPI), University at Albany (UAlbany), University of Trento, University of Massachusetts Amherst and Carnegie Mellon University (CMU), will help advance Watson's ability to understand all kinds of industries, such as healthcare, banking, government and more, IBM said.

Watson, named after IBM founder Thomas J. Watson, is programmed to rival the human ability to answer questions posed in natural language with speed and accuracy, IBM stated. Watson's software runs on IBM POWER7 servers optimized to handle the massive number of tasks it must perform at rapid speeds to analyze complex language and deliver correct responses to Jeopardy! clues.

"Applying QA technology to the real-time Jeopardy! problem is an important challenge for the field because it requires a system to respond more quickly and with a level of confidence that has not been possible to-date," says Professor Eric Nyberg of CMU in a statement. "Jeopardy! requires forms of reasoning that are quite sophisticated, using metaphors, puns, and puzzles that go beyond basic understanding of the language. As a challenge problem, Jeopardy! will stretch the state of the art." (For an interesting look at the engineering behind Watson, check out this Mashable story.)

According to IBM, the universities and their contributions are:

Carnegie Mellon University: Assisted IBM in the development of the Open Advancement of Question-Answering Initiative (OAQA) architecture and methodology. CMU also made two direct contributions to Watson: a source expansion algorithm that identifies the best text resources for answering questions about a given topic, and an answer-scoring algorithm that improves Watson's ability to recognize when a candidate answer is likely to be correct.

Massachusetts Institute of Technology: Pioneered an online natural language question answering system called START, which has the ability to answer questions with high precision using information from semi-structured and structured information repositories. The underlying contribution to the Watson system is the ability to break down the question into simple sub-questions for responses to be quickly collected and then fused back together to come up with an answer. The Watson system architecture also took advantage of the object-property-value data model pioneered by MIT, which enables the information in semi-structured data sources to be effectively retrieved in response to natural language questions.

University of Southern California: Focused on large-scale Information Extraction, Parsing, and knowledge inference technologies with the goal of converting large amounts of international source materials into the general knowledge resources of the system, and reasoning with this knowledge to find inconsistencies and gaps. 

University of Texas at Austin: Worked to extend the capabilities of Watson, with a focus on extensive common sense knowledge. The goal is to help the system answer questions by developing a computational resource of common sense knowledge. In particular, they have developed methods that learn to extract knowledge from text, a key requirement for the Watson system.

Rensselaer Polytechnic Institute: Worked on a visualization component to visually explain to external audiences the massively parallel analytics skills it takes for the Watson computing system to break down a question and formulate a rapid and accurate response to rival a human brain.

University at Albany: When investigating a complex topic, you rarely receive the answer you need by asking just one question; rather, you ask a series of questions to arrive at the solution. UAlbany's technology enables a computing system to remember the full interaction, rather than treating every question like the first one, simulating a real dialogue. While this is not applicable to the Jeopardy! challenge itself, given the nature of the quiz format, IBM is working with UAlbany to integrate the capability into the Watson system.

University of Trento (Italy): The aim of their ongoing collaboration with IBM is to explore advanced machine learning techniques along with rich text representations based on syntactic and semantic structures for the optimization of the IBM Watson system. The team has developed technology based on the latest results of statistical learning theory applied to natural language understanding. This has already increased Watson's ability to learn from the questions it is asked. Learning to handle the uncertainty in selecting the best answer from those found by Watson's search algorithms has also been one of their main research directions, IBM stated.

University of Massachusetts Amherst: Working on information retrieval, or text search. This important capability of QA technology is the first step taken: looking for and retrieving text that is most likely to contain accurate answers. The system's deep language processing capabilities then analyze the returned information to find the actual answers within that text.
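To make the "retrieve first, analyze second" idea in the UMass entry concrete, here is a minimal Python sketch of the retrieval step alone. It is a toy illustration and not Watson's actual method: the function names (tokenize, retrieve), the PASSAGES list and the simple term-overlap score are all assumptions made for this example, and the passages are built from facts stated elsewhere in this article.

```python
import re
from collections import Counter

# Hypothetical mini-corpus for illustration only.
PASSAGES = [
    "Ken Jennings won 74 games in a row on Jeopardy!",
    "Brad Rutter earned $3,255,102 on Jeopardy!",
    "Watson runs on IBM POWER7 servers.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, passages, top_k=2):
    """Return up to top_k passages that share the most terms with the question.

    This is a plain term-overlap score (an assumption for this sketch);
    real retrieval systems use far richer ranking models.
    """
    q_terms = Counter(tokenize(question))
    scored = []
    for passage in passages:
        p_terms = Counter(tokenize(passage))
        # Counter intersection keeps the minimum count of each shared term.
        overlap = sum((q_terms & p_terms).values())
        scored.append((overlap, passage))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no terms with the question at all.
    return [passage for score, passage in scored[:top_k] if score > 0]


if __name__ == "__main__":
    for hit in retrieve("Who won 74 games in a row?", PASSAGES):
        print(hit)
```

In a full QA system the returned passages would then go to a deeper language-analysis stage that extracts and scores candidate answers; this sketch stops at the search step.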

IBM is pitting its natural language Watson supercomputer against two of the quiz show Jeopardy!'s biggest champions in a $1 million man vs. machine challenge on February 14, 15 and 16. The Jeopardy! format provides the ultimate challenge because the game's clues involve analyzing subtle meaning, irony, riddles, and other complexities in which humans excel and computers traditionally do not, IBM stated.

The system incorporates a number of proprietary technologies for the specialized demands of processing an enormous number of concurrent tasks and data while analyzing information in real time, IBM stated.

IBM has been prepping Watson for the show. The system played more than 50 "sparring games" against former Jeopardy! Tournament of Champions contestants, and it has taken and passed the same Jeopardy! contestant test that humans take to qualify to play on the show.

Watson will compete against the show's two most prolific past winners -- Ken Jennings and Brad Rutter.  Jennings broke the Jeopardy! record for the most consecutive games played by winning 74 games in a row during the 2004-2005 season, resulting in winnings of more than $2.5 million.  Rutter won the highest cumulative amount ever by a single Jeopardy! player, earning $3,255,102.

The grand prize for this competition will be $1 million with second place earning $300,000 and third place $200,000. Rutter and Jennings will donate 50 percent of their winnings to charity and IBM will donate 100 percent of its winnings to charity.

Follow Michael Cooney on Twitter: nwwlayer8   
