Recommind aims to pinpoint keyword searches

Feb 26, 2003
Enterprise Applications

Searching reams of unstructured data can be the bane of Web-based existence. Recommind’s MindServer 2.1 release aims to ease some of the pain of keyword-searching large volumes of data by automatically categorizing results based on concepts and relationships found in the collection being queried.

Version 2.1, announced Wednesday, adds text extraction features that can help pull people, dates, places and company names out of documents, further helping analyze the results of a query, says Bob Tennant, CEO of Recommind.

“We can now not only recognize the subject matter of an article, but also the people and places referenced,” Tennant says. “This functionality can help create a management information system that can be used to reference who is involved in a project or to provide information on where an event is happening.”

MindServer works by first analyzing all the documents and files that need to be indexed and algorithmically determining the relationships among them. Metadata on these relationships is stored in a proprietary database system. To extract people, places and dates, the company uses a statistical model and pre-trains the system to identify the various entity types.
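Recommind has not published how MindServer determines these relationships, so the following is only a minimal sketch of the general idea, assuming a plain bag-of-words cosine similarity stands in for the proprietary algorithm. The function names, the sample documents and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relate(docs, threshold=0.3):
    """Return {(id1, id2): score} for document pairs at or above the threshold.

    The threshold is an arbitrary illustrative value, not a MindServer setting.
    """
    vectors = {doc_id: term_counts(text) for doc_id, text in docs.items()}
    related = {}
    for left, right in combinations(sorted(vectors), 2):
        score = cosine(vectors[left], vectors[right])
        if score >= threshold:
            related[(left, right)] = round(score, 3)
    return related


# Hypothetical sample collection.
docs = {
    "a": "java compiler and java virtual machine",
    "b": "the java virtual machine runs compiled code",
    "c": "coffee beans grown on the island",
}
relations = relate(docs)
```

Here only documents "a" and "b" share enough vocabulary to be recorded as related. A production system would store such pairs as metadata alongside the index rather than recompute them for each query.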

With MindServer, a search on the word “Java,” for instance, could bring back results categorized around the programming language, the Indonesian island and coffee, rather than a long list of documents with the word “Java” in them.
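The difference between a flat hit list and categorized results can be illustrated with a small sketch. It assumes hand-written cue words for each concept, which is a simplification: MindServer is described as deriving concepts from the collection itself. The names `CONCEPT_CUES` and `categorize` and the sample hits are hypothetical.

```python
from collections import defaultdict

# Hypothetical cue words per concept; a real system would learn these
# from the collection instead of relying on a fixed list.
CONCEPT_CUES = {
    "programming language": {"compiler", "code", "class", "software"},
    "island": {"indonesia", "jakarta", "volcano", "island"},
    "coffee": {"coffee", "roast", "beans", "espresso"},
}


def categorize(hits):
    """Group matching documents under the concept whose cue words they share most."""
    groups = defaultdict(list)
    for title, text in hits:
        words = set(text.lower().split())
        best = max(CONCEPT_CUES, key=lambda c: len(CONCEPT_CUES[c] & words))
        # Fall back to a catch-all group when no cue word matches at all.
        label = best if CONCEPT_CUES[best] & words else "uncategorized"
        groups[label].append(title)
    return dict(groups)


# Every hit contains the keyword "java"; only the surrounding words differ.
hits = [
    ("JVM tuning", "java code runs faster after compiler warmup"),
    ("Travel guide", "java is an island in indonesia near jakarta"),
    ("Brewing tips", "java roast beans make strong coffee"),
    ("Misc", "java java java"),
]
grouped = categorize(hits)
```

Each title lands under one heading, so a user looking for the programming language can skip the travel and coffee results entirely.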

The Research Library Group (RLP) in Mountain View, Calif., is currently testing version 2.1 with 10% of the data from its 40 million-title card catalog collection. The company licenses its catalog mainly to higher education institutions like Stanford University for use on their campuses.

“The issue we’re trying to solve is being able to search using naive words and still get good results,” says Jim Michalko, CEO of RLP. “We want to give results back based on an authoritative vocabulary despite a query based on naive terms.”

Searching a library card catalog is never an easy task, and it is made more difficult as end users grow accustomed to Web search methods. “Most of our clients are used to interrogating data like they do on the Web, with a search box,” Michalko says. “It’s not the same with a library catalog, but we’re trying to make it more like the Google experience.”

One thing Michalko likes about the system so far is its ability to search non-English language documents with great accuracy. Being able to return every catalog entry, regardless of the native language, is a great feature for the scholarly types RLP serves.

Version 2.1 of MindServer is now available and runs on Linux, Solaris and Windows. Pricing starts at $75,000 per server processor.