IT revs up search engines

Companies are using enterprise search tools to unlock key information buried in internal databases and to boost e-commerce.

Internet search has become a staple in the daily diet of most IT professionals. Need to learn about radio frequency identification or the latest trends in offshore outsourcing? Search for it online. Want to find a JavaScript workaround? Look for user threads in a JavaScript forum.

It don't come easy

IT executives now are applying a more sophisticated, enterprise version of search functionality to corporate Web sites and intranets to improve the search experience for e-commerce customers, business partners and employees. Beyond that, enterprise search tools are being aimed at internal databases, even databases residing on mainframes, for specialized functions such as data analytics, knowledge management and business-process management.

Christian Book Distributors (CBD) wanted to improve the search and browse functionality on its e-commerce sites, according to Mark Pepin, assistant vice president for the Peabody, Mass., company. CBD chose Endeca's ProFind for the task. "We really liked the technology that drove Endeca," Pepin says. "It was very similar to the technology we built our site on."

After they implemented the product, it wasn't long before Pepin and his team started to see how Endeca also could help CBD reduce the time it took to roll out marketing campaigns. "We saw it was also a great data-mining tool, which made it a good fit for direct-marketed, targeted e-mails to our customer base," he says.

Before CBD adopted ProFind, a traditional database query took several hours to run. "With Endeca's ability to slice and dice our data, we could load up all of our separate customer information - purchase history, author history, product categories - on a separate platform. We were then able to quickly segment the list. We could go and mine customers, clicking on anybody who had purchased a particular author in the past, and it would literally bring back information in seconds," Pepin says.
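The kind of segmentation Pepin describes can be pictured as a lookup against a prebuilt index rather than a fresh database scan. The Python sketch below is only an illustration of that general idea, not Endeca's method: the purchase records, the facet names and the helper functions build_facet_index and segment are all hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, author, category).
PURCHASES = [
    ("c1", "Author A", "Fiction"),
    ("c2", "Author B", "Reference"),
    ("c1", "Author B", "Reference"),
    ("c3", "Author A", "Music"),
]


def build_facet_index(purchases):
    """Map each (facet, value) pair to the set of customers who match it."""
    index = defaultdict(set)
    for customer_id, author, category in purchases:
        index[("author", author)].add(customer_id)
        index[("category", category)].add(customer_id)
    return index


def segment(index, **filters):
    """Return the customers who match every facet filter (set intersection)."""
    matches = [index.get((facet, value), set()) for facet, value in filters.items()]
    return set.intersection(*matches) if matches else set()


index = build_facet_index(PURCHASES)

# Everyone who has ever bought a title by "Author A".
author_a_buyers = segment(index, author="Author A")

# Narrow a segment by combining facets.
author_b_reference = segment(index, author="Author B", category="Reference")
```

Because the index is built once up front, each segment is a set intersection rather than a new pass over the purchase table. Note that this sketch matches at the customer level, so a customer qualifies if any of their purchases satisfies each filter.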

Web search vs. enterprise search

The ability to process a company's structured and unstructured data, stored in a variety of formats, is what separates enterprise search tools from more public Web search engines, according to analysts. Structured data exists in database tables, usually associated with a company's ERP, CRM or custom database systems. Unstructured data can take the form of e-mails, Microsoft Office-type files, Adobe PDFs and a host of other current or legacy file types scattered throughout a typical corporation.

Questions to ask when beginning a search project
Gartner recommends that corporations begin any search vendor evaluation project by ruling in vendors rather than ruling them out.
Does the company desire or accept an application service provider model of search provision?
Does the company desire or accept an appliance model for search provision?
Will the vendor serve one project or be an enterprise-wide default for all new projects?
What repositories of data will be searched? Will the search product call applications or simply search an index? Will text be the only significant format in which information is stored?
What level of security will be necessary, and what means of authentication will be used?
What interface will be used for result selection? Will the company desire categorical navigation? Is persuasive merchandising a goal?
What interface will be used for query input? Will the company need to use a natural question format, or stick to the familiar keyword input format?
Source: Gartner

Public Web search engines primarily support HTML file formats, and possibly a few standard office formats (Microsoft Word, Adobe Acrobat PDFs). Enterprise search products often provide gateways that let the products search and retrieve content from a range of file formats, even legacy files on mainframes.

Also, public Web search engines use a spider to acquire new content, while enterprise search products might use either a software crawler or scripts that directly transfer files to the search engine to reduce the load on the network, according to search and retrieval guru Stephen Arnold.

Hadley Reynolds, an analyst at Delphi Group, says enterprise search is not only about a search box and a results list that appears after the user hits "Go." "Most of the enterprise search applications are looking well beyond that model into more of an integrated model," he says. Reynolds stresses that many enterprise search projects incorporate fairly comprehensive taxonomy and classification schemes to add more meaning to the content the search application unearths.

Many enterprise search vendors have begun to cross-sell components of their product suites as analytics and data-mining tools, precisely because of their ability to "slice and dice" a variety of enterprise data. And many traditional content management, CRM and ERP vendors are embedding technology from enterprise search leaders such as Verity or Autonomy.

Searching for a search tool

Companies looking for the best in enterprise search might be overwhelmed with the variety of products available and the equally varied price tags - which can range from $10,000 to more than $1 million (see graphic). Although some open source search products exist, such as Java-based Lucene, a large company undertaking an enterprise search project should be prepared to set aside an average of $250,000, according to Whit Andrews, a research director at Gartner.

Enterprise search

There are a number of platform and niche vendors to choose from, along with many new, smaller entrants. Search platform vendors tend to offer the broadest spectrum of search functionality and the most experience with building gateways or connectors to third-party applications. These include Verity, Autonomy, Endeca, Fast Search & Transfer and Convera. For search applications focused on customer self-service, Andrews also cites InQuira, Kanisa, iPhrase, EasyAsk and Kaidara Software.

Reynolds offers this list of criteria for potential buyers: security, scalability, gateway capabilities, ability to be customized and richness of the portfolio in terms of relevance approaches. (Relevance approaches seek to improve the usefulness of results returned to user queries.)
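One widely used relevance approach, offered here only as an example and not attributed to any vendor named in this article, is TF-IDF weighting: a query term counts for more when it appears often in a document and in few documents overall. The Python sketch below uses made-up documents and a hypothetical rank function to show the idea.

```python
import math
from collections import Counter

# Made-up documents for illustration.
DOCS = {
    "doc1": "enterprise search tools index internal databases",
    "doc2": "web search engines crawl public pages",
    "doc3": "internal databases hold customer purchase history",
}


def tokenize(text):
    """Lowercase and split on whitespace; real products do far more."""
    return text.lower().split()


def rank(query, docs):
    """Return document ids ordered by a simple TF-IDF score for the query."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in term_counts.values() if term in c)
            if df == 0:
                continue
            idf = math.log(n_docs / df) + 1.0
            score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


results = rank("internal databases search", DOCS)
```

Here doc1 ranks first because it contains all three query terms, followed by doc3 with two and doc2 with one.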

System performance and speed of retrieval are also key criteria to look for, according to Arnold. Other essential search features include stability, ease of administration, scalability, extensibility, support for common file formats (including ASCII, Word, Adobe PDF and HTML files), role-based security safeguards, and support for indexing features, such as incremental indexing.
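Incremental indexing means re-processing only the documents that are new or have changed since the last run, instead of rebuilding the whole index. The Python sketch below illustrates one way to do that, by comparing content hashes; the IncrementalIndex class and the sample documents are hypothetical and do not represent any vendor's implementation.

```python
import hashlib


class IncrementalIndex:
    """Toy inverted index that skips documents whose content is unchanged."""

    def __init__(self):
        self.fingerprints = {}  # doc_id -> hash of the last indexed content
        self.postings = {}      # term -> set of doc_ids containing the term

    def _remove(self, doc_id):
        """Drop a document's old terms before it is reindexed."""
        for doc_ids in self.postings.values():
            doc_ids.discard(doc_id)

    def update(self, docs):
        """Index new or changed documents; return the ids that were reindexed."""
        reindexed = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.fingerprints.get(doc_id) == digest:
                continue  # unchanged since the last run
            self._remove(doc_id)
            for term in set(text.lower().split()):
                self.postings.setdefault(term, set()).add(doc_id)
            self.fingerprints[doc_id] = digest
            reindexed.append(doc_id)
        return reindexed

    def search(self, term):
        """Return the ids of documents containing the term, sorted."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
first_run = index.update({"a": "quarterly sales report", "b": "sales forecast"})
second_run = index.update({"a": "quarterly sales report", "b": "revised forecast"})
```

On the second run only document "b" is reindexed, because document "a" hashes to the same value as before.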

Making the right connections

The need for gateways to access different formats and document repositories is a major issue for large organizations, according to Reynolds, who notes that some companies might require more than 12 gateways. Andrews says that challenges with gateways are common sources of trouble for enterprise search customers. "What typically breaks are issues of security and issues of connectors to non-Web document stores. In other words, it's really easy to index all the files on a Web server but really hard to index all the pages that might come out of a database," he says.
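At its simplest, a gateway to a database turns rows into text documents that a search engine can index. The Python sketch below shows that step using the standard library's sqlite3 module and an in-memory database; the products table, its columns and the fetch_documents helper are assumptions made for illustration, and a real connector would also have to handle security, schema changes and scale.

```python
import sqlite3

# In-memory database standing in for a corporate data store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO products (title, author) VALUES (?, ?)",
    [("Study Guide", "Author A"), ("Hymn Collection", "Author B")],
)


def fetch_documents(conn):
    """Turn each product row into a flat text document keyed by a stable id."""
    cursor = conn.execute("SELECT id, title, author FROM products ORDER BY id")
    return {
        f"products/{row_id}": f"{title} {author}"
        for row_id, title, author in cursor
    }


documents = fetch_documents(conn)
```

Each row becomes one document with a stable identifier, so the same document can be updated or removed on a later indexing run.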

CBD's selection process included testing in which Endeca ran demonstrations with some of CBD's data. Pepin says CBD also built sample applications.

This vetting process helped CBD make an ultimate vendor choice and led to the unexpected finding that some data had been categorized inaccurately by end users when they'd first entered it in the database. These discrepancies cropped up early in the search testing phase, when sample search queries began producing a few odd results.

"You really have to be on top of and aware of the data structures but also the anomalies. These products will show you very clearly what you are doing well, but also very clearly where you may have problems with your data," Pepin says. While CBD didn't have to correct these anomalies with any sweeping data restructuring, the company was required to perform some additional data entry work to clean up the misclassifications.

Some manual effort on upfront classification and taxonomy creation is often required before enterprise data will return good results from a search engine, according to Reynolds.

Hope is a freelance IT writer and owner of She can be reached at
