OpenPipeline seeks to ease document prep for search
By Chris Kanaracus, IDG News Service
April 30, 2008 03:10 PM ET
Enterprise search vendor Dieselpoint is behind a new open-source project centering on a document "pipeline" -- or as the Chicago
company's CEO, Chris Cleveland, puts it, "all the boring stuff you need to make enterprise search work."
Enterprise search implementations often cover an array of document sources and components; pipelines allow companies to standardize
the processing of information before it gets pushed into a search-engine indexer.
"We're connecting the crawler companies to the text analytic companies to the search engine companies," Cleveland said.
Dieselpoint had trouble integrating its own pipeline with third-party document analyzers and content connectors, so it
has open-sourced that pipeline as the basis for the project, which is dubbed OpenPipeline.
The project's Web site is scheduled to open to the public on Monday, and a fully functional version of the software will be
downloadable under the Apache 2.0 license. The software is available under a commercial license as well, according to the site.
The software features a point-and-click user interface and provides several connectors, including Web and SQL crawlers.
It also supports commercial connectors for products such as SharePoint and Exchange, as well as a number of portals.
Dieselpoint is pursuing the project both to make bigger, more complex implementations easier and in hopes that it will draw
some customers to its search engine.
"The single biggest barrier to adoption of enterprise search is doing integration," Cleveland said. "Of course, it means enormous
consulting engagements, so it's a source of revenue for the industry, but it's a deterrent."
While major search vendors have pipelines, they are "all proprietary and all closed," he said.
A number of other vendors and consultants have signed on to the effort's advisory board. They include Alias-i, Applied Relevance
and Raritan Technologies. Cleveland expects more companies to join soon.
Conceptually, an open-source pipeline makes sense for the industry on the whole "because each component is worthless on its
own," he suggested.
Guy Creese, an analyst with Burton Group, compared OpenPipeline to an existing project.
"IBM attempted to fix this issue with UIMA [Unstructured Information Management Architecture], its framework for letting multiple
vendors work together on a text analytics pipeline. However, UIMA has not done especially well in the market," he said via
e-mail. "It's unclear whether that's due to the complexity of UIMA or the fact that the market isn't quite there yet (I believe
it's the latter)."
"In short, OpenPipeline is an interesting, open-source alternative to UIMA. However, its appeal will still remain small in the market, as many enterprises aren't at the point where they need to mix
and match text analytics modules," he added.
But Cleveland countered that even basic aspects of an enterprise search implementation can involve a lot of "drudgery," which
OpenPipeline can help alleviate: "It's the simple stuff. 'Can I get [data] out of the system, add security to it and send
it to the search engine?'"
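For readers who want a concrete picture, the flow Cleveland describes (pull a document out of a source system, attach security information, hand it to the search engine) can be sketched in a few lines of Python. This is an illustration only and does not reflect OpenPipeline's actual API, which the article does not describe; every name below (Document, normalize_whitespace, add_security, run_pipeline) is hypothetical, and a plain list stands in for the search-engine indexer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    """Hypothetical record passed from stage to stage."""
    doc_id: str
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage takes a document and returns the (possibly modified) document.
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    # Collapse runs of spaces and newlines into single spaces.
    doc.text = " ".join(doc.text.split())
    return doc


def add_security(allowed_groups: List[str]) -> Stage:
    # Build a stage that attaches an access-control list to each document.
    def stage(doc: Document) -> Document:
        doc.metadata["acl"] = list(allowed_groups)
        return doc
    return stage


def run_pipeline(docs: Iterable[Document],
                 stages: List[Stage],
                 index: Callable[[Document], None]) -> int:
    # Apply every stage in order, then pass the result to the indexer.
    count = 0
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        index(doc)
        count += 1
    return count


# Stand-in for a search-engine indexer: documents are appended to a list.
indexed: List[Document] = []
count = run_pipeline(
    [Document("1", "  Quarterly   report \n draft ")],
    [normalize_whitespace, add_security(["finance"])],
    indexed.append,
)
```

Because each stage shares the same signature, a crawler, a text-analytics step or an indexer from a different vendor could in principle be swapped in without touching the others, which is the standardization the project is aiming for.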
The IDG News Service is a Network World affiliate.