
Going deep with Deep Web

Aug 09, 2004
Enterprise Applications

* Deep Web maps the data presented by the Web interfaces of databases

While we’d all like to be able to integrate our databases with our Web applications by simply buying a utility, the reality is that you rarely have a clean database interface to work with. The lack of structure and variability of returned data formats can make repurposing retrieved content very tricky. This is the problem that Deep Web Technologies addresses with its Explorit product (see editorial links below).

Deep Web’s products in effect map the data presented by the Web interfaces provided by one or more databases or other search tools (such as Verity) and normalize the results into a consistent Web output format.
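The pattern described here is essentially per-source adapters feeding a common record shape. A minimal sketch of that idea, with invented field names and sources (Deep Web's actual schema is not published):

```python
# Hypothetical sketch of result normalization: each search source
# returns records in its own format, and a per-source adapter maps
# them onto one consistent schema. All field names are illustrative.

def normalize_pubmed(raw):
    """Map one hypothetical PubMed-style record to the common schema."""
    return {
        "title": raw["ArticleTitle"],
        "url": raw["Link"],
        "snippet": raw["Abstract"][:200],  # trim long abstracts
    }

def normalize_verity(raw):
    """Map one hypothetical Verity-style record to the common schema."""
    return {"title": raw["ttl"], "url": raw["href"], "snippet": raw["summary"]}

# Registry of adapters, one per wrapped source.
ADAPTERS = {"pubmed": normalize_pubmed, "verity": normalize_verity}

def normalize(source, raw_records):
    """Return all records from `source` in the one consistent format."""
    return [ADAPTERS[source](r) for r in raw_records]
```

Once every source goes through an adapter like this, the presentation layer only ever sees one format, which is what makes a single consistent Web output possible.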

The core product, Explorit, provides an interface to a single database, but Deep Web's technology becomes really interesting with Distributed Explorit, which allows multiple databases to be integrated.

Now you might be wondering why you couldn't do the same thing with, for example, an XSLT processor. The answer is that you might be able to, but defining the interface to the database and the variety of translations required would be complex, and the results would undoubtedly be slower in operation. As Deep Web's president, Abe Lederman, told me: “If databases behaved better our job would be easier.”

Deep Web has also created a “results ranking” system that grades the results from multiple databases accessed in parallel to provide a more powerful relevance determination.
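The difficulty with grading results from several databases at once is that each one scores relevance on its own scale. A common technique (a generic sketch, not Deep Web's actual, unpublished algorithm) is to normalize scores per source before merging:

```python
# Hedged sketch of cross-database relevance ranking: min-max
# normalize each source's scores to [0, 1], then sort the merged
# list by the normalized score.

def merge_ranked(results_by_source):
    """results_by_source: {source: [(title, raw_score), ...]}
    Returns a merged list of (title, normalized_score, source)."""
    merged = []
    for source, hits in results_by_source.items():
        scores = [s for _, s in hits]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores are equal
        for title, score in hits:
            merged.append((title, (score - lo) / span, source))
    # Highest normalized relevance first.
    return sorted(merged, key=lambda item: item[1], reverse=True)
```

Without a step like this, a source that hands out scores in the thousands would always crowd out one that scores between 0 and 1, regardless of actual relevance.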

Another Deep Web product, Explorit Alerts, runs queries on schedule and provides a report of just the new and changed items while Explorit Crawler spiders a list of pre-defined sites for relevant content. There are two further variations of the crawler: Explorit Focused Crawler, which incorporates a thesaurus, and Explorit Subject Crawler, which provides a browsing interface to information collections.
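The alerts pattern described above boils down to snapshotting each run of a saved query and diffing it against the previous run. A minimal sketch, assuming results are keyed by some stable identifier (the keying and record shape here are illustrative assumptions):

```python
# Sketch of the "alerts" pattern: re-run a query on a schedule,
# compare against the previous snapshot, and report only the new
# and changed items.

def diff_results(previous, current):
    """Both arguments map item_id -> content for one query run.
    Returns (new_ids, changed_ids)."""
    new = [k for k in current if k not in previous]
    changed = [k for k in current if k in previous and current[k] != previous[k]]
    return new, changed
```

A scheduler then only has to persist the latest snapshot per query and mail out whatever `diff_results` reports.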

Deep Web has quite a roster of clients, including the Environmental Science Network.

These products run under Sun Solaris 2.3 and later and Microsoft Windows NT 4.0 and later; a Linux version is under development (HP-UX, AIX, IRIX, and other Unix versions are available by request as custom implementations). Project pricing starts at about $25,000.


Mark Gibbs is an author, journalist, and man of mystery. His writing for Network World is widely considered to be vastly underpaid. For more than 30 years, Gibbs has consulted, lectured, and authored numerous articles and books about networking, information technology, and the social and political issues surrounding them. His complete bio can be found at
