New search engine takes 'DeepDyve' into the Dark Web

By Heather Havenstein, Computerworld
November 12, 2008 10:54 AM ET

DeepDyve announced Tuesday that it has launched a free search engine that can be used to access databases, scholarly journals, unstructured information and other data sources in the so-called "Deep Web" or "Dark Web," where traditional search technologies don't work.

The DeepDyve search engine makes it easier to search the Dark Web for life sciences, patent and Wikipedia data. The new engine indexes 500 million pages, said DeepDyve, which was known as Infovell before changing its name on Tuesday.

The company said it will soon start indexing physical sciences content in the areas of information technology, clean technology and energy - which will help it meet its goal of growing its index to more than 1 billion pages by the end of the year.

Because much of the content on the Deep Web is made up of technical publications, databases, scholarly publications and unstructured data, it has been difficult for traditional search engines to access it. To tackle this issue, DeepDyve is partnering with those publishers to gain access to content overlooked by other engines, the company added.

Google announced earlier this month that it is ratcheting up its aim at the Dark Web by adding the ability to search PDF documents. In April, Google had announced that it was trying to find a way for its search engine to index HTML forms such as drop-down boxes and select menus that are typically part of the Dark Web.

"According to IDC, more than 42 million consumers spend 25 hours per month online researching business and personal information, and they are frustrated with the results they get back and the tools they have to use," said William Park, CEO of DeepDyve, in a statement. "DeepDyve gives information-savvy consumers unparalleled access to quality information found only in the Deep Web, with features and functionality that make it easy to find, filter and organize their results."

The company's technology is aimed at allowing users to type in a few words or copy an entire article into a query box to find all related articles located in the Deep Web, DeepDyve noted.

Chris Sherman, a blogger at Search Engine Land, said that DeepDyve's approach to scouring the Deep Web is innovative. He credited the company's chief scientists, who are veteran genomics researchers. To crack the genetic codes contained in DNA sequences, he noted, researchers must understand the hidden patterns in massive amounts of data.

"DeepDyve takes a similar approach to understanding information on the Web," Sherman added. "Going far beyond basic keyword-based search, DeepDyve indexes every word in a document, but also computes the factorial combination of words and phrases in the document and uses some industrial strength statistical techniques to assess the 'informational impact' of these combinations. In essence, this approach looks at the meaning of an entire document and uses that to compute relevance, rather than factors like snippets of text or anchor text in links pointing to documents."
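DeepDyve has not published the details of its algorithm, so the following is only an illustrative sketch of the general idea Sherman describes: scoring whole documents against a long query using both single words and multi-word phrases, weighted by how informative each one is. It substitutes a standard technique (word n-grams with TF-IDF weights and cosine similarity) for DeepDyve's own "informational impact" statistics, and every name in it (`features`, `build_index`, `rank`, the sample documents) is hypothetical.

```python
import math
import re
from collections import Counter


def features(text, max_n=2):
    """Count word n-grams (single words and short phrases) in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats


def build_index(docs):
    """Index a dict of {doc_id: text}; return per-document counts and IDF weights."""
    doc_feats = {doc_id: features(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for feats in doc_feats.values():
        doc_freq.update(feats.keys())
    n_docs = len(docs)
    # Rare words and phrases get higher weight than common ones (smoothed IDF).
    idf = {f: math.log((1 + n_docs) / (1 + df)) + 1.0 for f, df in doc_freq.items()}
    return doc_feats, idf


def rank(query_text, doc_feats, idf):
    """Rank documents by cosine similarity to the query, which may be a full article."""
    def weigh(feats):
        return {f: c * idf[f] for f, c in feats.items() if f in idf}

    def norm(vec):
        return math.sqrt(sum(v * v for v in vec.values()))

    query_vec = weigh(features(query_text))
    query_norm = norm(query_vec)
    scores = []
    for doc_id, feats in doc_feats.items():
        doc_vec = weigh(feats)
        doc_norm = norm(doc_vec)
        dot = sum(w * doc_vec.get(f, 0.0) for f, w in query_vec.items())
        score = dot / (query_norm * doc_norm) if query_norm and doc_norm else 0.0
        scores.append((doc_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up sample documents, for illustration only.
docs = {
    "genomics": "Gene sequencing reveals hidden patterns in DNA",
    "patents": "Patent filings for clean energy technology",
    "search": "Wikipedia article about search engines",
}
doc_feats, idf = build_index(docs)
results = rank("hidden patterns in DNA sequences", doc_feats, idf)
print(results[0][0])
```

Because the query is reduced to the same features as the documents, it can be a few words or an entire pasted article. A production system would need far more than this, including the publisher partnerships the company describes for obtaining the content in the first place.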
