dtSearch spiders and searches Web

Opinion
Jul 02, 2003 (2 mins)
Enterprise Applications

* dtSearch Web

When you use a Web server to distribute lots of content, you usually face the challenge of making that content searchable. You can build an index by hand, but there are always risks in doing so: people make mistakes, leave things out or put them in the wrong place, or build an index that doesn’t use the terms a user would actually search for.

An interesting solution is dtSearch Web from dtSearch (see links below).

DtSearch describes its tool as an “out-of-the-box solution for publishing instantly searchable documents in both Web-ready and other formats.” Based on the company’s dtSearch engine, dtSearch Web can search many format types including Microsoft Office documents, e-mail, HTML, PDF, XML, ZIP, CSV, RTF, ANSI, and Unicode. HTML and PDF content hits can be highlighted while keeping embedded formatting and links intact.

DtSearch Web uses “natural language algorithms” to provide term weighting based on the frequency of hits in unstructured search requests. The product also supports variable term weighting for indexed searches, which puts extra emphasis on one or more words: for example, positive weightings such as “soup:8” or “recipe:3” and negative weightings such as “yellow:-7.”
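To make the idea of variable term weighting concrete, here is a minimal Python sketch. It is not dtSearch's algorithm or API: the function names, the default weight of 1 for unweighted terms, and the scoring rule (weight multiplied by the number of hits, summed over terms) are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def parse_weighted_query(query):
    """Turn a query like 'soup:8 recipe:3 yellow:-7' into {term: weight}.

    Assumption: a term with no explicit weight counts as weight 1.
    """
    weights = {}
    for token in query.split():
        term, sep, raw = token.partition(":")
        weights[term.lower()] = int(raw) if sep and raw else 1
    return weights


def score(text, weights):
    """Illustrative score: sum of (weight x number of hits) for each term."""
    hits = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(weight * hits[term] for term, weight in weights.items())


def rank(documents, query):
    """Return (score, name) pairs for a {name: text} mapping, best first."""
    weights = parse_weighted_query(query)
    scored = [(score(text, weights), name) for name, text in documents.items()]
    return sorted(scored, reverse=True)
```

Under this scheme a document that mentions “soup” twice outranks one that mentions it once, and any mention of “yellow” pulls a document down the list.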

Running on IIS 4 or 5 under Windows NT, 2000 and XP (a Linux version is scheduled for release this year), dtSearch is set up by a simple configuration wizard. It supports an interface for Active Server Pages and includes a sample ASP application. For other languages, there’s also a full dtSearch Text Retrieval Engine programming API.

DtSearch also offers the dtSearch Spider, which is designed to spider remote sites for indexing and searching of both public and secure content (HTTPS sites and password-accessible sites). Content on Web sites can be indexed to any level of page depth.
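What “page depth” means for a spider can be sketched as a breadth-first crawl that stops following links once a chosen depth is reached. The sketch below is a generic illustration, not the dtSearch Spider's implementation: `crawl`, `get_links` and `max_depth` are hypothetical names, and the link lookup is passed in as a function so the example runs without touching the network.

```python
from collections import deque


def crawl(start, get_links, max_depth=None):
    """Visit pages breadth-first, following links at most max_depth levels.

    get_links(url) is a caller-supplied function returning the links on a page.
    max_depth=None means no depth limit. Returns pages in the order visited.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # index this page, but do not follow its links
        for link in get_links(url):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a depth of 0 only the start page is indexed, a depth of 1 adds the pages it links to, and so on. The `seen` set keeps the crawl from looping when pages link back to each other.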

DtSearch Web bundled with dtSearch Spider is priced at $999 per server, and three-server packs are available for $2,500.

Mark Gibbs is an author, journalist, and man of mystery. His writing for Network World is widely considered to be vastly underpaid. For more than 30 years, Gibbs has consulted, lectured, and authored numerous articles and books about networking, information technology, and the social and political issues surrounding them. His complete bio can be found at http://gibbs.com/mgbio

More from this author