Network World

Glenn Weadock

Learning to Crawl (apologies to the Pretenders)

By Glenn Weadock on Mon, 11/17/08 - 3:13pm.

Crawling in Search Server 2008 Express is the process of building the index according to a specific set of criteria and parameters. When creating a new content source, you can tell the program the starting locations (URLs) of the content you want to crawl; the type of content to crawl (SharePoint sites, Web sites, file shares, or Exchange public folders); whether to perform a full or incremental crawl (an incremental crawl covers only content that has changed since the previous crawl); and when to perform the crawl procedure.
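To make the full-versus-incremental distinction concrete, here is a minimal Python sketch. It is not Search Server's API; the `ContentSource` class, the `select_items_to_crawl` function, and the idea of deciding "changed" by comparing a last-modified timestamp with the previous crawl time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class ContentSource:
    """Hypothetical stand-in for a content source definition."""
    name: str
    start_addresses: List[str]   # starting locations (URLs)
    content_type: str            # e.g. "sharepoint", "web", "file_share"


def select_items_to_crawl(
    last_modified: Dict[str, datetime],
    previous_crawl: Optional[datetime],
    full: bool,
) -> List[str]:
    """Return the URLs a crawl would visit.

    A full crawl (or a first crawl with no history) visits everything.
    An incremental crawl visits only items modified after the previous
    crawl. The timestamp comparison is an assumption of this sketch.
    """
    if full or previous_crawl is None:
        return sorted(last_modified)
    return sorted(
        url for url, modified in last_modified.items()
        if modified > previous_crawl
    )


if __name__ == "__main__":
    source = ContentSource("Intranet", ["http://intranet/"], "web")
    items = {
        "http://intranet/a": datetime(2008, 11, 1),
        "http://intranet/b": datetime(2008, 11, 16),
    }
    # Only items changed since the previous crawl are selected.
    print(source.name, select_items_to_crawl(items, datetime(2008, 11, 10), full=False))
```

The payoff of the incremental mode is simply that the list of items to visit shrinks to whatever changed, which is why it is the usual choice for scheduled crawls between occasional full ones.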

But that's not all. The Search Administration page also has a link for "Crawl rules," which allow you to specify paths to include and/or exclude (wildcards are allowed). You can also define the security context for the crawl process; the default account is NT AUTHORITY\LOCAL SERVICE, but you can specify alternative credentials, a client certificate, or a cookie. (Microsoft notes that credentials you specify for the security context will be transmitted in clear text.)
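For readers who think better in code, include/exclude rules with wildcards can be sketched in a few lines of Python. This is an illustration only, not how the product is implemented: the `should_crawl` function, the "first matching rule wins" ordering, the case-insensitive comparison, and the default of crawling unmatched paths are assumptions of the sketch.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Each rule is (wildcard pattern, include?). Hypothetical representation.
Rule = Tuple[str, bool]


def should_crawl(url: str, rules: List[Rule], default: bool = True) -> bool:
    """Apply rules in order; the first pattern that matches decides.

    Matching is done on lowercased text so that it is case-insensitive
    (an assumption of this sketch). Unmatched URLs fall back to `default`.
    """
    lowered = url.lower()
    for pattern, include in rules:
        if fnmatchcase(lowered, pattern.lower()):
            return include
    return default


if __name__ == "__main__":
    rules = [
        ("http://intranet/hr/private/*", False),  # exclude the private area
        ("http://intranet/hr/*", True),           # include the rest of HR
        ("http://intranet/*", False),             # exclude everything else here
    ]
    for url in (
        "http://intranet/hr/private/salaries.doc",
        "http://intranet/hr/handbook.doc",
        "http://intranet/misc/readme.txt",
    ):
        print(url, should_crawl(url, rules))
```

With an ordered list like this, the most specific paths go first, so an exclusion for a subfolder can sit above a broader inclusion for its parent.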

You can specify proxy server settings to use when crawling other servers. You can also create "crawler impact rules" to configure how many documents to request simultaneously (lowering the number reduces the load on the server holding the documents). If you really want to ease the burden, you can tell Search Server 2008 Express to request only one document at a time and count to ten (or some other number of seconds) before requesting another one. Finally, on the "Manage File Types" page, you can specify which file types to include in the crawl operation. On my system, the default list was woefully inadequate; I can see an administrator needing hours, not minutes, to configure this page to cover the breadth of common file types. Plus, there appears to be no provision to specify "all suffixes" here, which strikes me as a very significant omission (it's also a major problem with Microsoft's desktop search technologies). But perhaps there's a workaround? I will "search" for a solution.
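The two flavors of crawler impact rule described here (a cap on simultaneous requests, or one request at a time with a pause in between) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's implementation: `crawl_with_impact_rule`, its parameters, and the default of eight simultaneous requests are hypothetical, and `fetch` stands in for whatever actually retrieves a document.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def crawl_with_impact_rule(
    urls: List[str],
    fetch: Callable[[str], str],
    simultaneous_requests: int = 8,        # hypothetical default
    wait_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch every URL while limiting the load on the target server.

    If `wait_seconds` is set, request one document at a time and pause
    between requests. Otherwise, allow up to `simultaneous_requests`
    fetches to run at once. Results come back in the order of `urls`.
    """
    if wait_seconds is not None:
        results = []
        for index, url in enumerate(urls):
            if index > 0:
                sleep(wait_seconds)   # the "count to ten" pause
            results.append(fetch(url))
        return results

    with ThreadPoolExecutor(max_workers=simultaneous_requests) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        return "contents of " + url

    # Gentle mode: one document at a time, a short pause between requests.
    print(crawl_with_impact_rule(["doc1", "doc2"], fake_fetch, wait_seconds=0.1))
    # Faster mode: at most two documents requested at once.
    print(crawl_with_impact_rule(["doc1", "doc2", "doc3"], fake_fetch, simultaneous_requests=2))
```

The `sleep` parameter exists only so the pause can be swapped out when experimenting; the trade-off itself is the one the rule exposes, namely crawl speed against the burden on the server being crawled.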

Aside from the wimpy file-type management capability, the crawling options seem pretty thorough. Next time, I'll take a closer look at the other configuration options for this purportedly enterprise-class search tool. Meanwhile, for some reason, I feel a need to hear "Back on the Chain Gang" on the stereo. See you on the Interweb.

About Glenn Weadock on Windows Server 2008

Glenn Weadock is a longtime instructor for Global Knowledge and teaches Windows 7, Server 2008, and Active Directory. He has recently co-developed with Mark Wilkins two advanced Server 2008 classes in the Microsoft Official Curriculum. Glenn also consults through his Colorado-based company Independent Software, Inc. and is technical director of MarketCoach Investment Education Software LLC.
