The Google Search Appliance packages up the company's famously accurate technology into an easy-to-use search engine for intranets and public-facing corporate sites. In our Clear Choice test of the GB-1001 model, we found that while the searching and indexing features live up to the Google name, the product lacks polish and advanced management features.
The appliance's honeycomb case caught our eye, but the whimsy wore off as we began to notice occasional unevenness in the product. For example, the appliance takes several minutes to start up and run its various system checks. To alert you that it is done, it plays a little tune. In testing in our server room and at a colocation facility, we couldn't hear the tune over the dull roar of such environments and had to manually probe for the system's state.
The GB-1001 does not provide obvious light indicators or a small LCD screen on the unit. No on-off switch is provided, as the designer likely intended you to go through the proper shutdown procedure. We experienced an unplanned UPS failure, and upon power restoration the box recovered properly once it performed an automated rebuild of its RAID system that lasted several hours. When you do trigger a shutdown through the Web administration system, be careful not to cut power too early; otherwise, you will have the same RAID rebuild wait on your hands.
We also found other polish points lacking. Within the administration system, confirmations of configuration changes didn't appear in a logical place, form fields were slightly misaligned or oddly arranged, warning messages did not appear reliably, help information was too concise or lacked good examples, result output previews didn't always work, and, in some cases, error messages lacked detail.
There were some bright spots, including clear installation documentation, color-coded cables and a built-in DHCP server that allowed us to plug in a laptop and quickly configure the network settings.
Using a Web-based GUI, your first step after installation would likely be to define a search index by indicating the starting URLs, and the URL patterns and file types that the crawler should record or discard (see "How we did it").
According to Google, the crawler is capable of indexing 220 types of content. In our test we saw no limitation in the crawler, and in some test data sets the device discovered files that we were not aware of.
You will likely want to break up the indexed documents into different collections based upon a URL pattern. The GB-1001 allows for an unlimited number of collections.
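To illustrate the idea of pattern-based collections, here is a minimal sketch in Python. It uses glob-style patterns from the standard library as a stand-in; the appliance's own pattern syntax is not shown in this review, and the collection names and URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical collections, each defined by glob-style URL patterns.
# This is a stand-in for the appliance's own pattern syntax.
COLLECTIONS = {
    "hr": ["http://intranet.example.com/hr/*"],
    "engineering": [
        "http://intranet.example.com/eng/*",
        "http://wiki.example.com/*",
    ],
    "pdfs": ["*.pdf"],
}


def collections_for(url):
    """Return the sorted names of every collection with a pattern matching url."""
    return sorted(
        name
        for name, patterns in COLLECTIONS.items()
        if any(fnmatchcase(url, pattern) for pattern in patterns)
    )


# A document can land in more than one collection.
print(collections_for("http://intranet.example.com/hr/policy.pdf"))
```

Note that a single document may belong to several collections at once, which is why the function returns a list rather than a single name.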
The crawler is quite adept at dealing with secured content. It handles HTTPS connections and can negotiate basic authentication, NT LAN Manager (NTLM) authentication, and custom cookie and form-based access. The GB-1001 can crawl content from databases, including Oracle, SQL Server, MySQL, IBM DB2 and Sybase. If you happen upon a data type the crawler cannot access, you can feed it directly to the device in an XML format.
Google limits its appliances by document count, starting at 500,000 for the base unit (for smaller deployments, use the Google Mini). You can of course increase your license and associated hardware to build out a search infrastructure that could support millions of documents. When you size your appliance, be aware that if you plan on doing direct database indexing, Google will count each record as a document, so you might chew up a license very quickly.
One aspect of the crawl process that we especially liked was the diagnostics facility, which not only helped us understand what the crawler was doing but also helped us isolate such indexing problems as broken links, server issues and access-denied problems.
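The sizing caveat above can be sketched as a quick estimate. The file and row counts below are hypothetical; only the 500,000-document base limit and the one-record-equals-one-document rule come from the text.

```python
LICENSE_LIMIT = 500_000  # base-unit document limit cited in the review


def documents_counted(crawled_files, db_tables):
    """Estimate license usage: each crawled file and each
    database record counts as one document."""
    return crawled_files + sum(db_tables.values())


usage = documents_counted(
    crawled_files=120_000,  # hypothetical file count
    db_tables={"orders": 350_000, "customers": 80_000},  # hypothetical row counts
)
print(usage, usage > LICENSE_LIMIT)  # 550000 True
```

In this made-up case the two database tables alone account for most of the license, which is the effect to watch for when sizing.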
The GB-1001 provides a great deal of flexibility for the search page and result listings. Some administrators may be happy to use the page layout helper and modify the logo and basic aspects of the search page. However, most folks will probably want to modify the results to fully integrate them into the look and feel of the site. If you are familiar with Extensible Stylesheet Language Transformations (XSLT), you can modify a template of nearly 3,000 lines that controls just about every aspect of the search form and results. If this doesn't suit you, just use the raw XML returned from the appliance and do whatever you like, including putting it into another system.
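For the raw-XML route, a minimal parsing sketch in Python follows. The sample document and its element names (GSP, RES, R, U, T, S) are assumptions for illustration and should be checked against Google's XML reference; the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a result document; element names are illustrative.
SAMPLE = """<GSP>
  <Q>vacation policy</Q>
  <RES>
    <R N="1">
      <U>http://intranet.example.com/hr/vacation.html</U>
      <T>Vacation policy</T>
      <S>Employees accrue leave monthly.</S>
    </R>
    <R N="2">
      <U>http://intranet.example.com/hr/handbook.pdf</U>
      <T>Employee handbook</T>
      <S>See section 4 for leave.</S>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Turn each assumed <R> result element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "rank": int(r.get("N")),
            "url": r.findtext("U"),
            "title": r.findtext("T"),
            "snippet": r.findtext("S"),
        }
        for r in root.iter("R")
    ]


for hit in parse_results(SAMPLE):
    print(hit["rank"], hit["title"], hit["url"])
```

Once the results are plain dictionaries, they can be rendered in a site's own templates or handed to another system.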
Google's approach is to implement searches in an easy-to-use "black box" fashion, which could place constraints on a private search. You turn the appliance loose, and it ranks based upon the Google algorithm. We were pleased that the accuracy of the test search lived up to what we see in everyday use of the Google Internet search. It easily found buried test phrases and correctly identified primary documents.
The GB-1001 provides features to massage the results; unfortunately, some are a bit limited or not well documented. The most valuable feature for search customization is the KeyMatch configuration, which allows you to define matches on keywords, phrases and exact queries. KeyMatch returns up to three matches, or five if you dig to find the setting that changes the limit. The Synonym setting provides a useful way to suggest alternate search terms triggered by the original query. It is also possible to create filters against the domain in which a document is found, the language in which it is written, its file type or the meta tags it was given. The meta tag facility, if carefully applied, can provide a rich system to slice indexed data in a variety of ways: by author, owner or rating, for example.
Various front-end and search-result features we tested took an unpredictable length of time to register our changes. If you add synonyms, keyword matches or a variety of other template changes, you typically can't see the result right away. You must be patient if you like to tinker.
In terms of performance, the GB-1001 appliances start at around 300 queries per minute (vs. the Mini's rate of 60 queries per minute). Our test verified that the Google Search Appliance unit was roughly four times faster than the lower-end unit. Under heavy load, well beyond 300 queries per minute, we were able to push response time past 1 second per query, but we did not see any drop-off that would suggest the device did not perform to specification.
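The throughput figures above reduce to simple arithmetic. A minimal sketch in Python, with hypothetical function names and a made-up sample run rather than our test data:

```python
GB1001_RATED_QPM = 300  # rated figure cited in the review
MINI_RATED_QPM = 60     # rated figure cited in the review


def queries_per_minute(completed_queries, elapsed_seconds):
    """Convert a count of completed queries over a time window
    into a queries-per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_queries * 60.0 / elapsed_seconds


def speedup(rate_a, rate_b):
    """Ratio of two throughput rates."""
    return rate_a / rate_b


# Hypothetical run: 1,500 queries completed in 5 minutes.
print(queries_per_minute(1500, 300))              # 300.0
# Ratio of the two rated figures.
print(speedup(GB1001_RATED_QPM, MINI_RATED_QPM))  # 5.0
```

The rated figures imply a fivefold gap on paper, so a measured ratio of roughly four is in the same range.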