An end to downtime
Vinca's software proved the most reliable for clustering NT servers, but no one gave us perfect 24-7 operation.
What goes up must come down, but if your servers go down too often, your time as an employee may be up.
All servers fail from time to time, even when outfitted with high-availability hardware: error-correcting memory, multiple CPUs, RAID and redundant power supplies. Clustering software can help PC servers achieve levels of uptime and availability approaching those of minicomputers or mainframes by distributing server workload and replicating data across multiple shared hard disks or servers. Clustering software typically works through network adapters that sense the health of each server, although some products require special hardware. When a failure occurs, the clustering software identifies the failed unit and routes its workload to other machines. Ideally, users see no interruption of service.
To determine which clustering software adds the most reliability to an NT server environment, we tested products from five vendors. Vinca Corp.'s Co-StandbyServer 4.1 for NT was clearly the best at making NT Server a reliable, highly available platform: it came closest to giving us uninterrupted 24-7 operation of our PC servers and was easy to administer.
Failure detection and failover
The key reason you would buy any of these products is reliability, so the ability to detect failures and work around them is crucial. Co-StandbyServer discovered server failures more quickly and transferred control to our secondary servers more seamlessly than the other products we reviewed. It transparently mirrored not only data files but also registry entries, which let a secondary server quickly become an exact replacement for the primary server. In most cases, a slight pause in application responsiveness was the only sign that Co-StandbyServer had failed over to a secondary server. Vinca supplies replication and failover scripts for a variety of popular business applications, including Microsoft Exchange, Internet Information Server, SQL Server and Lotus Notes. The scripts tell Co-StandbyServer how to manage an application's data, how to detect whether the application is responding and whether the application dynamically updates NT's registry.
Legato's FullTime Cluster 4.3 for NT detected failures within 5 seconds, but sometimes took as long as 20 seconds to transfer control to a secondary server. As with Co-StandbyServer and Windows Load Balancing Service (WLBS), each FullTime Cluster server is a working peer node, not a hot spare. We were able to configure FullTime Cluster to attempt to restart a failed server before migrating its workload to the remaining computers. However, unlike Co-StandbyServer, FullTime Cluster did not mirror dynamic registry updates. FullTime Cluster's workload balancing performed well, and we could easily tune its control of key server resources. In addition to replicating data, FullTime Cluster supports shared network disks and printers, a feature unique to the product.
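The detection and failover behavior described above (a server falls silent, the cluster optionally tries to restart it, then migrates its workload to surviving peers) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Co-StandbyServer or FullTime Cluster actually work: the `ClusterMonitor` class, its 5-second `timeout` default, the `try_restart` hook and the round-robin reassignment of services are all hypothetical.

```python
"""Hypothetical sketch of heartbeat-based failure detection and failover.

Not the algorithm of any product reviewed here: the class name, the
5-second default timeout, the restart hook and the round-robin
reassignment are illustrative assumptions.
"""
from typing import Callable, Dict, List, Optional, Set


class ClusterMonitor:
    def __init__(
        self,
        workloads: Dict[str, List[str]],
        timeout: float = 5.0,
        try_restart: Optional[Callable[[str], bool]] = None,
    ) -> None:
        # Map each server name to the services (workload) it currently runs.
        self.workloads = {name: list(svcs) for name, svcs in workloads.items()}
        # Seconds without a heartbeat before a server is considered failed.
        self.timeout = timeout
        # Optional hook that tries to restart a silent server before failover.
        self.try_restart = try_restart
        # Time at which each server's last heartbeat (watchdog packet) arrived.
        self.last_seen: Dict[str, float] = {name: 0.0 for name in workloads}
        self.failed: Set[str] = set()

    def heartbeat(self, name: str, now: float) -> None:
        """Record a heartbeat from `name`; failed servers stay failed."""
        if name not in self.failed:
            self.last_seen[name] = now

    def check(self, now: float) -> List[str]:
        """Detect silent servers and fail them over; return their names."""
        failed_now: List[str] = []
        for name in self.workloads:
            if name in self.failed or now - self.last_seen[name] <= self.timeout:
                continue
            # Give the server a chance to come back before migrating its work.
            if self.try_restart is not None and self.try_restart(name):
                self.last_seen[name] = now
                continue
            self._fail_over(name)
            failed_now.append(name)
        return failed_now

    def _fail_over(self, name: str) -> None:
        self.failed.add(name)
        survivors = [n for n in self.workloads if n not in self.failed]
        if not survivors:
            raise RuntimeError("no healthy servers left in the cluster")
        orphaned, self.workloads[name] = self.workloads[name], []
        # Spread the failed server's services round-robin over the survivors.
        for i, service in enumerate(orphaned):
            self.workloads[survivors[i % len(survivors)]].append(service)


if __name__ == "__main__":
    mon = ClusterMonitor({"nt1": ["Exchange"], "nt2": ["IIS"], "nt3": ["SQL Server"]})
    for t in (1.0, 2.0, 3.0):          # all three servers healthy
        for server in ("nt1", "nt2", "nt3"):
            mon.heartbeat(server, t)
    mon.heartbeat("nt2", 8.0)          # nt1 falls silent after t=3
    mon.heartbeat("nt3", 8.0)
    print(mon.check(9.0))              # ['nt1']
    print(mon.workloads)               # {'nt1': [], 'nt2': ['IIS', 'Exchange'], 'nt3': ['SQL Server']}
```

Passing a `try_restart` hook that returns `True` loosely mirrors the option of restarting a server before migrating its workload; a real clustering product would also have to move data, registry state and client connections, all of which this sketch ignores.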
Windows NT Server Enterprise Edition 4.0's WLBS feature, which Microsoft recently acquired from Valence Research, runs as a Windows NT network driver that monitors and distributes network traffic among a cluster of servers. When a server in our cluster failed, WLBS automatically redirected network traffic to the remaining servers. Similarly, when we brought the failed server back online, WLBS transparently rejoined it to the cluster. WLBS detected server failures within an acceptable 5 to 10 seconds but did not switch a failed server's active connections to other servers, so clients logged on to the failed server lost their connections. WLBS' biggest drawback is its lack of data replication among the servers in a cluster. Because WLBS works at the network driver level and can't replicate data, we found that it worked best in a Web server environment.
Apcon's PowerSwitch/NT 4.0 was almost as quick as Vinca's Co-StandbyServer at discovering failures, but because it reboots a secondary server after a primary server fails, applications typically remained unavailable to users for a few minutes. Apcon claims the reboot lets applications restart more successfully than if control were simply switched to a different computer, but the other clustering products we tested had no trouble restarting applications without a reboot. A combination of software and SCSI switch hardware, PowerSwitch/NT disconnects a failed server from its disk drives and, just before the reboot, connects the drives to a secondary server. PowerSwitch/NT doesn't perform load balancing. It is also somewhat particular about server hardware: it conflicts with certain video hardware and requires the secondary server to have the same video card, network card and SCSI interface as the primary.
Unlike the other products we looked at, Veritas' Backup Exec 7.2 for NT with the Shared Storage Option and Replication Exec 1.5 takes a data-centric rather than process-centric view of clustering.
It monitors and manages the health of shared disks rather than whole servers. Like PowerSwitch/NT, the Veritas software excelled at managing shared disks among several servers, ensuring data preservation in the event of server failure. However, Backup Exec for NT with Shared Storage Option and Replication Exec 1.5 don't automatically respond to server failures. When we powered off a server in the midst of a series of disk write operations, Veritas' software didn't recover automatically; an administrator had to restore the damaged files, restart the application and re-establish the connections. To its credit, Backup Exec made quick work of restoring files.
Administration and failback
While fast failover was our primary concern, easy administration is always a plus. We found Co-StandbyServer's management console intuitive and easy to use. The console displays the status of the resources within the cluster as well as the level of traffic among the clustered machines. We liked the ability to configure the rate at which Co-StandbyServer sends watchdog packets between servers to check for failures. Establishing or reconfiguring a cluster is as simple as dragging and dropping server icons into place, and rejoining a repaired server to the cluster is completely automatic. Adding specific support for an application such as Microsoft Exchange was painless; it involved only installing a Vinca-supplied script on the cluster's servers. The user manual is clear, well-organized and to the point. We also liked Co-StandbyServer's Service Monitor feature, which automatically detects stopped services and can optionally attempt to restart them, avoiding a full server failover altogether.
The management console for Legato's FullTime Cluster provides myriad ways to configure clusters, which Legato refers to as resource groups. Its object-oriented user interface shows a cluster's status and current configuration, and the console can run on a remote client computer that's not part of the cluster's domain. Returning a repaired server to the cluster is automatic and quick. Like Co-StandbyServer, FullTime Cluster let us tune the rate at which it checks for the presence of healthy servers in the cluster. The product documentation is verbose in places but easy to follow.
Microsoft's simple, unadorned WLBS setup dialog boxes let you designate both server and cluster IP addresses, set priorities and perform other configuration tasks. Each WLBS server has its own machine name and IP address, and each WLBS cluster has its own separate Internet name and IP address.
Client computers connect to the individual servers via the WLBS cluster's Internet name and address. The WLBS documentation consists of online help files that are not as comprehensive as they should be.
Apcon's PowerSwitch/NT Administrator console shows the status of each server as well as that of Apcon's SCSI switch. Configuring both the switch and the servers in the cluster is a drag-and-drop affair. The software writes entries to the Windows NT event log and, unlike the other products, can automatically send e-mail, issue SNMP alerts or page a technician when failures occur. It also runs batch files you can modify to customize the product's behavior during failover. Apcon's documentation, which includes instructions for installing and rearranging adapter cards in the servers, is geared to a more technical audience than the other products' user guides.
Veritas' Backup Exec with Shared Storage Option works with Replication Exec to preserve data in a shared disk environment, sidestepping the issues of server failover and failback entirely. We found the user interface confusing, chiefly because navigating the list of backed-up files was unintuitive. Moreover, the product documentation is too general, especially on configuring Replication Exec.
Little extra network traffic
You don't want to ensure high availability at the expense of ongoing performance problems. In our tests, PowerSwitch/NT and the Veritas components had the least effect on our application and network performance, producing no noticeable increase in application response time or network bandwidth utilization. In fact, PowerSwitch/NT doesn't use the network at all; it works instead through SCSI cables connected to the Apcon SCSI switch. For its part, Veritas' Shared Storage Option managed an external cabinet of disk drives via Fibre Channel and did not affect our Fast Ethernet LAN. WLBS also used our network frugally, but only because it cannot replicate data across servers: its only network traffic consisted of watchdog packets and the distribution of client requests among the cluster.
Co-StandbyServer and FullTime Cluster added some extra network traffic under heavy file-update conditions because of the large volume of data mirroring among the clustered servers. Both were also a bit "chatty" when we monitored them with a Network Associates Sniffer protocol analyzer. Specifically, Co-StandbyServer added less than 2% to network utilization during normal use (no failover in progress and light to medium file-update activity), but utilization rose to more than 5% under heavy file updates. FullTime Cluster's utilization was also normally under 2%, but it rose to about 6% as its clustered servers mirrored heavy file updates across a group of servers.
Co-StandbyServer, FullTime Cluster and WLBS can be configured to use a separate Ethernet network interface to monitor the presence of healthy servers. All three vendors recommend using the separate interface, and Vinca supplies a network cable with Co-StandbyServer just for this dedicated interconnect. Co-StandbyServer and FullTime Cluster also use the separate interface for data mirroring.
All these products can dramatically reduce server downtime.
They can make NT's foibles (the blue screen of death, its allergic reaction to hardware failures and its poor handling of low disk space) a private war between you and the operating system. The best of them, Vinca's Co-StandbyServer, can make server failovers completely transparent to people who depend heavily on server availability. Users may never need to know about operating system and server hardware failures again.
Nance, a computer analyst and consultant for 28 years, is the author of Introduction to Networking, 4th Edition and Client/Server LAN Programming. You can reach him at barryn@erols.com.
RELATED LINKS
Scorecard and NetResults
How we ranked the products in various categories and vendor contact info.
Legato continues on acquisition spree
Company spends $94 million on Vinca, maker of data mirroring software. Network World, 6/21/99.
NT clustering still bedevils Microsoft
David Strom's view. Network World, 6/21/99.
Start-up will let users gang up Intel PCs
Former RS/6000 designers use best of symmetrical multiprocessing, clustering. Network World, 6/14/99.
NuView aims to ease NT cluster management
New module supports policy management and allows dynamic load balancing across servers. Network World, 6/14/99.
Microsoft touts heavy-duty Win 2000
Network World, 5/31/99.

