Clustering in SQL Server 2008

Last month I attended a session on Failover Clustering at the SQL PASS Summit Conference and it was very enlightening. I have not done much with SQL Server Failover Clustering except in test mode with Virtual Server and VMware. This session was packed with experienced professionals who had seen it all, and the Microsoft developers responsible for the code were there leading the session. This is what the SQL PASS Conference is all about. It was standing room only fifteen minutes before the session started, but I managed to squeeze into one of the final seats available. It was well worth it.

The session was led by Max Verun, the Lead Developer for SQL Clustering. He did a great job handling a series of tough questions. SQL Server Failover Clustering is built on the Microsoft Cluster Service (MSCS) in Windows, so since Windows Server 2008 supports 16-node clustering, so does SQL Server 2008. However, as always, SQL Server only supports the Active/Passive architecture at the instance level. This means that from a SQL Server instance point of view, only one node can be active while the other node waits passively for a failure, at which point it becomes active. The result is almost no downtime (okay, maybe 15 seconds to fail over).

Microsoft only charges for one license in this Active/Passive mode since only one node is active at a time. You can have multiple instances of SQL Server spread across multiple nodes to create some semblance of load balancing, but each instance is still Active/Passive.

The Failover Cluster has a shared volume that all of its nodes can reach, so in the event of a failover the new active node can take over straight away. These days the shared volume is usually part of a SAN with built-in RAID to protect against disk failures. The Cluster Resource Group contains this shared volume as well as the IP address and the network name for the cluster, which clients use to access it. From the client point of view the cluster is a powerful server that happens to be up 24x7. The fact that nodes fail from time to time is largely transparent to the client because failover is so quick.

For multiple nodes the N+I topology is supported, with N being the number of active nodes and I being the number of passive nodes. A node can be passive for multiple active instances, giving an extra degree of hardware redundancy.
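To make the N+I idea a bit more concrete, here is a minimal toy model in Python. It is only a sketch of the ownership bookkeeping, not how the cluster service actually decides anything: the `Cluster` class, the node names and the instance names are all made up for illustration, and the rule for picking a target (an idle node first, otherwise the least loaded survivor) is my own assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy model: which node currently runs each SQL Server instance."""
    nodes: List[str]
    owner: Dict[str, str] = field(default_factory=dict)  # instance -> active node

    def passive_nodes(self) -> List[str]:
        # A node is passive if no instance is currently running on it.
        busy = set(self.owner.values())
        return [n for n in self.nodes if n not in busy]

    def fail_node(self, failed: str) -> None:
        # Drop the failed node, then move each instance it owned elsewhere.
        self.nodes.remove(failed)
        if not self.nodes:
            raise RuntimeError("no surviving node to fail over to")
        for instance, node in list(self.owner.items()):
            if node != failed:
                continue
            spare = self.passive_nodes()
            if spare:
                target = spare[0]  # assumed policy: prefer an idle node
            else:
                # Otherwise double up on the survivor running the fewest instances.
                load = list(self.owner.values())
                target = min(self.nodes, key=load.count)
            self.owner[instance] = target


# A 2+1 topology: two active nodes and one passive node covering both.
cluster = Cluster(
    nodes=["NODE1", "NODE2", "NODE3"],
    owner={"INST1": "NODE1", "INST2": "NODE2"},
)
cluster.fail_node("NODE1")
print(cluster.owner)
```

Each instance still runs on exactly one node at any moment, which is the Active/Passive rule; the shared passive node is what gives you the extra redundancy without buying a standby for every instance.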

So what’s new in SQL Server 2008? Well, for the first time you can do a “Rolling Upgrade” (spontaneous applause from the crowd). That applies to upgrades from SQL Server 2000 or 2005 as well as to all new Service Packs and CUs (Cumulative Updates). In previous versions, you had to bring down the whole cluster for the duration of the SQL Server upgrade. Now, because of the new cluster install architecture, you can upgrade one node at a time while it is passive, then fail over to upgrade the other nodes. There is a small downtime, estimated at 2 minutes, which consists of a 15-second failover plus the execution of an upgrade script.
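The ordering is the whole trick, so here is a small sketch of it in Python. This is just my way of writing down the sequence described in the session, not anything that SQL Server setup exposes: the function name `rolling_upgrade_plan`, the step labels and the node names are hypothetical.

```python
from typing import List, Tuple


def rolling_upgrade_plan(nodes: List[str], active: str) -> List[Tuple[str, str]]:
    """Return the ordered steps of a rolling upgrade as (action, node) pairs.

    Passive nodes are upgraded first, then the instance fails over to an
    already upgraded node, and only then is the former active node upgraded.
    """
    if active not in nodes:
        raise ValueError("active node must be a member of the cluster")
    passive = [n for n in nodes if n != active]
    if not passive:
        raise ValueError("a rolling upgrade needs at least one passive node")

    steps = [("upgrade", n) for n in passive]  # no downtime: these nodes are idle
    steps.append(("failover", passive[0]))     # the short outage happens here
    steps.append(("upgrade", active))          # old active node is now passive
    return steps


for action, node in rolling_upgrade_plan(["NODE1", "NODE2", "NODE3"], active="NODE1"):
    print(action, node)
```

The only step that clients can notice is the failover in the middle, which is why the estimated downtime is a couple of minutes rather than the length of the whole upgrade.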

The new cluster install architecture means that each node is installed separately instead of remotely from one node. Remote execution of installs led to some poor recovery options in the past and prevented the rolling upgrade option. However, the new approach presents a curious anomaly: you can now actually have a cluster that consists of one node. This is called the Integrated Install. The single-node cluster is the first step in installing a cluster and assumes that you add other nodes later on. The single node becomes available immediately while you install the other nodes. Trouble is, until those nodes are added you are flying without a parachute. There is a more responsible option called the Advanced or Enterprise Install, where the nodes are prepared without clustering and the cluster is then created as the final step. All Add Node, Remove Node and Rolling Upgrade operations are run on each node individually while that node is passive. Again, uptime is the key here, and this new architecture minimizes downtime.

All this means that we can get closer to the utopia of “five nines” uptime. That means servers available 99.999% of the time, which translates to just over 5 minutes of downtime for the whole year. On this New Year’s Eve, wouldn’t it be great to think of just one Rolling Upgrade and a reboot for 2009? In the old days, we would get a bonus for achieving our Service Level Agreement metrics. Now we just get to keep our jobs…
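For anyone who wants to check the five nines arithmetic, here is a quick back-of-the-envelope calculation in Python. The function name is my own, I am assuming a 365-day year, and the 2-minute figure is the rolling upgrade estimate quoted in the session.

```python
def downtime_budget_minutes(availability: float, days: int = 365) -> float:
    """Minutes of downtime allowed per year at a given availability (0..1)."""
    minutes_per_year = days * 24 * 60  # 525,600 for a 365-day year
    return (1.0 - availability) * minutes_per_year


FIVE_NINES = 0.99999
ROLLING_UPGRADE_MINUTES = 2.0  # estimate from the session: failover plus upgrade script

budget = downtime_budget_minutes(FIVE_NINES)           # about 5.26 minutes
remaining = budget - ROLLING_UPGRADE_MINUTES           # what is left after one upgrade

print(f"Yearly budget at five nines: {budget:.2f} minutes")
print(f"Left after one rolling upgrade: {remaining:.2f} minutes")
```

So a single rolling upgrade uses up a good third of the yearly allowance, which leaves very little room for anything else to go wrong.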

Happy New Year!

cheers

Brian

Copyright © 2008 IDG Communications, Inc.
