Redundancy and failover and HA, oh my!

Two of the three don't matter.

Credit: Core switch by Seeweb (CC BY-SA 2.0)

"The system uptime is calculated to be the mean time to failure divided by the square root of the cosmological constant multiplied by the tangent of the square hypotenuse... oh, forget it, just give me dual power supplies."

Sound familiar? If you've ever been involved in designing a highly available system, chances are it does. Unfortunately, technology leaders often get caught up in fancy terms and fail to define real requirements.

A worthy goal?

Getting right down to it, here is my thesis: redundancy and failover are not worthy goals in and of themselves; they are merely means to an end, and that end is availability. In other words, redundancy and failover are mere servants, and their master is availability. To see why this is so, we need to define our terms:

  • Redundancy is having extra components available in case a component fails
  • Failover is the mechanism, be it automatic or manual, for switching over to a contingency plan
  • Availability is a characteristic of a system that describes its uptime, typically expressed as a percentage (e.g. 99.99%)
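To make the percentage concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name `allowed_downtime_minutes` and the 365-day year are assumptions chosen for illustration, not part of any standard.

```python
# Assumption: a 365-day year (leap years ignored) for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Print the budget for a few commonly quoted targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% available -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% leaves roughly 52.6 minutes of downtime a year, and 99.999% leaves a little over 5 minutes. Each extra nine shrinks the budget tenfold, which is why the number deserves to be a deliberate requirement.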

So, based on my thesis, being redundant or having a failover capability is not a worthy end goal for a system. So what if your system can fail over from Earth to Mars? And just because you have sixteen redundant standby routers waiting for your one active router to crash, who cares?

Now, don't get me wrong; I'm not saying that failover and redundancy are bad things. Far from it. What I am saying is that your basic design premise shouldn't be that you want your network to be able to fail over to another location, or that you want your router to survive the loss of a power supply because a built-in redundant one is waiting in the wings. Rather, your basic design premise should be that you want your network to be available 99.999% of the time, or that you want your network to be available during a regional disaster, or that you want your network to be available when a sudden surge of 5 million customers browses your website on product release day.

Selecting the proper tools

See the pattern here? Availability is the house you're building; redundancy and failover are the tools you use to build it. Before you can select the proper tools, you first have to figure out what kind of house you're building, and that means defining requirements. But all too often, no one in leadership is willing to step up to the plate and actually define availability requirements. Why not? Here are three reasons:

  1. Failover and redundancy are easier to define. It's easier for leadership to ask "why didn't it fail over?" than to ask "are we within our defined SLA?"
  2. Lack of information. Most of the information you'll find on the Internet, especially from vendors, relates to redundancy. The idea, of course, is that they want to sell you that dual power supply for your tertiary something-or-other.
  3. Lack of understanding. Many organizations lack solid technical leaders who really understand the technology under their charge.

Why does it matter? What does it mean for the future of the enterprise?

This matters for at least two main reasons:

  1. Cost. I've observed that, more often than not, when availability requirements are not properly defined, organizations spend more than they need to. The level of redundancy often exceeds what is really needed.
  2. Complexity. Designing a system with unnecessary components increases complexity, which is bad all around (increased downtime, decreased security, etc.).
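The cost point can be illustrated with a rough sketch that starts from the availability target and works back to the smallest number of redundant components that meets it. It leans on a strong simplifying assumption: components fail independently, and the system is up as long as at least one of them is up. Real failures are often correlated (shared power, shared software bugs), so treat the result as an optimistic estimate. The function names and the `max_count` limit are hypothetical.

```python
from typing import Optional


def parallel_availability(component_availability: float, count: int) -> float:
    """Availability of `count` identical components in parallel.

    Assumption: failures are independent, so the system is down only
    when every component is down at the same time.
    """
    return 1 - (1 - component_availability) ** count


def components_needed(component_availability: float,
                      target: float,
                      max_count: int = 16) -> Optional[int]:
    """Smallest component count that meets the target, or None if none does."""
    for count in range(1, max_count + 1):
        if parallel_availability(component_availability, count) >= target:
            return count
    return None


# Hypothetical example: each router is available 99% of the time.
for target in (0.999, 0.99999):
    needed = components_needed(0.99, target)
    print(f"target {target:.5f}: {needed} router(s) needed")
```

With a 99% available component under these assumptions, two units already exceed a 99.9% target and three exceed 99.999%. Anything beyond that adds cost and complexity without serving the stated requirement.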

So at the end of the day, the lesson is this: first define your requirements, then design the system.

This article is published as part of the IDG Contributor Network.