Network World

High availability and Heartbeat

Gearhead column by Mark Gibbs, Network World, 06/13/2005

We mentioned Heartbeat a couple of columns ago when we started on Linux Enterprise Clusters, so let's dig deeper.

Heartbeat is a subsystem that lets a primary and a back-up Linux server each determine whether the other is "alive" and, if the primary isn't, fail its resources over to the backup. Heartbeat exchanges inter-server signals, called "heartbeats," over serial, User Datagram Protocol (UDP) and PPP/UDP connections, and it handles the transfer of the failed server's IP addresses to the surviving one.
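At its core, the "is the other node alive?" question is a timeout. A node remembers when it last saw a heartbeat from its peer and declares the peer dead once a "dead time" passes with no heartbeat. The minimal Python sketch below shows that logic. The class name `PeerMonitor` and the 10-second default are illustrative assumptions, not Heartbeat's own code or defaults. Timestamps are passed in explicitly so the logic can be exercised without waiting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerMonitor:
    """Tracks heartbeats from one peer and decides whether it is alive.

    Hypothetical sketch: names and the default dead time are illustrative.
    """
    dead_time: float = 10.0             # seconds of silence before the peer is declared dead
    last_seen: Optional[float] = None   # timestamp of the most recent heartbeat

    def heartbeat_received(self, now: float) -> None:
        # Every heartbeat resets the silence timer.
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        # A peer we have never heard from is not (yet) considered alive.
        if self.last_seen is None:
            return False
        return (now - self.last_seen) < self.dead_time


if __name__ == "__main__":
    mon = PeerMonitor(dead_time=10.0)
    mon.heartbeat_received(now=0.0)
    print(mon.peer_alive(now=5.0))    # True: last heartbeat was 5 s ago
    print(mon.peer_alive(now=12.0))   # False: 12 s of silence, time to fail over
```

Choosing the dead time is a trade-off. A short value fails over quickly but risks declaring a merely busy peer dead. A long value is safer but leaves services down for longer after a real failure.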

Heartbeat arose from the Heart project in 1999 and is one of the foundational technologies of the High Availability Linux Project.

Now, as simple as failover might sound, we're talking computers and networking, so, of course, it isn't. In fact, the problem is so complex that the current release supports only a pair of nodes. This will change with HA Linux Release 2 (HAL-R2), expected within the next couple of months.

HAL-R2 will be a major revision of the entire HA Linux system. It will extend Heartbeat to support multiple nodes, to monitor resources for correct operation and to handle configuration dependencies.

Supporting multiple nodes in a cluster is crucial, and so is monitoring. Resource monitoring means that the failure of a service provided by a node can be detected even when the node itself hasn't "died."
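Resource monitoring checks the service rather than the machine. The simplest form of check is a probe that tries to connect to the service's TCP port. The Python sketch below assumes the monitored service listens on TCP. The function name `service_healthy` is hypothetical, and real monitors typically also check that the service answers correctly, not merely that it accepts connections.

```python
import socket


def service_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP service accepts connections on host:port.

    Hypothetical resource probe: it checks the *service*, not the node, so a
    crashed daemon on an otherwise healthy machine is still detected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return False


if __name__ == "__main__":
    # A throwaway local listener plays the role of the monitored service.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    print(service_healthy("127.0.0.1", port))  # listener is up, so the probe succeeds
    server.close()
    print(service_healthy("127.0.0.1", port))  # the "service" has stopped; the node is still up
```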

Dependencies, otherwise called "constraints," are important because you might never want database servers to run on the same node as Web servers, or you might want data replication services to run only on nodes that are also running the database services.
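The two example rules can be expressed as data and checked against a proposed placement of services on nodes. The Python sketch below does exactly that. The constraint lists, service names and function name are hypothetical, and the sketch does not reproduce how HAL-R2 itself expresses constraints. It only illustrates the "never together" and "only where X runs" kinds of rule.

```python
from typing import Dict, List, Set, Tuple

Placement = Dict[str, Set[str]]   # node name -> services running on it

# Hypothetical constraints mirroring the two examples in the text.
NEVER_TOGETHER: List[Tuple[str, str]] = [("database", "web")]
REQUIRES: List[Tuple[str, str]] = [("replication", "database")]  # (dependent, prerequisite)


def violations(placement: Placement) -> List[str]:
    """Return a readable list of constraint violations (empty if the placement is valid)."""
    problems: List[str] = []
    for node, services in sorted(placement.items()):
        for a, b in NEVER_TOGETHER:
            if a in services and b in services:
                problems.append(f"{node}: {a} must not run with {b}")
        for dependent, prerequisite in REQUIRES:
            if dependent in services and prerequisite not in services:
                problems.append(f"{node}: {dependent} requires {prerequisite}")
    return problems


if __name__ == "__main__":
    good = {"node1": {"database", "replication"}, "node2": {"web"}}
    bad = {"node1": {"database", "web"}, "node2": {"replication"}}
    print(violations(good))  # []
    print(violations(bad))   # ['node1: database must not run with web', 'node2: replication requires database']
```

A cluster manager with constraints can also use rules like these to choose the failover target, so that it never moves a service onto a node where it is forbidden.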

The version of Heartbeat available today is a stable and effective way to ensure that two nodes in a cluster act in a coordinated manner. Each server runs the Heartbeat daemon, and the two daemons exchange messages, called heartbeats, that tell the other machine the sender is alive.
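Over a UDP link, a heartbeat can be as little as a small datagram naming the sender. The Python sketch below sends one over the loopback interface and waits for it on the other side. The `HEARTBEAT` message format and the function names are assumptions for illustration only. Heartbeat's actual wire protocol is richer than this.

```python
import socket
from typing import Optional, Tuple

HEARTBEAT_PREFIX = b"HEARTBEAT "   # hypothetical format, not Heartbeat's wire protocol


def send_heartbeat(sock: socket.socket, node_name: str, peer: Tuple[str, int]) -> None:
    """Send one datagram announcing that node_name is alive."""
    sock.sendto(HEARTBEAT_PREFIX + node_name.encode(), peer)


def receive_heartbeat(sock: socket.socket, timeout: float) -> Optional[str]:
    """Wait up to `timeout` seconds for a heartbeat; return the sender's name or None."""
    sock.settimeout(timeout)
    try:
        data, _addr = sock.recvfrom(1024)
    except socket.timeout:
        return None
    if data.startswith(HEARTBEAT_PREFIX):
        return data[len(HEARTBEAT_PREFIX):].decode()
    return None  # ignore anything that is not a heartbeat


if __name__ == "__main__":
    # Two loopback sockets stand in for the primary and back-up nodes.
    primary = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    backup = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    backup.bind(("127.0.0.1", 0))
    send_heartbeat(primary, "primary", backup.getsockname())
    print(receive_heartbeat(backup, timeout=1.0))  # primary
    print(receive_heartbeat(backup, timeout=0.2))  # None: no further heartbeat arrived
    primary.close()
    backup.close()
```

In a running daemon, the sender would repeat this at a fixed interval. The receiver would feed each arrival into a dead-time timer rather than acting on a single missed datagram, since UDP does not guarantee delivery.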

If the primary node fails, Heartbeat on the back-up node is responsible for taking over any IP addresses that must remain available after failover.
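The back-up node's duty can be modelled as a small state machine. While the peer is alive it does nothing. Once the peer is declared dead, it claims every service address it does not already hold. The Python sketch below only tracks ownership and does not configure real network interfaces. The class and method names are hypothetical, and the addresses come from the 192.0.2.0/24 documentation range.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BackupNode:
    """Hypothetical model of the back-up node's IP takeover duty."""
    service_ips: List[str]                          # addresses clients must always reach
    owned_ips: Set[str] = field(default_factory=set)

    def on_peer_status(self, peer_alive: bool) -> List[str]:
        """React to the latest liveness verdict; return any addresses just taken over."""
        if peer_alive:
            return []
        taken = [ip for ip in self.service_ips if ip not in self.owned_ips]
        # A real implementation would bring each address up on an interface here.
        self.owned_ips.update(taken)
        return taken


if __name__ == "__main__":
    backup = BackupNode(service_ips=["192.0.2.10", "192.0.2.11"])
    print(backup.on_peer_status(peer_alive=True))   # []
    print(backup.on_peer_status(peer_alive=False))  # ['192.0.2.10', '192.0.2.11']
    print(backup.on_peer_status(peer_alive=False))  # [] (already owned)
```

Making the takeover idempotent matters because the "peer is dead" verdict is typically re-evaluated repeatedly, and repeated evaluations should not repeat the takeover.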

A highly reliable communications channel is required to avoid the split-brain or (less sexily) partitioned-cluster problem. In a split-brain situation, both servers are alive and functioning, but each believes the other is dead because it can no longer see the other's heartbeats. Now both servers try to provide the same services and use the same IP address for crucial client services. Even worse is when both servers share disk resources and compete for access to the same data at the same time.
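One way to make the heartbeat channel more reliable is to run heartbeats over more than one medium, for example a UDP network link plus a serial cable. The peer is then treated as dead only when every link has gone quiet. The Python sketch below compares a single link with redundant links. The function names and link labels are hypothetical, and the "all links silent" rule is an assumed policy for illustration.

```python
from typing import Dict


def peer_declared_dead(link_heard: Dict[str, bool]) -> bool:
    """Declare the peer dead only if no link carried a heartbeat within the dead time.

    link_heard maps a link label (e.g. "eth1", "serial") to whether a heartbeat
    arrived on it recently.
    """
    return not any(link_heard.values())


def split_brain(primary_alive: bool, backup_hears: Dict[str, bool]) -> bool:
    """True if the backup would take over while the primary is in fact still running."""
    return primary_alive and peer_declared_dead(backup_hears)


if __name__ == "__main__":
    # A single network link fails: the backup wrongly takes over.
    print(split_brain(True, {"eth1": False}))                  # True
    # Redundant links: the serial cable still carries heartbeats, so no takeover.
    print(split_brain(True, {"eth1": False, "serial": True}))  # False
    # A genuine failure: the primary is down and every link is silent.
    print(split_brain(False, {"eth1": False, "serial": False}))  # False (a correct failover)
```

Redundancy reduces the risk but does not eliminate it. If every link fails at once while both servers keep running, this rule still produces a split brain.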
