Practical advice about disaster recovery planning

Yes, it is possible to contain catastrophes, but with how much pain and at what cost?

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Disaster recovery plans and the usual mix of uninterruptible power supplies (UPSs), co-location services, data mirroring and hot-standby technologies theoretically make it possible to weather any storm. But are backup systems, replication rules and fast failover solutions enough?

Any data center manager who has implemented a DR solution understands there are always compromises. To save costs, for example, the generators and co-lo facilities are typically designed to support only a subset of the services provided during normal operation. Here are some considerations meant to ensure the compromises are based on the right facts, and that the DR plan stays aligned with the dynamic requirements of the business it protects.


* Start with an accurate picture. You need an accurate understanding of baseline power consumption under normal circumstances. Armed with this data, IT and facilities teams can allocate power more effectively during times of crisis.

Technology vendors have met this need for baseline power management with data center solutions that can be queried for temperature and power levels, and a variety of monitoring and control tools. IT managers can take advantage of these innovations in simple or more sophisticated ways. At a minimum, they can examine the return-air temperature at the air-conditioning units and gather data about the power consumption of each rack in the data center.

Alternatively, a holistic energy and cooling management solution provides a more accurate picture by incorporating fine-grained levels of monitoring focused on server inlet temperatures. The best-in-class energy management solutions aggregate the real-time server inlet temperatures along with power consumption throughout the data center.

Holistic energy management solutions can yield both immediate and long-term insights. The aggregated thermal and power data can be used to generate thermal and energy maps of the data center that enable at-a-glance identification of hot spots and the major power users. Over time, the data can be logged to support trend analysis for DR planning. Because it is based on actual power usage data, the holistic approach gives a more accurate view of the data center than energy management solutions that rely on theoretical models.
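As a rough illustration of that aggregation step, the sketch below groups per-server readings by rack and flags hot spots and the largest power users. The readings, the rack names and the 27 C alert threshold are all made up for the example; a real deployment would pull live data from its own monitoring tools and set its own limits.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (rack id, server inlet temperature in Celsius, power draw in watts).
READINGS = [
    ("rack-A", 22.5, 310.0),
    ("rack-A", 24.0, 295.0),
    ("rack-B", 29.5, 420.0),
    ("rack-B", 31.0, 405.0),
    ("rack-C", 21.0, 180.0),
]

# Assumed alert threshold for this example; each site would choose its own.
HOT_SPOT_INLET_C = 27.0


def summarize_by_rack(readings):
    """Return {rack: (mean inlet temperature, total power)} from per-server readings."""
    temps = defaultdict(list)
    power = defaultdict(float)
    for rack, inlet_c, watts in readings:
        temps[rack].append(inlet_c)
        power[rack] += watts
    return {rack: (mean(temps[rack]), power[rack]) for rack in temps}


def hot_spots(summary, threshold_c=HOT_SPOT_INLET_C):
    """Racks whose mean inlet temperature exceeds the threshold, sorted by name."""
    return sorted(rack for rack, (avg_c, _) in summary.items() if avg_c > threshold_c)


def top_power_users(summary, count=2):
    """Racks ordered by total power draw, highest first."""
    ranked = sorted(summary, key=lambda rack: summary[rack][1], reverse=True)
    return ranked[:count]


summary = summarize_by_rack(READINGS)
print("Hot spots:", hot_spots(summary))
print("Top power users:", top_power_users(summary))
```

Logging the same summary at regular intervals would give the trend data described above.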


* Identify and protect high-priority resources. With the ability to view power and temperature patterns in real time and log the data for extended periods of operation, data center managers can identify key resources that merit extra protection and priority during any outage. These can include systems allocated to mission-critical teams of employees or the critical applications that impact high-priority transactions.

On a day-to-day basis, the monitoring puts data center managers in a more proactive stance. Early identification of hot spots, before they reach critical levels, can minimize negative impacts on equipment and user services, and enable preventive measures to be taken. As an added benefit, the visibility of power and temperature can help identify hardware that is consuming too much energy; refreshing these systems can reduce energy consumption during normal operation.

The same solutions that provide visibility can also introduce better control of power. Control of power can avoid outages -- by bringing down temperatures -- and can enable the allocation of power to mission-critical systems during an outage. As part of a DR solution, controlling power is key for avoiding duplication of non-essential systems in a co-lo facility and getting the best use out of the available systems.

A crude method for controlling power is simply to cap the power consumption of the high-priority servers and the related computer room air handler (CRAH) equipment so that they stay within restricted power levels during a crisis. Since performance is directly related to power levels, a more intelligent energy management solution lets IT dynamically balance power and performance.

The best energy management solutions enable this balanced control with a combination of accurate continuous monitoring of actual power consumption, and the ability to dynamically adjust CPU operating frequencies. The solutions interact with the operating system or hypervisor based on threshold alerts, and ultimately minimize the impacts of power restrictions on applications and end users.

* Better disaster outcomes. Power capping and throttling can maximize the availability of high-priority business applications and, conversely, allow IT to temporarily disable or lower the performance of non-critical servers while in power-conserving mode. Carrying out these controls in response to a natural disaster minimizes the impacts on end users and critical applications.
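One way to picture this prioritization is a simple greedy allocation: hand out the restricted power budget to the most critical servers first, throttle the next tier, and power down whatever cannot be served. The sketch below is only an illustration under stated assumptions. The `Server` fields, the priority scale and the wattages are hypothetical, and commercial energy management products apply their own, more sophisticated policies.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    priority: int        # 1 = most critical (hypothetical scale)
    normal_watts: float  # measured draw in normal operation
    min_watts: float     # lowest cap at which the server still does useful work


def allocate_power(servers, budget_watts):
    """Assign a power cap to each server, most critical first.

    A server gets its normal draw if the remaining budget allows it,
    a reduced cap if at least min_watts remains, and 0 (powered down) otherwise.
    """
    caps = {}
    remaining = budget_watts
    for server in sorted(servers, key=lambda s: s.priority):
        if remaining >= server.normal_watts:
            cap = server.normal_watts
        elif remaining >= server.min_watts:
            cap = remaining  # throttled: runs at whatever power is left
        else:
            cap = 0.0        # not enough power left to run this server
        caps[server.name] = cap
        remaining -= cap
    return caps


# Hypothetical fleet and a restricted budget of 900 W during an outage.
fleet = [
    Server("db", priority=1, normal_watts=400.0, min_watts=250.0),
    Server("web", priority=2, normal_watts=350.0, min_watts=200.0),
    Server("batch", priority=3, normal_watts=300.0, min_watts=150.0),
    Server("test", priority=4, normal_watts=300.0, min_watts=150.0),
]
outage_caps = allocate_power(fleet, budget_watts=900.0)
print(outage_caps)
```

With these numbers the database and web servers keep their normal draw, the batch server is throttled to the remaining 150 W, and the test server is powered down.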

* Capacity management. The same power management solutions that enable balancing power and performance also maximize outcomes in other ways. By giving data center architects insight into power requirements, these solutions help them calculate and configure rack densities that will stay within the lower power envelopes in effect during outages. These adjustments can boost efficiency and help extend UPS runtime by up to 25% during power outages, as measured in proof-of-concept testing of power management solutions in data centers.
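The rack-density calculation can be sketched in a few lines. The function below is a simplified illustration, and every figure in it (the rack power envelopes, the per-server caps and the 42 slots) is an assumed value rather than a measurement from the testing mentioned above.

```python
import math


def servers_per_rack(rack_budget_watts, server_cap_watts, rack_slots):
    """Largest number of servers a rack can hold without exceeding its power
    envelope, limited by the number of physical slots."""
    if server_cap_watts <= 0:
        raise ValueError("server_cap_watts must be positive")
    by_power = math.floor(rack_budget_watts / server_cap_watts)
    return min(by_power, rack_slots)


# Assumed figures: an 8 kW rack envelope in normal operation, 5 kW during an outage.
normal = servers_per_rack(8000.0, server_cap_watts=400.0, rack_slots=42)
outage_uncapped = servers_per_rack(5000.0, server_cap_watts=400.0, rack_slots=42)
outage_capped = servers_per_rack(5000.0, server_cap_watts=300.0, rack_slots=42)
print(normal, outage_uncapped, outage_capped)
```

In this example, capping each server at 300 W instead of 400 W lets the rack keep 16 servers running within the outage envelope rather than 12.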

The biggest payback

The cost of downtime can be a sufficient justification for some companies to invest in the latest holistic energy management solutions. However, it is often the other benefits from these systems that provide the biggest impetus. This is because the best-in-class energy management solutions drive up energy savings every day -- not just during times of disasters.

In fact, real-world cases have shown that intelligent energy management solutions can reduce energy waste by 20% to 40%. That estimate is conservative, given observations that approximately 10% to 15% of all data center servers are idle. Since a typical server draws about 400 watts of power, the annual energy cost is $800 or more per server. Reducing this type of waste extends operation during times of restricted power, and yields significant ongoing reductions in operating cost.
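The per-server figure depends heavily on the electricity price and on whether cooling and power-distribution overhead is counted, neither of which is stated here. The sketch below shows the arithmetic under assumed inputs: the prices and the overhead multiplier (a PUE-style factor) are illustrative values only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost(server_watts, price_per_kwh, overhead_factor=1.0):
    """Yearly electricity cost of one always-on server.

    overhead_factor scales the server's own draw to include facility
    overhead such as cooling (1.0 means no overhead is counted).
    """
    kwh = server_watts / 1000 * HOURS_PER_YEAR * overhead_factor
    return kwh * price_per_kwh


# Assumed inputs: $0.10/kWh with no overhead, then $0.115/kWh with a 2.0 overhead factor.
server_only = annual_energy_cost(400, price_per_kwh=0.10)
with_overhead = annual_energy_cost(400, price_per_kwh=0.115, overhead_factor=2.0)
print(f"Server only: ${server_only:,.0f} per year")
print(f"With facility overhead: ${with_overhead:,.0f} per year")
```

A 400-watt server running all year uses about 3,504 kWh. Under the first set of assumptions that costs roughly $350; under the second, which doubles the energy to account for facility overhead, it comes to roughly $806, in line with the $800 figure above.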

There are many motivations for getting better control of data center power, not the least of which is cost control as energy costs continue to skyrocket. Server sprawl has made the data center energy bill one of the fastest-increasing components of operational costs. But regardless of the motivation, whether to ensure the survival of a business during a natural disaster or to cap rising energy bills, the benefits are far-reaching and add up to a very attractive business case for holistic energy management in today's data centers.

Klaus is the general manager of Data Center Manager (DCM) Solutions at Intel Corporation, where he has managed various groups for over 13 years. Klaus' team is pioneering power and thermal management middleware, which is sold through an ecosystem of data center infrastructure management (DCIM) software companies and OEMs. A graduate of Boston College, Klaus also holds an MBA from Boston University. He can be reached at Jeffrey.S.Klaus@Intel.com.
