Do-it-yourself disaster recovery

Feature
Aug 23, 2004 | 11 mins
Backup and Recovery | Data Center | SAN

Using virtualization technology to pry apps away from dedicated physical resources can have an added benefit: DR on the cheap.

While most network executives are looking at server virtualization to reduce hardware costs, the technology also could offer a budgetary bonus: less-expensive disaster recovery. With disaster-facility contracts easily costing upward of $30,000 per month, killing off that budget line item is tempting.

“One of the hardest parts budget-wise [in IT] is disaster recovery and its incredible price tag. Traditionally, you had to duplicate everything you’ve got in one data center to another and then pray that you never have to use it,” says Jason Brougham, enterprise network manager for American Medical Response, a Greenwood Village, Colo., ambulance service company with 18,000 employees and 255 locations nationwide. “The only way you can afford to build true disaster recovery is to run hot to hot, with both data centers active all the time on servers using virtualization.”

Companies with virtualized servers and storage-area networks (SAN) in disparate data centers already have most of the pieces in place to take on in-house disaster recovery: They have a potential back-up location in a faraway spot (that likely won’t be affected by the disaster). They have network connections between the two sites. Their virtualization and load-balancing software would let one server or SAN take over for another almost instantly if a short-term failure occurs (from routine maintenance to a few hours of blackout).

Network executives can easily make the common-sense leap to full-fledged in-house disaster recovery. If servers float away in a storm or are otherwise permanently damaged, one data center can become the backup for another. And even if you don’t bring disaster recovery completely in-house, virtualization can help save money on the facility contract, because a handful of virtualized servers can do the work of many physical ones.

“The pieces of hardware become less critical in a virtualized environment – if there are 400 servers, with virtualization you could conceivably do disaster recovery on 20 servers. That might be reaching, but that’s the idea,” says Vivian Knoerle, principal consultant for Intellinet, a virtualization and disaster-recovery systems integrator in Atlanta. “If you do still use a disaster-recovery facility for hosting, the expense and hardware requirement can be less – because the number of physical servers can be far less.”

Such is the case for insurance company Mutual of Enumclaw, based in Enumclaw, Wash., with 16 offices in Washington, Idaho, Oregon and Utah.

“We approach disaster recovery like a life insurance policy. We don’t want to have too much, but we want enough,” says John Weeks, IT director for Mutual of Enumclaw, which uses virtualization software from VMware, an EMC company. “The virtual capability simplifies our recovery efforts.”

Critical insurance-related processing runs on the mainframe, so Weeks currently contracts with a disaster-recovery facility for the mainframe. But the company relies on Intel-based IBM xSeries servers for other applications such as Citrix, which it runs via a virtualized server farm. With VMware, Mutual of Enumclaw has reduced the number of physical servers it uses by about 35%. (Weeks also has begun rolling out virtualized IBM blade servers for the server farm. A dual-blade box hosts up to three virtual servers while the quad blade hosts as many as five, he says.)

This translates into a lot less hardware required for disaster recovery. Before implementing VMware, the company contracted with its disaster-recovery facility to maintain a similar PC server environment – one back-up server for every production server. “We have simplified that [disaster] model by going to a virtual model,” Weeks says. “VMware is hardware-agnostic, and we can restore systems without identical or near-identical hardware. This creates flexibility and expands our options regardless of what site we recover to, either our own site with older hardware or a new site with all new hardware.” Mutual of Enumclaw also reduced its network and support requirements for disaster recovery, Weeks says.

Still, like all things IT, turning virtualized remote data centers into disaster-recovery backups for one another won’t be a cakewalk. Technology issues abound, with server configuration management/inventory control, data synchronization and WAN bandwidth among the greatest challenges. And you can’t overlook the need to address processes, personnel and practice.

More to lose

Because each virtualized server is the equivalent of many physical servers, if even one of them goes up in flames, so too does much of your IT infrastructure. Rebuilding it quickly means knowing exactly what you’ve lost.

Tools are available that let you take an image or snapshot of an entire virtual server for firing up on another physical machine (as in VMotion or UltraBac). But missing are tools to keep track of exactly how those virtual servers are configured, what software is loaded on each, what tweaks might be needed to ensure all applications perform nicely together and so on.

“Configuration management and change management of virtualized machines is a whole new ball of wax,” Intellinet’s Knoerle warns. “You need to keep very good track of what’s on each server, and the configuration management tools we have today don’t support virtualized machines.”

Plus, for any disaster-recovery operation, “you need to keep track of configuration on an operations-group level,” she says. Virtualization will ease that process – you likely will have your most-critical applications automatically fail over to other virtual servers. But restoring less-critical applications could get ugly. Back-up media labeled by the physical server won’t be good enough. You’ll need to know exactly what virtual machines and applications were running on each physical server, and which processes should be prioritized.

“Do some kind of classification – as simple as maybe putting applications inside labels such as mission-critical, business-critical, operational. That’s how you’ll determine your recovery objectives, and that will determine the infrastructure you need and the plan,” says Stephanie Balaouras, a senior analyst for The Yankee Group.
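
Neither the inventory nor the classification requires exotic tooling to get started. Here is a minimal sketch in Python – purely illustrative, with hypothetical host names, virtual machines and tiers – of recording which virtual machines and applications live on each physical host and producing a recovery order sorted by criticality:

```python
# Illustrative sketch only: record which VMs and apps run on each physical
# host, tag each app with a criticality tier, and emit a recovery order.
# Host names, VM names and tiers below are hypothetical examples.

TIER_ORDER = {"mission-critical": 0, "business-critical": 1, "operational": 2}

inventory = [
    # (physical host, virtual machine, application, criticality tier)
    ("esx-hq-01", "vm-claims-db", "claims database", "mission-critical"),
    ("esx-hq-01", "vm-citrix-01", "Citrix farm node", "business-critical"),
    ("esx-hq-02", "vm-intranet", "intranet portal", "operational"),
]

def recovery_order(records):
    """Return records sorted so mission-critical applications restore first."""
    return sorted(records, key=lambda r: (TIER_ORDER[r[3]], r[0], r[1]))

if __name__ == "__main__":
    for host, vm, app, tier in recovery_order(inventory):
        print(f"{tier:18} {app:20} ({vm} on {host})")
```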

American Medical Response’s Brougham, who has overseen in-house disaster-recovery efforts for several companies, underscores that an IT inventory assessment of all resources, virtualized and not, is necessary. Most companies do a poor job of inventory management, particularly on servers, he says, because they rarely implement server-level inventory management tools. With a small number of virtualized servers now representing a large number of physical servers, in all likelihood your inventory assessment will uncover that “you’ve got 40 more apps than you really need. Or you’ll find out you need 40 more apps,” he says.

On the bright side, if you haven’t yet standardized on equipment across your data centers, you’re in for some relief. The virtualized servers won’t care what hardware they are placed on, and older equipment can be used. This also differs from the days when data centers had to be exactly the same to perform as back-up sites.

SANs and synchronization

You will need to analyze the data on your SAN in a similar fashion. Mutual of Enumclaw plans to expand its SAN but will continue using existing storage for testing and disaster recovery, Weeks says. It will add 3T-byte EMC AX100 Serial Advanced Technology Attachment SAN devices with built-in switches. These switches, also available in stand-alone versions from vendors such as Brocade and McData, let the SAN move data from one device to another for disaster recovery, he explains.

Failover should be the easy part. Knowing where your most critical data is, and how to make sure it is the first to come back online, will be the hard part. This is part and parcel of categorizing your data using information life-cycle management techniques, which analysts recommend implementing as part of your in-house disaster-recovery efforts. “The most important step is data classification,” Balaouras says.

You’ll want to look at technologies for synchronizing data between main and failover sites, too. Every disaster-recovery plan sets recovery objectives – the recovery point objective, which bounds how much data loss is acceptable, and the recovery time objective, which bounds downtime – says Belinda Wilson, worldwide executive director of business continuity services for HP. Those objectives will help you pick your synchronization method. But with virtualization technologies, synchronization can occur at many levels – at the application, the database and the SAN, for instance. Mixing and matching among synchronization techniques and ensuring full data synchronization are issues, as is determining which source is the last word should a bad sync occur.
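
One way to make those choices concrete is to let each data class’s recovery point objective drive the synchronization approach. The sketch below is rough and hypothetical – the thresholds, data classes and method names are placeholders, not recommendations:

```python
# Rough illustration: pick a synchronization approach from a data class's
# recovery point objective (RPO). Thresholds and class names are hypothetical.

def sync_method(rpo_minutes: int) -> str:
    """Map an RPO, in minutes, to a broad synchronization approach."""
    if rpo_minutes <= 15:
        return "near-continuous replication (SAN- or database-level mirroring)"
    if rpo_minutes <= 240:
        return "periodic snapshots shipped between sites"
    return "once-nightly batch synchronization"

data_classes = {"policy transactions": 15, "document images": 240, "archives": 1440}

for name, rpo in data_classes.items():
    print(f"{name:20} RPO {rpo:>5} min -> {sync_method(rpo)}")
```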

The fun of requirements

Somewhere near the six-month mark, you should have the building blocks for in-house disaster recovery figured out: configuration/change management, inventory assessment, application and data classification, SAN failover and data synchronization. Now the real fun begins – planning the technical requirements for your new virtualized disaster-recovery infrastructure.

Your analysis should cover what systems employees use most, what systems the business most relies on and your technical needs, Brougham says. “What’s the load on the network if I suddenly take this database out-of-building? What’s the performance hit on the application if I take it out-of-building? Is it even possible to centralize these systems 250 miles away?”

The answers to these questions will determine your design, from a once-nightly, several-hour-long database-synchronization process to mirrored systems that take snapshots of each other every 15 minutes, for instance.
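
Brougham’s first question – the WAN load of taking a database out-of-building – lends itself to a back-of-envelope check. A simple sketch, assuming you know roughly how much data changes per day and how long your synchronization window is (all figures here are hypothetical):

```python
# Back-of-envelope sketch: can a given link carry the day's changed data
# for a database within the available synchronization window?
# All figures below are hypothetical examples.

def required_mbps(changed_gb_per_day: float, window_hours: float) -> float:
    """Throughput needed to ship the day's changed data within the window."""
    bits = changed_gb_per_day * 8 * 1e9          # decimal gigabytes to bits
    return bits / (window_hours * 3600) / 1e6    # megabits per second

T1_MBPS = 1.544

for changed_gb, window in [(3, 6), (20, 6), (60, 2)]:
    need = required_mbps(changed_gb, window)
    verdict = "fits on one T-1" if need <= T1_MBPS else "exceeds a single T-1"
    print(f"{changed_gb:>3} GB changed, {window} h window -> {need:6.2f} Mbit/s ({verdict})")
```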

The third-party part

So with your virtualized servers, your switched SANs and your other new data center technologies, you don’t need to hire a disaster-recovery provider to stand by with space and equipment. Or do you? While you may save big bucks bringing data center business continuity in-house, you still might want to consider off-site specialists for the following needs:
• Stand-by space equipped with PCs and telephones where key workgroups, such as call-center personnel, can continue operating.
• Design and testing of failover systems.
• Testing and management of the older equipment you keep for disaster failover.
• Dry-run training days and results analysis.
• Help during the hectic hours of transferring from one site to another – and of failing back when the problem site is ready to go live again.

While you likely already have network connections between your virtualized sites, you’ll have to look at them in a new light. Brougham suggests using Multi-protocol Label Switching (MPLS) for disaster recovery because it offers far more capacity than frame relay, can be had at T-1 prices and can be meshed. MPLS automatically shifts IP traffic among a variety of routes, which is just the kind of failover you’ll want. With any-to-any site connections, you can run your links at higher utilization and still let them absorb the shared failover traffic. He contrasts this with a company using two T-1 lines between its data centers, each running at 60% utilization: disaster strikes, one data center must fail over to the other, and all the company’s traffic suddenly overloads the surviving link.
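
The utilization math behind that example is worth spelling out. The sketch below uses a deliberately crude model – it assumes a failed site’s traffic spreads evenly across the surviving links, which real MPLS traffic engineering won’t do exactly – but it shows why two point-to-point links running at 60% can’t absorb a failover while a larger mesh can:

```python
# Simplified model: when one site fails, its traffic is spread evenly across
# the surviving links. Real MPLS traffic engineering is more nuanced; this is
# only meant to show why two links at 60% cannot absorb a failover.

def post_failover_utilization(per_link_pct: float, links: int) -> float:
    """Utilization on each surviving link after one link's load is redistributed."""
    if links < 2:
        raise ValueError("need at least two links for failover")
    return per_link_pct + per_link_pct / (links - 1)

for links in (2, 3, 4):
    after = post_failover_utilization(60.0, links)
    status = "overloaded" if after > 100 else "survives"
    print(f"{links} links at 60% -> {after:5.1f}% on survivors ({status})")
```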

But, Brougham warns, “Watch out. You can create your own disaster with a fully meshed network – virus propagation can kill you.” So you’ll have to think through how to increase security when building a meshed WAN for disaster recovery.

The people factor

As with all IT projects, half the battle is won with technology, the other half with process and people. Draw up detailed procedures and practice them – you don’t want a live disaster to be the first time your staff implements the plan. Staff might also need a new mindset when bringing up a back-up data center via server virtualization: they might be accustomed to replicating applications, not to the fast failover of a tightly coupled, virtualized environment. Unexpected issues might arise, too, such as deciding when to reconfigure the DNS server to point to the back-up data center.
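
That DNS cutover is exactly the kind of step worth scripting and rehearsing ahead of time rather than improvising during an outage. Below is a minimal, illustrative sketch – hostnames, addresses and thresholds are placeholders – that checks the primary site and, after repeated failures, prints the commands, in BIND nsupdate syntax, that an operator or wrapper script could use to repoint a record at the back-up data center:

```python
# Illustrative only: decide when to repoint a DNS record at the back-up site.
# Hostnames, IP addresses, ports and thresholds are placeholder values.
import socket

PRIMARY = ("app.primary.example.com", 443)   # service at the primary data center
BACKUP_IP = "10.0.2.10"                      # address of the back-up data center
FAILURES_BEFORE_CUTOVER = 3

def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def nsupdate_commands(record: str, new_ip: str, ttl: int = 300) -> str:
    """Commands in BIND nsupdate syntax to repoint the record."""
    return "\n".join([
        f"update delete {record} A",
        f"update add {record} {ttl} A {new_ip}",
        "send",
    ])

failures = sum(1 for _ in range(FAILURES_BEFORE_CUTOVER)
               if not reachable(*PRIMARY))
if failures >= FAILURES_BEFORE_CUTOVER:
    print(nsupdate_commands("app.example.com.", BACKUP_IP))
```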

“From a pure disaster-recovery aspect, the first things that go wrong are the ones that you don’t test enough. If you’ve got 120 applications that have to go onto a server in the disaster-recovery site, and you get 119 of them tested, it’s always the one you missed that blows up,” Brougham says. He advises hiring a software-testing consultant to evaluate how applications will play together when ported to the back-up virtualized boxes.

Include dry-run exercises in your testing schedule – which will, of course, require time.

Despite the hard work involved, in-house disaster recovery makes a lot of budgetary sense when you compare the rarity of disasters with the cost of maintaining idle back-up equipment. And it almost goes without saying that these days you can’t simply ignore disaster recovery and hope lightning never strikes. Your virtualized data center could be the godsend you never knew you needed.