Planning for disaster - Houston-style, Part 1

* A fly on the wall during a disaster recovery operation

Where do you start preparing for a disaster? I was invited a couple of weekends ago to observe a server and storage-area network failover disaster recovery drill at specialty chemical company Champion Technologies in Houston. I jumped at the idea.

After renting a car (gas and a weekend car rental are cheaper than driving your own car), I arrived at Champion at 6:30 p.m. on Friday. System shutdown was already in progress as it had started at 6 p.m.

It seems that Champion was notified by building management that the power would be cut at 6:30 a.m. on Saturday. The company therefore had to do a controlled failover and shift its operations overnight to a disaster recovery site in Scottsdale, Ariz. It was also a chance for Champion to test its disaster recovery plan.

The Champion system consists of 11 Dell PowerEdge 6850 four-core servers running SQL Server, SAP ERP and Microsoft Exchange, connected to 15.5TB of Dot Hill storage with FalconStor Network Storage Server (NSS) appliances that virtualize the storage and replicate the system to the disaster recovery site. The configuration is mirrored with an identical storage and server implementation in Scottsdale.

To carry out its disaster recovery plan, which consists of 364 steps documented in Microsoft Project, Champion and FalconStor Software brought in about 15 IT staff, a video crew and a FalconStor technical support representative, with me along as an observer.

By 7:40 p.m. the SQL Servers were shut down. By 8 p.m., all servers had been shut down, a manual snapshot of the drives had been taken and replication had been forced. Roles were reversed, with Scottsdale taking over primary operations. These operations were followed in Houston by shutting down the virtual tape library, the two NSS servers, the Cisco Fibre Channel switches and the Dot Hill storage.

Right before 8:10 p.m., operations had shifted to Scottsdale and resumed there, with one exception: an 897GB volume that had replicated, but whose UMAP file differed from that of its mirrored volume in Scottsdale. IT staff would take care of that on Sunday morning. Everything was done at 11 p.m. that night, ready to reverse roles early Sunday morning back to Houston when the power would go back on. The previous night, the FalconStor software had re-scanned the volume and rebuilt the UMAP file. Everyone was in bed by 2:30 a.m. Saturday.

At 11:30 a.m. on Sunday I was sitting in a cubicle on the 28th floor in Houston, waiting to regroup with the Champion and FalconStor crew. They were going to bring everything back online in Houston. Next time we'll reveal how everything went and whether they were back in business and ready for the start of the day on Monday.
