Disaster recovery lessons from an island struck by a hurricane

Preparing for disaster recovery takes a plan, testing, and attention to non-technical necessities like food and shelter for the recovery team.

(A hurricane devastated an island that held two data centers controlling mission-critical systems for an American biotech company. The company flew a backup expert with four decades of experience to the island on a corporate jet to save the day. This is the story of the challenges he faced and how he overcame them. He spoke on the condition of anonymity, so we call him Ron, the island Atlantis, and his employer Initech, and we don’t name the vendors and service providers involved.)

Initech had two data centers on Atlantis with a combined 400TB of data running on approximately 200 virtual and physical machines. The backup system was built on software from a leading traditional backup vendor, and it backed up to a target deduplication disk system. Each data center backed up to its own local deduplication system and then replicated its backups to the disk system in the other data center. This meant that each data center held a complete copy of all Initech’s backups on Atlantis, so even if one data center were destroyed the company would still have all its data.
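
The cross-replication layout described above can be modeled in a few lines. This is only an illustrative sketch: the site names (`dc_a`, `dc_b`) and backup-set labels are hypothetical, and it says nothing about how the actual backup product was configured. It shows why replicating each site's backups to the other leaves a full copy available after a single-site loss.

```python
# Hypothetical model of two sites that replicate backups to each other.
# Site names and backup-set labels are illustrative assumptions, not Initech's.

def replicated_copies(local_backups):
    """Return what each site's disk system holds after cross-replication."""
    all_backups = set().union(*local_backups.values())
    # Each site keeps its own backups plus replicas of every other site's.
    return {site: set(all_backups) for site in local_backups}


def surviving_backups(holdings, lost_sites):
    """Return the union of backup sets held by sites that were not lost."""
    survivors = [held for site, held in holdings.items() if site not in lost_sites]
    return set().union(*survivors)


local = {
    "dc_a": {"a-vm-backups", "a-db-backups"},
    "dc_b": {"b-vm-backups"},
}
holdings = replicated_copies(local)
everything = set().union(*local.values())

# Losing either site alone still leaves every backup set available.
print(surviving_backups(holdings, {"dc_a"}) == everything)
```

In this model, losing either site on its own leaves the full set of backups at the other, which matches the protection the article describes.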

Initech also occasionally copied these backups to tape and stored the tapes on Atlantis to provide an air gap. The tapes could have been stored on the mainland but weren’t. Fortunately, they survived the disaster, though they could just as easily have been destroyed. Initech had considered using the cloud for disaster recovery but found it impractical due to bandwidth limitations on Atlantis.

When the hurricane struck, Initech began looking for someone to spearhead the recovery process on the ground. Due to the level of destruction, they knew they needed someone who could handle command-level recovery. There were only a few people with that skill level at Initech, and one of them was Ron. They put him on a private jet and flew him to Atlantis.

There he found an incredible level of general destruction. For Initech specifically, one data center had flooded, taking out the bottom row of servers in every rack but leaving the servers mounted higher up untouched. The recovery plan was to move the servers that were still working to the dry data center and recover everything there.
