ILM: When a Web site can’t afford to go down

Feature
May 24, 2004 | 4 mins
Data Center, SAN

Lee Abrahamson, practice director of SAN solutions and advanced technology, CNT

Business problem: A shipping company relies heavily on its e-commerce site – so much so that it loses money every second the site is down. A disaster that takes the site down for hours to days would mean thousands – potentially millions – of dollars in lost revenue and perhaps permanent customer attrition.

Traditional approach: Create recovery points in 15-minute intervals on inexpensive but reliable tape and store copies with an off-site disaster-recovery vendor. If a disaster occurs, contact the off-site vendor. However, if the same disaster affects many of the businesses that vendor supports, it might need days to restore systems. Some disaster-recovery sites can handle only a small percentage of their customers simultaneously.

Tape also might prove to be a bottleneck. A busy e-commerce database easily could fill 100 or more “tape mounts” in a 24-hour period (meaning the number of tapes used to back up a daily base copy of the entire database plus bundles of transactions in 15-minute intervals). Restoring many tapes would take hours, perhaps even days. Plus, for tapes stored off-site, the company also must factor in the time – likely another day – to locate and ship the tapes.
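The tape-mount figure can be checked with simple arithmetic. The sketch below assumes one tape mount per 15-minute transaction bundle and a hypothetical four tapes for the daily base copy; neither number comes from a real installation.

```python
# Rough count of tape mounts per day: a daily base copy plus one
# bundle of transactions per backup interval.
MINUTES_PER_DAY = 24 * 60


def daily_tape_mounts(interval_minutes: int, base_copy_tapes: int,
                      tapes_per_bundle: int = 1) -> int:
    """Return the number of tapes mounted in 24 hours."""
    bundles = MINUTES_PER_DAY // interval_minutes
    return base_copy_tapes + bundles * tapes_per_bundle


# base_copy_tapes=4 is an assumed value chosen for illustration.
mounts = daily_tape_mounts(interval_minutes=15, base_copy_tapes=4)
print(mounts)  # 96 bundles + 4 base-copy tapes = 100
```

With 15-minute intervals there are 96 bundles a day, so even a small base copy pushes the total to 100 or more.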

New data center approach: Use information life-cycle management (ILM) to put data on the most cost-effective medium that still has the performance attributes needed to complete the storage job. The available tiers are expensive disk, mid-priced disk, less-expensive disk and tape.
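The ILM placement rule can be sketched as "pick the cheapest tier that still meets the recovery-time requirement." The tier names, relative costs and restore times below are made-up values for illustration, not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float        # hypothetical relative cost
    max_restore_minutes: int  # hypothetical worst-case restore time


# Illustrative tiers only; real costs and restore times vary widely.
TIERS = [
    Tier("tier1-disk", 10.0, 1),
    Tier("tier2-disk", 5.0, 15),
    Tier("tier3-sata", 2.0, 120),
    Tier("tape", 0.5, 2880),
]


def cheapest_tier(required_restore_minutes: int,
                  tiers: List[Tier] = TIERS) -> Optional[Tier]:
    """Return the lowest-cost tier that can restore within the requirement."""
    candidates = [t for t in tiers
                  if t.max_restore_minutes <= required_restore_minutes]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.cost_per_gb)


recovery_tier = cheapest_tier(required_restore_minutes=240)
archive_tier = cheapest_tier(required_restore_minutes=7 * 24 * 60)
print(recovery_tier.name, archive_tier.name)
```

Under these assumed numbers, a four-hour recovery requirement lands on the SATA tier, while data that can wait a week lands on tape.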

One way to execute ILM is storage virtualization, which inserts storage intelligence between the host and its storage. Most virtualization engines reside “in-band” on the storage network and decouple the storage management functions (mirroring and snapshots) from the storage itself. This lets users build heterogeneous storage environments (multiple tiers and vendors). Such virtualization engines may be appliances but, eventually, they simply will be embedded in a storage network node (like a core switch).

Virtualization presents a logical view to the server. In what I call “logical-land,” certain physical storage limitations (size allocations, expansions) can be removed. Storage functions such as mirroring and snapshots can be applied to any storage type by any vendor. The downside is a single point of failure. Without the engine, servers can’t read the storage, even if they are reconnected directly to it.

Fortunately another option is available: storage-area network (SAN)-based replication of the physical data rather than the logical data. I call this “virtualization lite.” This form of virtualization resides in the data path, but presents the physical disk as-is to the server. It does not require logical re-mapping of the disk. This version sacrifices some features of full virtualization but retains key features such as heterogeneous mirroring and snapshots. And if the engine is removed, servers can operate directly connected to the disk.
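The difference between the two approaches can be shown with a toy model. The class names here are hypothetical and do not correspond to any vendor's product; the point is only that full virtualization remaps addresses while virtualization lite passes them through unchanged.

```python
class PhysicalDisk:
    """A disk addressed by physical block number."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def read(self, addr):
        return self.blocks[addr]


class FullVirtualization:
    """Remaps each logical address to a (disk, physical address) pair."""

    def __init__(self, mapping):
        self.mapping = mapping  # logical addr -> (disk, physical addr)

    def read(self, logical_addr):
        disk, physical_addr = self.mapping[logical_addr]
        return disk.read(physical_addr)


class VirtualizationLite:
    """Sits in the data path but presents the physical disk as-is."""

    def __init__(self, disk):
        self.disk = disk

    def read(self, addr):
        return self.disk.read(addr)


disk = PhysicalDisk([b"C", b"A", b"B"])
full = FullVirtualization({0: (disk, 1), 1: (disk, 2), 2: (disk, 0)})
lite = VirtualizationLite(disk)

# Through the full engine, the server sees blocks in logical order A, B, C.
logical_view = [full.read(i) for i in range(3)]
# Attached directly, the same server would see C, A, B: the mapping is lost.
direct_view = [disk.read(i) for i in range(3)]
# With the lite engine, the in-path view and the direct view are identical.
lite_view = [lite.read(i) for i in range(3)]
```

In this model, removing the full engine leaves the server with a block layout it does not expect, whereas removing the lite engine changes nothing about what the server reads.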

So when looking to save that e-commerce site from a time-consuming recovery, the first change is to replace tape with Tier 3 storage (Serial Advanced Technology Attachment disk) as the primary recovery mechanism. Tape would be used for archiving. Virtualization lite lets us take highly efficient snapshots (base copy plus block-level changes) of our expensive Tier 1 storage, put them on inexpensive Tier 3 storage, and mix and match vendors between tiers. Because snapshots are retained on disk, a local recovery even of a large database is a matter of rolling back to a previous online snapshot, which generally takes minutes, or a few hours for an exceptionally large database. Lastly, the database is archived to tape weekly or so for long-term retention.
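A minimal sketch of "base copy plus block-level changes" follows. It models a volume as a dictionary of fixed blocks and assumes blocks are only overwritten, never added or removed; the class and method names are hypothetical and real snapshot engines are considerably more involved.

```python
from typing import Dict, List


class SnapshotStore:
    """Toy model: one full base copy plus per-snapshot block deltas."""

    def __init__(self, volume: Dict[int, bytes]):
        self.base = dict(volume)                 # full base copy
        self.deltas: List[Dict[int, bytes]] = []
        self._last = dict(volume)                # state at the latest snapshot

    def take_snapshot(self, volume: Dict[int, bytes]) -> int:
        """Record only the blocks that changed since the last snapshot."""
        delta = {blk: data for blk, data in volume.items()
                 if self._last.get(blk) != data}
        self.deltas.append(delta)
        self._last = dict(volume)
        return len(self.deltas) - 1

    def restore(self, snapshot_id: int) -> Dict[int, bytes]:
        """Rebuild the volume as of a snapshot: base copy plus deltas."""
        image = dict(self.base)
        for delta in self.deltas[: snapshot_id + 1]:
            image.update(delta)
        return image


volume = {0: b"orders-v1", 1: b"customers-v1"}
store = SnapshotStore(volume)

volume[0] = b"orders-v2"
good = store.take_snapshot(volume)   # stores one changed block

volume[1] = b"corrupted"
bad = store.take_snapshot(volume)    # stores one changed block

recovered = store.restore(good)      # roll back to before the corruption
```

Each snapshot here holds one changed block rather than a full copy, which is why keeping many recovery points on inexpensive disk is practical.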

One bonus of virtualization lite is more affordable in-house disaster recovery. Most companies already have multiple data centers and network connectivity between them. We can tap the heterogeneous mirroring capabilities of our virtualization lite engine to move data asynchronously over lower-bandwidth links to another location. This is less costly than moving physical batches of tape off-site daily. We also minimize costs by using Tier 2 or 3 storage as the replication target.
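Asynchronous mirroring can be pictured as a queue that drains at whatever rate the link allows. The sketch below is a simplified model with hypothetical names; it counts link capacity in blocks per "tick" and ignores ordering guarantees, failures and write coalescing.

```python
from collections import deque


class AsyncMirror:
    """Toy asynchronous mirror: local writes are queued for the remote site."""

    def __init__(self, link_blocks_per_tick: int):
        self.link_blocks_per_tick = link_blocks_per_tick  # assumed link capacity
        self.primary = {}
        self.remote = {}
        self.pending = deque()

    def write(self, block: int, data: bytes) -> None:
        """Apply the write locally and queue it; do not wait for the remote."""
        self.primary[block] = data
        self.pending.append((block, data))

    def tick(self) -> int:
        """Ship as many queued writes as the link allows in one interval."""
        sent = 0
        while self.pending and sent < self.link_blocks_per_tick:
            block, data = self.pending.popleft()
            self.remote[block] = data
            sent += 1
        return sent

    @property
    def lag(self) -> int:
        """Number of writes the remote copy is behind."""
        return len(self.pending)


mirror = AsyncMirror(link_blocks_per_tick=2)
for i in range(5):
    mirror.write(i, b"data")

lags = []
while mirror.lag:
    mirror.tick()
    lags.append(mirror.lag)
```

The remote copy trails the primary during a burst and catches up afterward, which is the trade-off that lets a lower-bandwidth link carry the replication traffic.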

Once the primary site is ready to come back online, the virtualization lite engine at the remote location can mirror the database back to the primary site, letting the primary servers take control with minimal downtime.