Back up the data in cloud block storage

If you use a cloud provider's block storage, you need to back it up to avoid losing data in the event of an outage.

A recent Amazon outage resulted in a small number of customers losing production data stored in their accounts. This, of course, led to the typical anti-cloud comments that follow such events.

The reality is that these customers’ data loss had nothing to do with the cloud and everything to do with a common misunderstanding about Infrastructure as a Service (IaaS) resources.

What happened?

Over Labor Day weekend, a power outage struck one of the availability zones in Amazon’s US-East-1 region. Backup generators came on but quickly failed for unknown reasons. Elastic Block Store (EBS) data is replicated between multiple servers, but because the outage took down multiple servers at once, replication alone could not protect all of it.

Most of the data stored in EBS was fine or easily recovered after the outage; however, 0.5% of the data could not be recovered. Customers whose data fell in that 0.5% and who had no backup of their EBS volumes lost that data.

It is often said that you can outsource IT but you cannot outsource the responsibility for IT. If you are going to use another company's service to store your company's important data, you need to understand how that service works. That includes what native protection tools it offers and – even more importantly – what protection it does not offer.

This article discusses the block storage offerings of three major cloud providers. A separate article will explain cloud-based object storage, a service whose purpose, design, and protection capabilities couldn't be more different.

What is cloud block storage?

All major IaaS vendors provide block storage, which is essentially a very reliable virtual hard drive in the cloud: Amazon has Elastic Block Store, Azure has Managed Disks, and Google has Persistent Disks.

If you think of block storage in the cloud as nothing more than a very fancy hard drive, what you need to do to protect it will become immediately obvious. The challenge is that many people think that all cloud storage is automatically protected against everything, and that simply isn’t true.

Many cloud block volumes are protected via replication within a single availability zone, which is one specific location within a cloud region. Google is an exception: alongside Zonal Persistent Disks, which are replicated within a zone, it offers Regional Persistent Disks, which are synchronously replicated across two zones in the same region.

All of them use block-level replication, which is essentially the same protection you would get from a RAID array in your own data center. And just as with RAID-protected storage, any logical corruption is replicated along with everything else. A bad write or deletion lands on every copy, so replication offers no way to get back the data that was lost.
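To make the RAID comparison concrete, here is a toy Python model, not any vendor's actual replication code. A mirrored volume applies each write to every replica, so an accidental overwrite damages all copies, while a point-in-time copy taken earlier survives. The class and method names are hypothetical.

```python
class MirroredVolume:
    """Toy block volume that applies every write to all of its replicas."""

    def __init__(self, num_blocks: int, num_replicas: int = 2) -> None:
        # Each replica is a list of blocks; all start out empty.
        self.replicas = [[b""] * num_blocks for _ in range(num_replicas)]

    def write(self, block: int, data: bytes) -> None:
        # Replication cannot tell a good write from a bad one,
        # so every replica receives exactly the same bytes.
        for replica in self.replicas:
            replica[block] = data

    def read(self, block: int) -> bytes:
        return self.replicas[0][block]

    def point_in_time_copy(self) -> list:
        # A separate copy that later writes do not touch (like a snapshot).
        return list(self.replicas[0])


vol = MirroredVolume(num_blocks=4)
vol.write(0, b"customer records")
backup = vol.point_in_time_copy()

# Logical corruption: a buggy application or mistaken command overwrites block 0.
vol.write(0, b"\x00" * 16)

print(all(r[0] == b"\x00" * 16 for r in vol.replicas))  # True: every replica is damaged
print(backup[0])  # b'customer records': the earlier copy is intact
```

Adding more replicas would not change the outcome; only a copy that is isolated from later writes can undo the damage.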

Logical corruption can happen in a number of ways, including human error (accidentally deleting a directory), software error (bugs), or an electrical spike. This is why we back up RAID arrays, and it is why you should back up cloud block volumes. The vendors’ own descriptions of their products reinforce this point.

  • Amazon’s EBS page says to expect an annual failure rate of between 0.1% and 0.2%. It goes on to say, “if you have 1,000 EBS volumes running for 1 year, you should expect 1 to 2 will have a failure. EBS also supports a snapshot feature, which is a good way to take point-in-time backups of your data.”
  • Microsoft says Azure Managed Disks’ “built-in protection against localized failures might not fully protect the VMs/disks if a major disaster causes large-scale outages. … you should plan for redundancy and have backups to enable recovery.”
  • Google says Persistent Disks “have built-in redundancy to protect your data against equipment failure … Additionally, you can create snapshots of persistent disks to protect against data loss due to user error.”
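Amazon's numbers can be worked through directly. This short sketch multiplies the volume count by the annual failure rate to get expected failures, and computes the chance that at least one of 1,000 volumes fails in a year. That second figure assumes volume failures are independent, an assumption that a zone-wide outage like the Labor Day event clearly breaks. The function names are illustrative.

```python
def expected_failures(num_volumes: int, afr: float, years: float = 1.0) -> float:
    """Expected number of failed volumes: volumes x annual failure rate x years."""
    return num_volumes * afr * years


def prob_at_least_one_failure(num_volumes: int, afr: float) -> float:
    """Chance that at least one volume fails within one year.

    Assumes failures are independent, which a correlated outage violates.
    """
    return 1 - (1 - afr) ** num_volumes


for afr in (0.001, 0.002):  # the 0.1% and 0.2% ends of Amazon's range
    print(f"AFR {afr:.1%}: expected failures = {expected_failures(1000, afr):.0f}, "
          f"P(at least one) = {prob_at_least_one_failure(1000, afr):.0%}")
```

Under the independence assumption, the chance of losing at least one of 1,000 volumes in a year works out to roughly 63% at a 0.1% failure rate and 86% at 0.2%. In other words, with enough volumes, some failure is more likely than not.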

Not backing up these disks goes directly against the recommendations of each vendor.

How do snapshots work in the cloud?

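All three vendors let you create snapshots of block volumes: image copies of a volume as it existed at a particular point in time. That is very different from what the term snapshot usually means in storage circles, where a snapshot is typically a virtual, pointer-based view that lives on the same array as the original volume and depends on it.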

The first snapshot of a volume is a full copy, and subsequent snapshots are block-level incremental backups that store only the blocks changed since the previous snapshot. Snapshots are stored separately from the original volume, typically in the provider's object storage system. If these disks belong to VMs running important applications whose data you want to keep, their data volumes need to be backed up, and snapshots are an easy way to automate that.
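The full-then-incremental pattern can be sketched as a toy model in Python. It illustrates the idea only; the providers manage snapshot chains internally, and `SnapshotChain` is a hypothetical name. Each snapshot stores only the blocks that differ from the previous one, and a restore layers the deltas in order.

```python
Volume = dict[int, bytes]  # block number -> block contents (fixed set of blocks)


class SnapshotChain:
    """Toy chain of snapshots: one full copy followed by block-level deltas."""

    def __init__(self) -> None:
        self.deltas: list[Volume] = []
        self._last: Volume = {}  # volume state at the most recent snapshot

    def snapshot(self, volume: Volume) -> int:
        """Store only blocks that changed since the last snapshot; return how many."""
        delta = {blk: data for blk, data in volume.items()
                 if self._last.get(blk) != data}
        self.deltas.append(delta)
        self._last = dict(volume)
        return len(delta)

    def restore(self, index: int) -> Volume:
        """Rebuild the volume as of snapshot `index` by layering deltas in order."""
        state: Volume = {}
        for delta in self.deltas[: index + 1]:
            state.update(delta)
        return state


volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
chain = SnapshotChain()
print(chain.snapshot(volume))   # 3: the first snapshot copies every block
volume[1] = b"data-v2"
print(chain.snapshot(volume))   # 1: only the changed block is stored
print(chain.restore(0)[1])      # b'data-v1': the older version is still recoverable
```

This is why incremental snapshots are cheap to keep: a volume that changes little between snapshots adds little new storage each time.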

Because a snapshot is an image-level copy of a volume, you need to make sure the volume is not changing while the snapshot is being created. The most reliable way to do that is to shut down any instance using the volume so that nothing writes to it during the backup.

That isn't practical for most production systems, so the next best option is to run a command inside the VM that temporarily halts writes to the volume while the snapshot is taken. If the VM runs Windows, you can instead integrate with the Windows Volume Shadow Copy Service (VSS), which has applications flush and pause their writes so that your volume-level snapshot is application-consistent. If you are not running Windows, pre- and post-snapshot scripts are really your only option for ensuring data integrity.
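On Linux, one common way to halt writes is the util-linux `fsfreeze` command, which suspends new writes to a mounted filesystem until it is unfrozen (it must run as root). The sketch below wraps a snapshot in pre- and post-snapshot steps and uses `try/finally` so the filesystem is unfrozen even if the snapshot call fails. The `take_snapshot` callable and the `/data` mount point are assumptions for illustration. Note that `fsfreeze` quiesces the filesystem, not the applications on it, so databases may still need their own flush step.

```python
import subprocess
from typing import Callable, Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def snapshot_with_frozen_fs(
    mountpoint: str,
    take_snapshot: Callable[[], str],
    run: Callable[[Sequence[str]], None] = run_checked,
) -> str:
    """Freeze writes on `mountpoint`, take a snapshot, and always unfreeze.

    `take_snapshot` is a caller-supplied (hypothetical) function that starts
    the provider's volume snapshot and returns the new snapshot's ID.
    """
    run(["fsfreeze", "--freeze", mountpoint])  # pre-snapshot step: block new writes
    try:
        return take_snapshot()
    finally:
        # Post-snapshot step: resume writes even if the snapshot call failed.
        run(["fsfreeze", "--unfreeze", mountpoint])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    snapshot_id = snapshot_with_frozen_fs(
        "/data",                                # hypothetical data-volume mount point
        take_snapshot=lambda: "snap-example",   # stand-in for a real API call
        run=lambda cmd: print("would run:", " ".join(cmd)),
    )
    print("snapshot:", snapshot_id)
```

With AWS, `take_snapshot` could be something like `lambda: ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]` using a boto3 EC2 client, where `volume_id` is the ID of the volume mounted at the frozen path. Freezing a dedicated data volume is safer than freezing the root filesystem, where other processes may block on writes until it is unfrozen.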

Snapshots do have a cost, so make sure your snapshot process automatically deletes snapshots once they pass a certain age. This will help keep your storage bill down.
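Age-based cleanup is easy to script. This minimal sketch picks out snapshots older than a retention window. It assumes each snapshot is a mapping with `SnapshotId` and a timezone-aware `StartTime`, which are keys that boto3's EC2 `describe_snapshots()` returns; the function name and the 30-day window are illustrative choices, not recommendations.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Mapping, Optional


def expired_snapshot_ids(
    snapshots: Iterable[Mapping[str, Any]],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the IDs of snapshots whose StartTime is older than `max_age`.

    Assumes each snapshot has 'SnapshotId' and a timezone-aware 'StartTime',
    matching the shape of boto3 EC2 describe_snapshots() results.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


as_of = datetime(2019, 10, 1, tzinfo=timezone.utc)
snaps = [
    {"SnapshotId": "snap-old", "StartTime": as_of - timedelta(days=45)},
    {"SnapshotId": "snap-new", "StartTime": as_of - timedelta(days=3)},
]
print(expired_snapshot_ids(snaps, max_age=timedelta(days=30), now=as_of))  # ['snap-old']
```

In practice you might feed it `ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]` (or the `describe_snapshots` paginator for large accounts) and call `ec2.delete_snapshot(SnapshotId=snapshot_id)` for each returned ID. Managed tools such as Data Lifecycle Manager apply the same kind of retention rule without custom code.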

Each vendor offers tools to help automate the process: Amazon has Data Lifecycle Manager, Azure has Azure Backup, and Google has the gcloud command-line tool. There are also third-party tools, both free and commercial, that provide additional functionality.

Finally, however you manage snapshots, make sure they are protected against regional failures and hackers. That means using the vendors’ features to copy snapshots to another region and to another account. Consider a dedicated account whose only purpose is to store these snapshots, and protect that account as securely as possible.
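For EBS, one way to get snapshots into another region and another account is to share each snapshot with a dedicated backup account and have that account copy it into a different region. The sketch below uses two real boto3 EC2 client calls, `modify_snapshot_attribute` and `copy_snapshot`; the function name, profile name, regions, and IDs are hypothetical. Because the copy is owned by the backup account, deleting the original in the source account, accidentally or maliciously, does not delete the copy.

```python
from typing import Any


def copy_to_vault(source_ec2: Any, vault_ec2: Any, snapshot_id: str,
                  source_region: str, vault_account_id: str) -> str:
    """Share a snapshot with a backup ("vault") account and copy it there.

    source_ec2: boto3 EC2 client for the account and region that own the snapshot.
    vault_ec2:  boto3 EC2 client authenticated as the vault account, created in
                the destination region (copy_snapshot copies into that region).
    """
    # Step 1: allow the vault account to use (and therefore copy) the snapshot.
    source_ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[vault_account_id],
    )
    # Step 2: from the vault account, make an independent copy in another region.
    response = vault_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"Vault copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


# Hypothetical usage (requires AWS credentials for both accounts):
# import boto3
# source = boto3.client("ec2", region_name="us-east-1")
# vault = boto3.Session(profile_name="backup-vault").client("ec2", region_name="us-west-2")
# new_id = copy_to_vault(source, vault, "snap-0123456789abcdef0", "us-east-1", "111122223333")
```

Snapshots encrypted with the default AWS managed key cannot be shared with another account, so encrypted volumes need a customer managed KMS key that the backup account is allowed to use. The copy also runs asynchronously, so the returned snapshot starts out in a pending state.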

Don’t become a cautionary tale

Cloud block storage volumes are a great resource. But they are not magic, and they are not automatically protected against everything that can harm them. Take advantage of the backup services your cloud vendors provide so that if the worst happens, you can easily recover.

Copyright © 2019 IDG Communications, Inc.
