The numbers don’t look good
My favorite source of numbers for the tape industry used to be the Santa Clara Consulting Group. They’d been tracking the use of tape in the backup and recovery industry since 2008 and had been a great go-to source for such data. Their numbers showed a steady decline in the number of units shipped, for both drives and media. Unfortunately, it looks like they stopped offering these services in 2014.
Gartner’s most recent report on what media types people are using to do their backups is a pretty solid source of data, though. They’ve got data going back to 2009 that shows the percentage of people that are backing up directly to tape (D2T), backing up to disk then copying to tape (D2D2T), backing up to disk with no tape component (D2D/D2D2D), or backing up to the cloud (D2C/D2D2C). While tape is used in most datacenters in one way or another, the percentage of companies using tape in any way is steadily declining. Companies are clearly moving to D2D or D2C techniques. What are the reasons behind this trend?
Tape is not unreliable – it’s incompatible
Some would say the reason most people have moved to a D2D2T methodology is that disk is more reliable than tape. I believe the previously mentioned trend shows just the opposite: most people still trust tape as their ultimate line of defense. In fact, some vendors that started out pushing a “no-tapes” message were eventually forced to come up with a way to copy to tape. I do believe this was primarily driven by cost, but the point stands: customers don’t push for something they don’t trust, even if it’s cheaper. D2D2D and D2D2C have been viable options for years now. If customers didn’t trust tape, they would have all moved off it regardless of cost. But they believe it’s still the least expensive way to get the job done, and so they stay.
I’ve been saying for many years that tape is more reliable than most people give it credit for. The problem with tape is its fundamental incompatibility with how backups are done today. Let me explain. Back when dinosaurs roamed the earth and I was cutting my teeth on backups, it was very common to do daily full backups. If you had network issues, you might scale that back to weekly full backups.
But today’s backup systems perform full backups much less frequently. They might do them once a month, or they might not do them at all. The thinking is that tracking all those tapes is much easier than it was in the old days, so we can afford to do full backups less frequently. In addition, this saves money on media and network usage. The result is that 96% or more of today’s backups are incremental backups, and incremental backups are fundamentally incompatible with modern tape drives.
With a few exceptions, modern tape drives are linear-style drives. This means they store data in multiple parallel tracks as the tape is pulled past a stationary write head, like the cassette tape drives of old. The tape moves across these write heads extremely fast, and it must do so to maintain a good signal-to-noise ratio. Because of this requirement, each drive type has a minimum transfer speed at which it can safely write data to tape. For example, the slowest an LTO-6 drive can write data is 40 MB/s before compression, which means it needs 60-100 MB/s of incoming data once you add compression. Rarely will an incremental backup generate anywhere near 60-100 MB/s; instead, incrementals often trickle along at a few MB/s, depending on a variety of factors.
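The jump from 40 MB/s to 60-100 MB/s is just the native minimum multiplied by the compression ratio, since the drive writes compressed data and the host must supply the uncompressed stream. The sketch below assumes compression ratios between 1.5:1 and 2.5:1 (the range that reproduces those figures) and checks whether a given backup stream could keep the drive streaming. The function names are hypothetical, and real drives and data will vary.

```python
def min_input_rate(native_min_mb_s: float, compression_ratio: float) -> float:
    """Slowest uncompressed host data rate (MB/s) that keeps the drive streaming.

    The drive writes compressed data, so the host must deliver
    native_min_mb_s * compression_ratio of raw data per second.
    """
    return native_min_mb_s * compression_ratio


def keeps_streaming(incoming_mb_s: float, native_min_mb_s: float,
                    compression_ratio: float) -> bool:
    """True if the incoming backup stream is fast enough to avoid stops."""
    return incoming_mb_s >= min_input_rate(native_min_mb_s, compression_ratio)


if __name__ == "__main__":
    LTO6_NATIVE_MIN = 40.0  # MB/s, the LTO-6 figure quoted in the text
    for ratio in (1.5, 2.5):  # assumed compression ratios
        print(f"{ratio}:1 -> {min_input_rate(LTO6_NATIVE_MIN, ratio):.0f} MB/s")
    # An incremental trickling in at an assumed 5 MB/s cannot keep up:
    print(keeps_streaming(5.0, LTO6_NATIVE_MIN, 1.5))  # False
```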
Here’s what happens: the drive spins up to its minimum speed and transfers the data from its buffer onto tape. If the incoming data rate is slower than the drive’s minimum speed, the buffer will be empty when the drive looks to it for more data. The drive must then stop, reposition the tape back to the point where it stopped writing, and wait for the buffer to fill again. Once it’s full, the process starts all over. This process of swiping the tape back and forth across the write head is referred to as shoe-shining, and it prematurely wears out the media, the write head, and the mechanics of the drive. And yes, if you shoe-shine too much, you can make a reliable drive unreliable. This is why I say tape drives are fundamentally incompatible with the way people do backups today.
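A toy simulation makes the shoe-shining cycle easier to see. This is a simplified model, not how any particular drive’s firmware works: it assumes one-second ticks, whole megabytes, a drive that writes at exactly its minimum speed, a hypothetical 1,000 MB buffer, and a drive that restarts only once its buffer is full again. `count_stops` is a hypothetical name; each time the buffer runs dry before the job is done counts as one stop-and-reposition cycle.

```python
def count_stops(total_mb: int, incoming_mb_s: int, drive_mb_s: int,
                buffer_mb: int) -> int:
    """Count stop/reposition cycles for a drive fed by a host (toy model).

    The drive waits until its buffer is full (or the host has sent
    everything), then streams at drive_mb_s. If the buffer runs dry
    before the job is done, the drive must stop and reposition the tape.
    """
    if min(total_mb, incoming_mb_s, drive_mb_s, buffer_mb) <= 0:
        raise ValueError("all parameters must be positive")
    received = buffered = written = stops = 0
    streaming = False
    while written < total_mb:
        # The host sends what it can, limited by free buffer space.
        arriving = min(incoming_mb_s, total_mb - received, buffer_mb - buffered)
        received += arriving
        buffered += arriving
        # A stopped drive restarts only when it has a full buffer to write.
        if not streaming and (buffered == buffer_mb or received == total_mb):
            streaming = True
        if streaming:
            out = min(drive_mb_s, buffered)
            buffered -= out
            written += out
            if buffered == 0 and written < total_mb:
                streaming = False  # buffer ran dry: stop and reposition
                stops += 1
    return stops


if __name__ == "__main__":
    # Assumed: 10,000 MB job, 60 MB/s drive minimum, 1,000 MB buffer.
    print(count_stops(10_000, 5, 60, 1_000))   # slow incremental: repeated stops
    print(count_stops(10_000, 80, 60, 1_000))  # faster than the minimum: 0
```

Under these assumptions, a job arriving at 5 MB/s forces the drive to stop and reposition over and over, while one arriving faster than the drive’s minimum speed streams straight through.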
No problem, say backup vendors! We’ll just interleave (multiplex) a bunch of backups together to keep the tape drive happy! That might make the backup run better, but it makes the restore slower, because you need to read all the multiplexed streams and throw away the ones you don’t want.
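Here is a minimal sketch of why multiplexing hurts restores, assuming simple round-robin interleaving of blocks (real backup products use their own on-tape formats). Restoring one stream still means reading every block on the tape and discarding the rest. `multiplex` and `restore` are hypothetical names.

```python
from itertools import zip_longest


def multiplex(streams: dict[str, list[bytes]]) -> list[tuple[str, bytes]]:
    """Interleave blocks from several backup streams round-robin onto one 'tape'."""
    names = list(streams)
    tape = []
    for row in zip_longest(*(streams[name] for name in names)):
        for name, block in zip(names, row):
            if block is not None:  # shorter streams are padded with None
                tape.append((name, block))
    return tape


def restore(tape: list[tuple[str, bytes]], wanted: str) -> tuple[bytes, int]:
    """Read the tape sequentially, keeping only blocks from `wanted`.

    Returns the restored data and the number of blocks read.
    """
    kept = []
    blocks_read = 0
    for name, block in tape:
        blocks_read += 1  # every block passes under the read head...
        if name == wanted:
            kept.append(block)  # ...but only these are kept
    return b"".join(kept), blocks_read


if __name__ == "__main__":
    streams = {"db": [b"d1", b"d2"], "mail": [b"m1", b"m2"], "web": [b"w1", b"w2"]}
    data, blocks_read = restore(multiplex(streams), "mail")
    print(data, blocks_read)  # b'm1m2' 6
```

With three streams multiplexed together, restoring one of them reads six blocks to keep two; the more streams you interleave, the worse that ratio gets.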
Disk and cloud also offer so much more
The incompatibility of tape made IT personnel start looking at alternatives. But the true reasons behind IT’s move away from tape were the things that disk and cloud made possible, starting with deduplication. Yes, deduplication is what brought the cost of disk close to (and sometimes below) the cost of tape. But deduplication isn’t possible without disk, so it’s sort of a chicken-and-egg thing.
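A toy content-addressed store shows the basic idea: split each backup into chunks, fingerprint each chunk with a hash, and keep only chunks the store hasn’t seen before. It also hints at why deduplication pairs with disk: restoring a backup means fetching its chunks from wherever they happen to live, which calls for random access. This sketch assumes fixed 4 KiB chunks and SHA-256 fingerprints; many real products use variable-size chunking, and `DedupStore` is a hypothetical name.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size (many products use variable-size chunks)


class DedupStore:
    """Toy content-addressed backup store: each unique chunk is stored once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> chunk data

    def backup(self, data: bytes) -> list[str]:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a repeated chunk costs no space
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild the original data by fetching each chunk by its digest."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    # Four distinct 4 KiB chunks, then a second backup with only the last one changed.
    monday = b"".join(bytes([n]) * CHUNK_SIZE for n in range(4))
    tuesday = monday[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE
    store.backup(monday)
    store.backup(tuesday)
    print(store.stored_bytes())  # 20480 physical bytes for 32768 logical bytes
```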
Deduplication also enabled another great feature of disk: replication. Where tape requires a human to get it to another location, deduplicated backups can easily be replicated anywhere in the world, because only new, unique data has to cross the network. Initially this meant another datacenter down the street, but it grew to include replicating to the cloud, or even sending backups directly there. Now we get onsite and offsite backups without physically moving anything or involving any humans in the process.
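Deduplication is what keeps replication cheap: the source only needs to send chunks the target doesn’t already hold. The sketch below assumes fixed 4 KiB chunks keyed by SHA-256 and uses an in-memory dict as a stand-in for the remote site. `chunk_index` and `replicate` are hypothetical names; a real system would also track chunk order and negotiate over a network.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size


def chunk_index(data: bytes) -> dict[str, bytes]:
    """Split data into fixed-size chunks keyed by their SHA-256 digest."""
    index = {}
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        index[hashlib.sha256(chunk).hexdigest()] = chunk
    return index


def replicate(source: dict[str, bytes], target: dict[str, bytes]) -> int:
    """Send the target only the chunks it lacks; return the bytes sent."""
    sent = 0
    for digest, chunk in source.items():
        if digest not in target:  # already offsite: nothing to send
            target[digest] = chunk
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    offsite: dict[str, bytes] = {}  # stand-in for the remote site's chunk store
    day1 = b"".join(bytes([n]) * CHUNK_SIZE for n in range(10))  # 10 distinct chunks
    day2 = day1[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE              # one chunk changed
    print(replicate(chunk_index(day1), offsite))  # 40960: first copy sends everything
    print(replicate(chunk_index(day2), offsite))  # 4096: only the changed chunk
```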
Closely related to deduplication is the idea of having multiple copies of something share the same storage. Think of snapshots on a filer: most of the bits are the same, and only the differences between the snapshots take up extra space. This allows things that simply aren’t possible with tape, such as creating a test copy and a development copy of the same data without taking up much additional space.
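Linked copies can be modeled as lists of references into a shared pool of blocks, using copy-on-write: a clone copies only the references, and a write allocates a fresh block for that one position while everything else stays shared. This is a toy model under stated assumptions (4 KiB blocks, an in-memory pool, hypothetical class names `BlockPool` and `LinkedCopy`), not how any particular filer implements snapshots.

```python
import itertools

BLOCK_SIZE = 4096  # assumed block size


class BlockPool:
    """Shared physical storage: block id -> data."""

    def __init__(self) -> None:
        self.data: dict[int, bytes] = {}
        self._ids = itertools.count()

    def allocate(self, payload: bytes) -> int:
        block_id = next(self._ids)
        self.data[block_id] = payload
        return block_id


class LinkedCopy:
    """A volume is a list of block ids; copies share ids until written."""

    def __init__(self, pool: BlockPool, block_ids: list[int]) -> None:
        self.pool = pool
        self.block_ids = list(block_ids)

    def clone(self) -> "LinkedCopy":
        return LinkedCopy(self.pool, self.block_ids)  # copies ids, not data

    def write(self, index: int, payload: bytes) -> None:
        # Copy-on-write: new data goes to a fresh block; other blocks stay shared.
        self.block_ids[index] = self.pool.allocate(payload)

    def read(self) -> bytes:
        return b"".join(self.pool.data[i] for i in self.block_ids)


def physical_blocks(*copies: LinkedCopy) -> int:
    """Distinct blocks referenced by any copy, i.e. the space actually consumed."""
    return len({i for copy in copies for i in copy.block_ids})


if __name__ == "__main__":
    pool = BlockPool()
    prod = LinkedCopy(pool, [pool.allocate(bytes([n]) * BLOCK_SIZE) for n in range(100)])
    test = prod.clone()
    dev = prod.clone()
    dev.write(0, b"\xff" * BLOCK_SIZE)  # developers change one block
    print(physical_blocks(prod, test, dev))  # 101 blocks for three 100-block copies
```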
Disk-based backups also allow you to run servers or VMs directly from their backups. That has changed everything. It allows for much faster recoveries, and even recoveries in place. The idea that you could do a recovery without a restore has changed so much about backup that it’s hard to even think about a time when it wasn’t possible.
Put all these features together (deduplication, replication, linked copies, and recovery directly from the backup) and you see why the cloud makes so much sense as a companion to them. You can also see why those who have made this move – away from tape, towards disk, and eventually to the cloud – find it hard to look back.
Not dead, but dying
Tape is still good at holding onto data for really long periods of time and is still cheaper than almost anything else. Such long-term storage needs will ensure that tape continues to have a role to play for many years to come. But even this last bastion of tape is being challenged by customers who are discovering other uses for older data. Once a company figures out how to monetize that data, they’re going to want it on disk.
As far as backup and recovery goes, the extreme differences in functionality between tape and disk – as well as the shrinking difference in cost – are pushing many people towards disk and the cloud. Disk allows features like deduplication, replication, linked copies, recovery directly from the backup, and recovery of an entire datacenter in the public cloud. The trend started several years ago, and it will continue for some time.