Data archiving: It doesn’t have to be on tape

Long-term data storage can be done in the cloud, on disk drives, and on optical disks, but each has some drawbacks.

Long-term storage, or archiving, requires a very different approach from backup and recovery, where throughput and deduplication are the main concerns. Archiving means storing data for long periods without it becoming corrupted, so that what is retrieved is exactly what was stored 10 or 20 years earlier.

For most organizations that reach a certain size, standardized linear tape open (LTO) magnetic tape is the best choice. But for those that cannot justify the cost or believe tape is a thing of the past, there are three viable alternatives: object storage in the cloud, on-premises disk storage, and optical media.

Archiving in the cloud

Object storage is specifically designed for long-term storage, because the checksum used to identify each object can also be used to verify that its contents haven’t changed. The system can recompute the checksum and compare it to the one already serving as the object’s unique identifier (UID). This allows the integrity of the data to be verified continually, even decades after it was stored.
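The check described above can be sketched in a few lines of Python. This is only an illustration: the use of SHA-256 and the function names object_id and is_intact are assumptions, and real object stores choose their own hash functions and run this verification internally.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Return a SHA-256 hex digest to serve as the object's identifier.

    SHA-256 is an assumption for this sketch, not a statement about
    what any particular object store uses.
    """
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, stored_id: str) -> bool:
    """Recompute the checksum and compare it with the stored identifier."""
    return object_id(data) == stored_id


original = b"quarterly-report-2004"
uid = object_id(original)  # recorded when the object is first stored

print(is_intact(original, uid))                  # content unchanged
print(is_intact(b"quarterly-report-2O04", uid))  # one byte altered
```

Because the identifier is derived from the content, any change to the stored bytes produces a different checksum and the mismatch is detectable without keeping a separate reference copy.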

It’s also inexpensive. AWS, Azure, and Google will each store 100TB of data in cold storage for about $100/month, so you could keep two copies with two storage providers for about $200/month. As long as you only upload data and never retrieve it, that is very cost-effective. If you ever do need to retrieve it, retrieval will be expensive, but likely worth it. Note that you will pay two fees: a get fee for each object and a per-gigabyte bandwidth charge. If you delete the archive early, you may pay other fees as well, because the pricing assumes the data will be stored for a long time. Make sure you know what you will be paying if and when you actually retrieve the data.
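The arithmetic can be sketched as follows. The storage rate of $1 per terabyte per month is derived from the figure above ($100/month for 100TB); the per-get and per-gigabyte retrieval rates are hypothetical placeholders, not any provider's actual prices, and early-deletion fees are not modelled.

```python
def monthly_storage_cost(tb_stored: float, copies: int = 2,
                         usd_per_tb_month: float = 1.0) -> float:
    """Monthly cold-storage bill for the given number of copies."""
    return tb_stored * copies * usd_per_tb_month


def retrieval_cost(objects: int, gb_retrieved: float,
                   usd_per_get: float, usd_per_gb: float) -> float:
    """One-off retrieval bill: a get fee per object plus bandwidth per GB."""
    return objects * usd_per_get + gb_retrieved * usd_per_gb


# Two copies of a 100TB archive at the article's approximate rate.
print(monthly_storage_cost(100))

# Hypothetical rates: substitute the figures from your provider's price list.
print(retrieval_cost(objects=1_000_000, gb_retrieved=100_000,
                     usd_per_get=0.00005, usd_per_gb=0.02))
```

Running the second calculation with a provider's published rates before committing to an archive shows how a single full restore compares with years of storage fees.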

On-premises disk drives

If you’re going to use on-premises disk for archives, there are three choices: standard disk arrays, deduped disk targets, and on-prem object storage.

Standard disk arrays and network attached storage systems tend to be designed for primary data and have a higher cost that reflects that design. Therefore, most environments consider them too expensive for long-term storage.

Deduped disk systems can help reduce that cost for backups, but long-term storage is something very different. Over time you will end up storing data that is entirely new to the dedupe system, and the dedupe ratio will plummet. A deduped target disk system will therefore end up costing more than a standard disk system, because you’re paying extra for a dedupe ratio that the nature of the data prevents you from actually getting.
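A rough way to see the effect is to divide the price of physical capacity by the dedupe ratio actually achieved. The prices below are hypothetical and chosen only to illustrate the shape of the problem.

```python
def cost_per_logical_tb(usd_per_physical_tb: float, dedupe_ratio: float) -> float:
    """Effective cost of storing one terabyte of archive data.

    A dedupe_ratio of 10.0 means 10 TB of data fits in 1 TB of disk;
    1.0 means no reduction at all.
    """
    if dedupe_ratio < 1.0:
        raise ValueError("dedupe ratio cannot be below 1.0")
    return usd_per_physical_tb / dedupe_ratio


STANDARD_USD_PER_TB = 100.0  # hypothetical price of a standard array
DEDUPE_USD_PER_TB = 150.0    # hypothetical price of a dedupe target

for ratio in (10.0, 2.0, 1.0):
    effective = cost_per_logical_tb(DEDUPE_USD_PER_TB, ratio)
    verdict = "cheaper" if effective < STANDARD_USD_PER_TB else "more expensive"
    print(f"ratio {ratio:>4}: ${effective:.2f} per TB, {verdict} than standard disk")
```

With these assumed prices the dedupe system only pays off while the ratio stays above 1.5; as archive data pushes the ratio toward 1.0, the premium is no longer recovered.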

Both standard and deduped disk systems also have the issue that disk is not very good at holding onto data for longer than five years; after that it can suffer corruption, known as bit rot. Just like object storage in the cloud, on-premises object storage systems can help by constantly checking the integrity of the data on each system and replacing corrupted data with good data stored in a different disk system. However, disks still tend to be more expensive than simply using cold storage in the cloud. In short, disk of any kind does not tend to be the best choice for archiving.

Optical disk

There are three optical-disk options: standard recordable DVD and Blu-ray discs, archive-quality DVD-R, and M-disc.

DVD and Blu-ray discs use an organic dye that changes phase when hit with a laser. Data written to DVD and Blu-ray should be fine for decades. Archive-quality DVD-R uses dual reflective layers and an extra hard coating to prevent scratching, and costs about 10 times as much as standard DVD-R. M-disc is specifically designed for long-term storage and uses an inorganic layer that is designed to last 100 years. M-discs are expensive, but cost only about 25% as much per gigabyte as archive-quality DVD-R. Most modern optical drives can write all three types of media.

DVD discs have a very small capacity, under 5 GB. Blu-ray discs hold 25 GB, or 50 GB on a dual-layer disc. M-disc is available in 25 GB, 50 GB, and 100 GB capacities. Writing data to these media is quite slow compared with tape or disk, because performing a phase change on a physical medium is slow.
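To put those capacities in perspective, here is a small sketch that counts the discs needed for a 100 TB archive, the same size used in the cloud pricing example earlier. The function name discs_needed is illustrative, and the calculation uses decimal units (1 TB = 1,000 GB) and ignores file-system overhead.

```python
import math


def discs_needed(archive_gb: float, disc_capacity_gb: float) -> int:
    """Number of discs required, rounding up to whole discs."""
    return math.ceil(archive_gb / disc_capacity_gb)


ARCHIVE_GB = 100_000  # 100 TB in decimal gigabytes

for capacity in (25, 50, 100):
    print(f"{capacity:>3} GB discs: {discs_needed(ARCHIVE_GB, capacity)}")
```

Even at the largest capacity the count runs to a thousand discs, which is why handling them by hand quickly becomes impractical at enterprise scale.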

It’s also important to point out that optical media has a relatively poor uncorrected bit error rate (UBER): some as poor as 10^-8 (one uncorrected error per 10^8 bits), most at 10^-10. For serial advanced technology attachment (SATA) disk the UBER is 10^-14, and for LTO-9 tape it is 10^-19. This means that optical media doesn’t write data as reliably as SATA disk or tape.
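Reading the rates quoted above as negative powers of ten (10^-8, 10^-10, 10^-14, and 10^-19 errors per bit), a short sketch shows what they imply for one terabyte of data. It assumes a decimal terabyte and treats the UBER as a simple average rate, which is a simplification of how errors actually occur.

```python
# Error rates as quoted in the article, in uncorrected errors per bit.
UBER = {
    "optical (worst case)": 1e-8,
    "optical (typical)": 1e-10,
    "SATA disk": 1e-14,
    "LTO-9 tape": 1e-19,
}

BITS_PER_TB = 8 * 10**12  # decimal terabyte: 10^12 bytes of 8 bits each


def expected_bit_errors(terabytes: float, uber: float) -> float:
    """Expected number of uncorrected bit errors for the given volume."""
    return terabytes * BITS_PER_TB * uber


for medium, rate in UBER.items():
    errors = expected_bit_errors(1, rate)
    print(f"{medium:22s} {errors:.3g} expected errors per TB")
```

The gap of several orders of magnitude between optical media and tape is the reason an optical archive benefits from additional verification, such as checksums or multiple copies.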

I have talked to media and entertainment companies that use optical discs to make long-term storage copies of movies, in addition to their LTO copies. Their thinking is that a Blu-ray disc will be easier to read in 50 years than an LTO tape. One point in favor of that view is that the most modern Blu-ray drive can read the oldest CDs and DVDs, making it much more backward compatible than the typical tape drive, which generally reads only one or two generations back.

Enterprises wanting to use optical as long-term storage should look into optical libraries. They are just like tape libraries, but with optical drives instead of tape drives, and provide a near-limitless supply of long-term storage without too much manual intervention.

Archives store data that you will keep for decades, so take the time to do your research. Know what you’re getting into with each option and make an educated choice.

Copyright © 2022 IDG Communications, Inc.
