Test shows that VTL data deduplication works in general, but your mileage will vary
Today's storage mechanisms are all about squeezing as much data into as little space as possible. Virtual tape library (VTL) software contributes to the space-saving cause by using hard-disk storage systems to emulate robotic tape libraries, cutting down on the use of precious storage real estate.
We used Symantec's BackupExec 11d running on a Windows 2003 server to back up 2TB of data from a 2Gbps Fibre Channel storage system to each of the two devices we tested, Data Domain's DD560 and Quantum's DXi5500. We then backed up the same data a second time, ran a script that changed 1,361 files totaling just over 60GB, and ran a third backup (see How we tested VTLs).
Both appliances got equally high points for providing efficient compression and high data-throughput rates. While their performance indicates they should be able to support multiple simultaneous backup streams, we must caution that your mileage likely will vary depending on the type of data, its inherent compressibility and its susceptibility to deduplication.
Data Domain's DD560 details
The DD560 as tested had 15 500GB drives, for a raw capacity of 7.5TB and a usable capacity of 4.928TB. It featured one 2Gbps Fibre Channel port.
Data Domain provides on-site support as part of the $105,000 price of the system, and this is a good thing. While the initial network configuration of the system is straightforward, accomplished via serial terminal or browser, all configuration of the library, from logon accounts to the type of drives and numbers of cartridges, must be made via the command line. This is not particularly difficult, but requires careful reading of the manual to understand your choices.
The DD560 emulates a StorageTek L180 library with IBM LTO-1 tape drives. As many as 16 virtual drives and 100,000 cartridges can be created. Each tape can hold as much as 800GB. In addition, the DD560 can support as many as eight virtual-library instances: It can appear to be eight separate libraries to eight hosts or media servers, and each virtual library can use logical-unit-number masking to restrict access to one host.
The DD560 is not as flexible as the Quantum box we tested, which emulates a half-dozen different libraries and many tape drives, but the library it does emulate was easy to configure and back up to using BackupExec, as well as Symantec's enterprise-class backup software NetBackup.
The available configurations of cartridges, bar codes on the cartridges and media pools should make it easy for backup administrators to match their existing requirements as long as their software can support the L180 library. Backups from multiple hosts or media servers are easier to complete with the Data Domain system than with the Quantum device.
The DD560 by default sends hardware maintenance reports back to the Data Domain support team via e-mail on a regular basis. It also can send the reports to the local administrator, as well as alerts when environmental or disk space thresholds are exceeded.
The DD560 offers excellent scalability, with the ability to add both additional controllers and expansion shelves to increase capacity and overall throughput.
Quantum's DXi5500 details
The DXi5500 shipped with 24 500GB drives for a raw capacity of 12TB and a usable capacity of 8.17TB. It came with four 2Gbps Fibre Channel interfaces. Configuration of the system is very straightforward, and on-site installation support is provided as part of the $139,000 as-tested price. Its GUI makes configuration easier than with the DD560.
Although we had some initial issues getting BackupExec to work with the default library and tape drive types, it was a simple matter to change the library-emulation mode to different library and tape drive types with which BackupExec was better able to communicate. We also had some issues with the GUI when running Java 1.6, but reverting to Java 1.5 completely fixed the issue.
Creating the library type and adding drives, cartridges, bar codes and media pools were easy and straightforward through the GUI, with excellent online help available through the interface.
Supporting multiple media servers or backup hosts is somewhat more cumbersome than with the Data Domain system, with only one server per Fibre Channel interface allowed.
Pricing includes Quantum's on-site installation and one-year on-site service package as well as StorageCare Guardian, Quantum's secure remote alerting and diagnostic system. This feature, like a similar capability in Data Domain's product, lets administrators remotely diagnose and address software configuration issues. We found in our testing that both features worked as advertised.
By the numbers
The space required to accommodate the first backup as defined by our methodology (see How we tested VTLs) was 972GB on the DD560 device and 957GB on the DXi5500 box, indicating approximately 2-to-1 compression on both, which is what you would expect from a standard tape drive with compression enabled, and what both vendors advertise as minimum compression levels.
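The approximately 2-to-1 figure follows directly from the reported numbers. The arithmetic can be sketched as below, assuming that 2TB means 2,000GB (the test description does not say whether decimal or binary units were used) and using a helper name, compression_ratio, that is ours rather than anything from either vendor.

```python
def compression_ratio(logical_gb, stored_gb):
    """Ratio of data sent to the VTL to the space it occupies there."""
    return logical_gb / stored_gb


# Assumption: the 2TB test data set is treated as 2,000GB.
LOGICAL_GB = 2000.0

# Space in use after the first backup, as reported in the test.
first_backup_gb = {"DD560": 972.0, "DXi5500": 957.0}

ratios = {
    name: compression_ratio(LOGICAL_GB, stored)
    for name, stored in first_backup_gb.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}-to-1")
```

If the data set were measured in binary units instead (2,048GB), both ratios would come out slightly higher, but still close to 2-to-1.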
With the second backup, the space in use rose to 992GB on the DD560 and 1001GB on the DXi5500. The slight increase from the first to the second backup was the result of the deduplication pointers being added to the files as well as other system overhead.
Data Domain's DD560 does in-line deduplication processing, while the DXi5500 does postprocessing. The latter doesn't wait until a backup has finished to begin deduplicating, however. It begins processing about 10 minutes into a backup and continues processing while the backup runs.
The third backup increased the space in use to 1006GB on the DD560 and 1015GB on the DXi5500. The increase on the third backup was substantially less than the 60GB of changed files, because much of the data contained in each file was still the same, and only the changed parts of each file were added to the store.
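To illustrate why a changed file adds only its changed parts to the store, here is a minimal sketch of chunk-level deduplication. It is a generic illustration and not either vendor's algorithm, which the test does not describe: the fixed 4,096-byte chunk size, the SHA-256 fingerprints and the ChunkStore class are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for illustration


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def write(self, data):
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# A "file" made of 10 distinct chunks.
original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))

store = ChunkStore()
first = store.write(original)          # first full backup
after_first = store.stored_bytes()

second = store.write(original)         # second full backup, nothing changed
after_second = store.stored_bytes()

changed = bytearray(original)
changed[5 * CHUNK_SIZE] = 255          # alter one byte inside chunk 5
third = store.write(bytes(changed))    # third full backup
after_third = store.stored_bytes()

print(after_first, after_second, after_third)
```

In this toy model the second backup adds nothing and the third adds exactly one chunk. The appliances we tested grew slightly on the second backup as well, because real systems also store pointers and other metadata that the sketch ignores.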
The total after three full backups of 2TB would have been 6TB if the backups had been made to a simple VTL or actual tape library, while the backups on these VTLs consumed around 1TB because of deduplication. Each full backup of 2TB of actual data added a relatively small amount of data to the space used on each VTL, so it's likely that dozens of full backups of a 2TB volume over a period of months could be supported, resulting in effective compression ratios of 20-, 30- or 50-to-1 over time.
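A rough projection shows how the effective ratio climbs as full backups accumulate. The sketch below assumes that 2TB equals 2,000GB and that every later backup adds the same amount of space as the average of the second and third backups in our test. That constant-growth model is a simplification of ours, and real change rates will differ.

```python
def effective_ratio(n_backups, logical_gb, first_backup_gb, growth_gb):
    """Data protected divided by space used after n full backups.

    Assumes the first backup occupies first_backup_gb and every later
    backup adds a constant growth_gb (a simplifying assumption).
    """
    protected = n_backups * logical_gb
    stored = first_backup_gb + (n_backups - 1) * growth_gb
    return protected / stored


LOGICAL_GB = 2000.0  # assumption: 2TB treated as 2,000GB

# First-backup size and average growth per later backup, from the test figures.
devices = {
    "DD560": {"first": 972.0, "growth": (1006.0 - 972.0) / 2},
    "DXi5500": {"first": 957.0, "growth": (1015.0 - 957.0) / 2},
}

for name, d in devices.items():
    for n in (3, 12, 30, 90):
        ratio = effective_ratio(n, LOGICAL_GB, d["first"], d["growth"])
        print(f"{name}: {n} full backups -> about {ratio:.1f}-to-1")
```

Under these assumptions the ratio after three backups is a little under 6-to-1, matching the roughly 1TB used to hold 6TB of backups, and it keeps rising with each additional full backup retained.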
Backup rates sustained
Backup speeds were limited by server I/O rather than the VTL throughput capacities. Both VTLs sustained similar speeds on each server used in testing. With a single-CPU 2.8GHz server and 2GB RAM, BackupExec reported an average speed of 36.96Mbps for the DXi5500 and 35.93Mbps for the DD560. With a dual-CPU 3.4GHz server with 3GB RAM, speeds were 56.77Mbps for the DXi5500 and 60.35Mbps for the DD560.
Data Domain performed a bit better in our restore test, as the DD560 restored a 24GB folder in 12:40 minutes compared with 13:05 minutes for the DXi5500.
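For readers who want to compare the restore results as throughput, the conversion is sketched below. It assumes 1GB equals 1,000MB, and the helper names are ours.

```python
def mmss_to_seconds(text):
    """Convert a 'minutes:seconds' string such as '12:40' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def restore_rate_mb_per_s(size_gb, elapsed):
    """Average restore rate in MB per second (assumes 1GB = 1,000MB)."""
    return size_gb * 1000.0 / mmss_to_seconds(elapsed)


# Restore times for the 24GB folder, as reported in the test.
restore_times = {"DD560": "12:40", "DXi5500": "13:05"}

rates = {
    name: restore_rate_mb_per_s(24.0, elapsed)
    for name, elapsed in restore_times.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB per second")
```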
In addition to the 2TB of data, we backed up an Exchange server repeatedly while running Microsoft's LoadSim utility to simulate a large amount of traffic. Unfortunately, there is no way to vary the message size or content with LoadSim, so virtually all the messages were the same.
Even with tens of thousands of identical messages being deduplicated, both VTLs sustained the same backup rates that they did while backing up the data. The intent of this test was to stress the deduplication engine to the greatest extent possible. Because every file needed to be deduplicated to the same pointer, every file meant another database entry in the deduplication database, so the deduplication database was growing as fast as it would ever have to grow, and there were still no hiccups or changes in performance. This indicates that deduplication should not be a bottleneck even with lots of identical files.
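The behavior described here, where each identical message adds a database entry but no new data, can be pictured with a small sketch. The DedupIndex class and its layout are hypothetical. The test does not reveal how either appliance structures its deduplication database.

```python
import hashlib


class DedupIndex:
    """Toy index: identical content is stored once, but every file is recorded."""

    def __init__(self):
        self.blobs = {}    # SHA-256 hex digest -> content, stored once
        self.catalog = []  # one (name, digest) entry per backed-up file

    def add(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # no-op if already stored
        self.catalog.append((name, digest))     # the pointer still costs an entry


index = DedupIndex()
message = b"identical LoadSim-style message body"  # stand-in content

for i in range(10000):
    index.add(f"msg-{i}", message)

print(len(index.blobs), "unique blob(s),", len(index.catalog), "catalog entries")
```

Every message after the first costs only a catalog entry, so a workload of identical items exercises the index far more than the data store.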
For backing up a few terabytes of data, reducing backup windows and enabling quick and easy restores of data over a period of months, either of these products will work well.
We can't make a blanket statement about which of these VTLs is best for your environment. That answer depends on the application mix. For instance, if you have a few file servers for user home directories that don't change rapidly, you could store six months of daily, full backups on a relatively small appliance, achieving enormous effective compression ratios. On the other hand, if you are backing up transactional databases that change rapidly and may be encrypted, or other types of data sets that have little in common with each other, such as video, you may not get much compression at all.
Likewise, the utility of a deduplicating VTL as opposed to a standard VTL will depend on the use of the VTL. A disk-to-disk-to-tape environment where the VTL is essentially used as a cache for decreasing the backup window to a tape library, and where jobs are deleted from the VTL once they are backed up to the library, will see little benefit from deduplication. On the other side of the coin, an optimal application for a deduplicating VTL is when you need to maintain weeks or months of backups of the same data.
Harbaugh is a freelance writer and IT consultant in Redding, Calif. He can be reached at firstname.lastname@example.org.