Data deduplication, data reduction, commonality factoring, capacity-optimized storage: whatever you call it, it is a process designed to make network backups to disk faster and more economical.
The idea is to eliminate large amounts of redundant data that can chew up disk space. Proponents also say it lets you keep more data available online for longer in the same amount of disk space.
In deduplication, as data is backed up to a disk-based virtual tape library (VTL) appliance, a catalog of the data is built. This catalog, or repository, indexes the individual pieces of data in a file or block of information, assigns each one a metadata reference that is used to rebuild the file if it needs to be recovered, and stores it on disk. The catalog is also used on subsequent backups to identify which data elements are unique. Nonunique data elements are not backed up; unique ones are committed to disk.
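The catalog mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DedupStore` class, the fixed 4 KB chunk size, and the use of SHA-256 digests as metadata references are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products choose their own


class DedupStore:
    """Toy catalog-based deduplicating backup store (illustrative only)."""

    def __init__(self):
        self.catalog = {}  # digest -> chunk bytes (the unique data "on disk")
        self.recipes = {}  # file name -> ordered digests (metadata references)

    def backup(self, name, data):
        """Back up data and return the number of new bytes committed."""
        written = 0
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.catalog:  # unique element: commit it
                self.catalog[digest] = chunk
                written += len(chunk)
            recipe.append(digest)  # nonunique elements are only referenced
        self.recipes[name] = recipe
        return written

    def restore(self, name):
        """Rebuild a file from its stored references."""
        return b"".join(self.catalog[d] for d in self.recipes[name])


# Four distinct chunks, then a second version with one chunk changed.
original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
modified = original[:2 * CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + original[3 * CHUNK_SIZE:]

store = DedupStore()
first_written = store.backup("monday", original)
second_written = store.backup("tuesday", modified)
print(first_written, second_written)
```

In this sketch the second backup commits only the one changed chunk, while both versions remain fully restorable from the catalog.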
For instance, a 20-slide PowerPoint file is initially backed up. The user then changes a single slide in the file, saves the file and e-mails it to 10 counterparts. When a traditional backup occurs, the entire PowerPoint file and its 10 e-mailed copies are backed up. In deduplication, after the PowerPoint file is modified, only the unique data elements (the single changed slide) are backed up, requiring significantly less disk capacity.
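The PowerPoint scenario can be worked through numerically. The sketch below makes a simplifying assumption that each slide is an independent, equal-sized unit of data (real presentation files do not split this cleanly), and the `slide` helper is hypothetical.

```python
import hashlib

SLIDES = 20
COPIES = 10  # e-mailed copies of the modified file


def slide(n, version=1):
    # Hypothetical stand-in for one slide's bytes; every slide is 1 KB.
    return f"slide-{n:02d}-v{version}".encode().ljust(1024, b".")


original = [slide(n) for n in range(SLIDES)]
modified = list(original)
modified[4] = slide(4, version=2)  # the single changed slide

# Catalog of hashes as it stands after the initial backup.
seen = {hashlib.sha256(s).digest() for s in original}

traditional_slides = 0
dedup_slides = 0
for _ in range(1 + COPIES):  # the modified file plus its 10 copies
    for s in modified:
        traditional_slides += 1  # a traditional backup stores everything
        digest = hashlib.sha256(s).digest()
        if digest not in seen:  # deduplication stores only unseen data
            seen.add(digest)
            dedup_slides += 1

print(traditional_slides, dedup_slides)  # 220 1
```

Under these assumptions the traditional backup writes 220 slide-sized units (11 files of 20 slides each), while the deduplicated backup writes one.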
“The data-reduction numbers are great,” says Randy Kerns, an independent storage analyst. “Most vendors are quoting a 20-to-1 capacity reduction by only storing uniquely changed data.”
Data deduplication uses a couple of methods to identify unique information. Some vendors use a cryptographic hash algorithm to tell whether data is unique. The algorithm is applied to the data, and the resulting hash is compared with previously calculated hashes. Other vendors, such as Diligent, use a pattern-matching and differencing algorithm that identifies duplicate data. Diligent says this method is more efficient, because it is less CPU- and memory-intensive.
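To make the contrast between the two approaches concrete, here is a hedged Python sketch. The hash check uses SHA-256 as an assumed algorithm. The differencing half is not Diligent's method, which the article does not describe; Python's standard `difflib.SequenceMatcher` is swapped in purely as a stand-in, and `make_delta` and `apply_delta` are hypothetical helpers.

```python
import hashlib
from difflib import SequenceMatcher


def is_duplicate_by_hash(block, known_hashes):
    """Hash approach: only an exact match counts as a duplicate."""
    return hashlib.sha256(block).digest() in known_hashes


def make_delta(base, new):
    """Differencing approach: describe new as copies from base plus literals."""
    ops = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))  # reuse bytes already stored
        elif j2 > j1:
            ops.append(("data", new[j1:j2]))  # carry only the changed bytes
        # a pure "delete" contributes nothing to the new block
    return ops


def apply_delta(base, ops):
    """Rebuild the new block from the stored base and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


base = b"quarterly report: revenue up 4 percent, costs flat"
new = b"quarterly report: revenue up 7 percent, costs flat"

known = {hashlib.sha256(base).digest()}
exact_match = is_duplicate_by_hash(new, known)  # near-duplicates hash differently

delta = make_delta(base, new)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
rebuilt = apply_delta(base, delta)
print(exact_match, literal_bytes, len(new))
```

A whole-block hash treats the nearly identical block as entirely new, whereas the delta needs to carry only the bytes that actually changed. Hash-based products typically narrow this gap by hashing smaller chunks rather than whole blocks.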
Data deduplication software is being deployed either on disk-based backup appliances or on VTL boxes that emulate the operations of a tape library. Among the vendors implementing deduplication on appliances are Asigra, Avamar, Copan Systems, Data Domain, Diligent, Exagrid and Sepaton. Vendors such as ADIC (since acquired by Quantum), Falconstor and Microsoft provide deduplication software for implementation on other vendors’ industry-standard servers or appliances.
Kevin Fiore, vice president and director of enterprise engineering at Thomas Weisel Partners in Boston, has seen the advantages of deduplication.
“We were looking to replace our tape backup environment and get rid of the problems associated with tape,” says Fiore, who uses six Data Domain DD4000 Enterprise Series disk-based backup appliances.
“To get 30 days of backup data online, we were looking at having to buy 60 to 80 terabytes of disk,” Fiore says. “With Data Domain disk-based appliance, the worst we get is a compression ratio of 19-to-1. On one site we get a 39-to-1 compression ratio.”
Comments (1)
Deduplication
By Anonymous on September 19, 2008, 10:44 am
The danger of deduplication is the lack of redundancy. If the single origin backup becomes corrupted for whatever reason, the differentiated compressed backups...