Deduplication: Stop repeating yourself

New techniques can save disk space and speed backups.
By Deni Connor, Network World, 09/25/2006

Data deduplication, data reduction, commonality factoring, capacity-optimized storage: whatever you call it, it is a process designed to make network backups to disk faster and more economical.

The idea is to eliminate large amounts of redundant data that can chew up disk space. Proponents also say it lets you keep more data available online longer in the same amount of disk space.

In deduplication, as data is backed up to a disk-based virtual tape library (VTL) appliance, a catalog of the data is built. This catalog, or repository, indexes individual pieces of data within a file or block of information, assigns each piece a metadata reference that is used to rebuild the file if it needs to be recovered, and stores the data on disk. The catalog also is used on subsequent backups to identify which data elements are unique. Nonunique data elements are not backed up; unique ones are committed to disk.

For instance, a 20-slide PowerPoint file is initially backed up. The user then changes a single slide in the file, saves the file and e-mails it to 10 counterparts. When a traditional backup occurs, the entire PowerPoint file and its 10 e-mailed copies are backed up. In deduplication, after the PowerPoint file is modified, only the unique elements of data (here, the single changed slide) are backed up, requiring significantly less disk capacity.
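The catalog and the PowerPoint example above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DedupStore` class, its method names, the treatment of each slide as one data element, and the use of a SHA-256 digest as the metadata reference are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for a deduplicating backup target."""

    def __init__(self):
        self.catalog = {}  # metadata reference -> stored data element
        self.recipes = {}  # file name -> ordered list of references

    def backup(self, name, elements):
        """Back up a file; return how many elements were newly stored."""
        newly_stored = 0
        refs = []
        for element in elements:
            # Assumption: a SHA-256 digest serves as the metadata reference.
            ref = hashlib.sha256(element).hexdigest()
            if ref not in self.catalog:
                # Unique element: commit it to the store.
                self.catalog[ref] = element
                newly_stored += 1
            # Nonunique elements are only referenced, never stored again.
            refs.append(ref)
        self.recipes[name] = refs
        return newly_stored

    def restore(self, name):
        """Rebuild a file from its list of references."""
        return [self.catalog[ref] for ref in self.recipes[name]]


def run_example():
    store = DedupStore()
    # Assumption: each slide is one data element.
    deck = [f"slide {i}".encode() for i in range(1, 21)]
    first = store.backup("deck.ppt", deck)

    revised = list(deck)
    revised[4] = b"slide 5 (revised)"  # the user changes a single slide
    second = store.backup("deck-v2.ppt", revised)

    # Ten e-mailed copies of the revised file.
    copies = sum(store.backup(f"mail-{n}.ppt", revised) for n in range(1, 11))
    return store, first, second, copies
```

Under these assumptions, the first backup stores 20 elements, the revised file adds one, and the 10 e-mailed copies add none, so the store holds 21 elements where a traditional backup of all 12 files would hold 240 slides.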

“The data-reduction numbers are great,” says Randy Kerns, an independent storage analyst. “Most vendors are quoting a 20-to-1 capacity reduction by only storing uniquely changed data.”

Vendors use one of two main methods to identify unique information. Some use cryptographic hashing to tell whether data is unique: a hash is computed over each data element and compared with previously calculated hashes, and a match means the element is already stored. Other vendors, such as Diligent, use a pattern-matching and differencing algorithm that identifies duplicate data. Diligent says this method is more efficient because it is less CPU- and memory-intensive.
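To make the differencing idea concrete, here is a generic sketch of storing a new version as a delta against data already held. It is not Diligent's algorithm, which the article does not describe; it uses the `SequenceMatcher` class from Python's standard `difflib` module as a stand-in, and the function names `make_delta`, `apply_delta` and `literal_bytes` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, new: bytes):
    """Describe `new` as copies of ranges in `base` plus literal new bytes."""
    delta = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Duplicate data: record only where it lives in the base.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Unique data: these bytes must actually be stored.
            delta.append(("data", new[j1:j2]))
        # "delete" means bytes dropped from the base; nothing to store.
    return delta


def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the new version from the base and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(delta) -> int:
    """Count the bytes the delta has to store as new data."""
    return sum(len(op[1]) for op in delta if op[0] == "data")
```

`SequenceMatcher` compares the inputs element by element and is slow on large inputs, so this sketch only suits small examples; a real product would need a far cheaper way to find similar data.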

Data deduplication software is being deployed either on disk-based backup appliances or on VTL boxes that emulate the operations of a tape library. Among the vendors implementing deduplication on appliances are Asigra, Avamar, Copan Systems, Data Domain, Diligent, Exagrid and Sepaton. Vendors such as ADIC (since acquired by Quantum), Falconstor and Microsoft provide deduplication software for implementation on other vendors’ industry-standard servers or appliances.
