- 4chan hell raisers finding fame brings heat?
- The 10 dumbest mistakes network managers make
- NetApp quits bidding war in face of EMC opposition
- CompuServe closes after 30 years
- Google to launch open-source Chrome OS this year
Diligent Technologies is among the pioneers of data deduplication technology, which helps enterprises reduce redundant copies of data and, in turn, shrink storage requirements and shorten backup times. Neville Yates, Diligent’s CTO, talked with Network World Senior Editor Deni Connor about the varying deduplication technologies used with today’s virtual tape libraries (VTL).
So what is deduplication?
Deduplication is a means by which data is examined and compared to existing data. If it is the same, it is filtered out and the existing data is referenced. Deduplication is very prominent in applications such as backup that cause a lot of duplication as a byproduct of how they work. These applications are prime targets for deduplication technology.
What forms of deduplication are there?
There are three ways deduplication can occur that are talked about today in the market. One of them is the offering from Diligent called HyperFactor, which takes a look at data in an agnostic form and searches the datastream for similarity. Once similarity is found, a computation difference is performed guaranteeing that what is to be filtered out is exactly the same as what is referenced. Only new data is stored.
Another one uses hash technology or hash algorithms whereby data is sliced into some digestible piece -- such as perhaps 8Kbytes in size -- and a hash is assigned to that data and the data is stored. If that signature or hash is recomputed on a new datastream, then that computation suggests that that data already exists and can be referenced. It doesn't need to consume more storage, thereby reducing the amount of storage consumed.
The third is one where the datastream is looked at inside for its logical content, assuming that a file of a particular name is most likely to be a good candidate when compared to the contents of a file of exactly the same name on a fully qualified basis, meaning directory, directory tree, etc., and then a computational difference is done between the two files.
So there are three fundamental approaches and many different ways of implementing those approaches.
What are the different ways deduplication has been implemented?
One of the implementation differences in those approaches is whether you receive all of the data and lay it down on disk and then sometime in the future read it back in from a deduplication perspective, or whether during the receipt of the data you process it inline and in real time to achieve the deduplication.

The powerful tape technology can address data security with tape encryption as well as long term data protection.
Discover what disk and tape really cost and which solution provides lower total cost of ownership and optimizes energy use for your organization
The Clipper Group explores the truth behind the myths of tape, digging into the misconceptions in the disk vs. tape debate.
An examination of information security issues, methods and securing data with LTO-4 tape drive encryption
Partner Content
Explore the Ultrium Edge
The powerful tape technology can address data security with tape encryption as well as long term data protection.
Find Out More
Disk and Tape Square Off
Discover what disk and tape really cost and which solution provides lower total cost of ownership and optimizes energy use for your organization
Download this White Paper
Don't Fall for the Myths
The Clipper Group explores the truth behind the myths of tape, digging into the misconceptions in the disk vs. tape debate.
Review this information
information examination
An examination of information security issues, methods and securing data with LTO-4 tape drive encryption
Read this analysis
Comments (1)
WowBy Anonymous on March 16, 2008, 11:36 pmI guess people just totally forgot Avamar (now EMC). I know for certain that Avamar's "commonality factoring" is hash based, inline deduplication.
Reply | Read entire comment
View all comments