Skip Links

Network World

  • Social Web 
  • Email 
  • Close

Q&A: Diligent CTO demystifies data deduplication

By Deni Connor , Network World , 05/04/2007
Newsletter Signup
  • Share/Email
  • Tweet This
  • Comment
  • Print

Diligent Technologies is among the pioneers of data deduplication technology, which helps enterprises reduce redundant copies of data and, in turn, shrink storage requirements and shorten backup times. Neville Yates, Diligent’s CTO, talked with Network World Senior Editor Deni Connor about the varying deduplication technologies used with today’s virtual tape libraries (VTL).

 

So what is deduplication?

Deduplication is a means by which data is examined and compared to existing data. If it is the same, it is filtered out and the existing data is referenced. Deduplication is very prominent in applications such as backup that cause a lot of duplication as a byproduct of how they work. These applications are prime targets for deduplication technology.

 

What forms of deduplication are there?

There are three ways deduplication can occur that are talked about today in the market. One of them is the offering from Diligent called HyperFactor, which takes a look at data in an agnostic form and searches the datastream for similarity. Once similarity is found, a computation difference is performed guaranteeing that what is to be filtered out is exactly the same as what is referenced. Only new data is stored.

Another one uses hash technology or hash algorithms whereby data is sliced into some digestible piece -- such as perhaps 8Kbytes in size -- and a hash is assigned to that data and the data is stored. If that signature or hash is recomputed on a new datastream, then that computation suggests that that data already exists and can be referenced. It doesn't need to consume more storage, thereby reducing the amount of storage consumed.

The third is one where the datastream is looked at inside for its logical content, assuming that a file of a particular name is most likely to be a good candidate when compared to the contents of a file of exactly the same name on a fully qualified basis, meaning directory, directory tree, etc., and then a computational difference is done between the two files.

So there are three fundamental approaches and many different ways of implementing those approaches.

 

What are the different ways deduplication has been implemented?

One of the implementation differences in those approaches is whether you receive all of the data and lay it down on disk and then sometime in the future read it back in from a deduplication perspective, or whether during the receipt of the data you process it inline and in real time to achieve the deduplication.

  • Share/Email
  • Tweet This
  • Comment
  • Print
Partner Content

Explore the Ultrium Edge

The powerful tape technology can address data security with tape encryption as well as long term data protection.

Find Out More

Disk and Tape Square Off

Discover what disk and tape really cost and which solution provides lower total cost of ownership and optimizes energy use for your organization

Download this White Paper

Don't Fall for the Myths

The Clipper Group explores the truth behind the myths of tape, digging into the misconceptions in the disk vs. tape debate.

Review this information

information examination

An examination of information security issues, methods and securing data with LTO-4 tape drive encryption

Read this analysis

Comments (1)
Login
Forgot your account info?

WowBy Anonymous on March 16, 2008, 11:36 pmI guess people just totally forgot Avamar (now EMC). I know for certain that Avamar's "commonality factoring" is hash based, inline deduplication.

Reply | Read entire comment

View all comments

Add comment
Anonymous comments subject to approval. Register here for member benefits.
Have a NetworkWorld account? Log in here. Register now for a free account.

Videos

rssRss Feed
Partner Content

Explore the Ultrium Edge

The powerful tape technology can address data security with tape encryption as well as long term data protection.

Find Out More

Disk and Tape Square Off

Discover what disk and tape really cost and which solution provides lower total cost of ownership and optimizes energy use for your organization

Download this White Paper

Don't Fall for the Myths

The Clipper Group explores the truth behind the myths of tape, digging into the misconceptions in the disk vs. tape debate.

Review this information

information examination

An examination of information security issues, methods and securing data with LTO-4 tape drive encryption

Read this analysis