Network World

Q&A: Diligent CTO demystifies data deduplication

By Deni Connor, Network World, 05/04/2007

Diligent Technologies is among the pioneers of data deduplication technology, which helps enterprises reduce redundant copies of data and, in turn, shrink storage requirements and shorten backup times. Neville Yates, Diligent’s CTO, talked with Network World Senior Editor Deni Connor about the varying deduplication technologies used with today’s virtual tape libraries (VTL).

 

So what is deduplication?

Deduplication is a means by which data is examined and compared to existing data. If it is the same, it is filtered out and the existing data is referenced. Deduplication is very prominent in applications such as backup that cause a lot of duplication as a byproduct of how they work. These applications are prime targets for deduplication technology.

 

What forms of deduplication are there?

There are three approaches to deduplication being talked about in the market today. One of them is the offering from Diligent called HyperFactor, which looks at data in an agnostic form and searches the datastream for similarity. Once similarity is found, a computational difference is performed, guaranteeing that what is filtered out is exactly the same as what is referenced. Only new data is stored.
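The general "find something similar, then verify byte for byte" idea can be sketched as follows. This is not Diligent's HyperFactor algorithm, which the interview does not describe in detail. It is a minimal illustration under stated assumptions: similarity is estimated with overlapping 8-byte shingles, the difference is computed with Python's `difflib.SequenceMatcher`, and the names `shingles`, `pick_reference`, `delta` and `rebuild` are hypothetical.

```python
from difflib import SequenceMatcher


def shingles(data: bytes, k: int = 8) -> set[bytes]:
    """Every overlapping k-byte window of the data (assumed similarity signal)."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def pick_reference(new: bytes, stored: list[bytes]) -> bytes:
    """Return the stored item that shares the largest fraction of shingles with `new`."""
    new_set = shingles(new)

    def score(candidate: bytes) -> float:
        cand_set = shingles(candidate)
        union = new_set | cand_set
        return len(new_set & cand_set) / len(union) if union else 0.0

    return max(stored, key=score)


def delta(reference: bytes, new: bytes) -> list[tuple]:
    """Describe `new` as references into `reference` plus literal new bytes.

    An "equal" opcode means the two byte ranges are identical, so only
    data that is verified to match gets replaced by a reference.
    """
    matcher = SequenceMatcher(None, reference, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("ref", i1, i2 - i1))      # reuse existing bytes
        elif j2 > j1:
            ops.append(("new", new[j1:j2]))       # store only the new bytes
    return ops


def rebuild(reference: bytes, ops: list[tuple]) -> bytes:
    """Reconstruct the original data from the reference and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "ref":
            _, start, length = op
            out += reference[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

The point of the second step is that the similarity score is only a hint. The data is reduced to references only where the comparison shows the bytes are identical, so a misleading hint costs storage efficiency but not correctness.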

Another one uses hash technology or hash algorithms, whereby data is sliced into digestible pieces -- perhaps 8Kbytes in size -- and a hash is computed for each piece and stored along with the data. If the same hash is later computed for a piece of a new datastream, the match suggests that the data already exists and can be referenced. It doesn't need to be stored again, thereby reducing the amount of storage consumed.
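A minimal sketch of the hash-based approach, assuming fixed-size 8 KB pieces and SHA-256 as the hash (the interview names neither a specific algorithm nor a slicing method). The function names and the in-memory `store` dictionary are hypothetical stand-ins for a real chunk repository.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # 8 KB pieces, the example size mentioned in the interview


def dedupe_stream(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Slice data into fixed-size chunks and store only chunks not seen before.

    Returns the ordered list of chunk hashes needed to rebuild the stream.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk   # new data: consumes storage
        recipe.append(digest)       # known data: only a reference is kept
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original stream by following the references."""
    return b"".join(store[digest] for digest in recipe)
```

For example, backing up `b"A" * 8192 + b"B" * 8192` and then `b"A" * 8192 + b"C" * 8192` into the same `store` leaves three chunks on record rather than four, because the first chunk of the second backup hashes to a value already present. Note that this sketch trusts a hash match without comparing the underlying bytes.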

The third looks inside the datastream at its logical content. It assumes that a file of a particular name is most likely to be a good candidate for comparison with a file of exactly the same fully qualified name, meaning the same directory, directory tree, etc. A computational difference is then done between the two files.
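A rough sketch of this file-aware approach, assuming the comparison base is simply the most recent copy seen under the same fully qualified path. The class `PathAwareStore` is hypothetical; it only counts how many bytes a delta would need and does not implement a real delta store.

```python
from difflib import SequenceMatcher


class PathAwareStore:
    """Toy model: compare each incoming file with the last file of the same full path."""

    def __init__(self) -> None:
        self.latest: dict[str, bytes] = {}  # full path -> most recent content
        self.stored_bytes = 0               # bytes a delta-based store would write

    def backup(self, path: str, content: bytes) -> int:
        """Record a file and return how many of its bytes count as new."""
        previous = self.latest.get(path)
        if previous is None:
            # No file with this exact path yet: everything is new.
            new_bytes = len(content)
        else:
            # Same path seen before: only the differing regions are new.
            matcher = SequenceMatcher(None, previous, content, autojunk=False)
            new_bytes = sum(
                j2 - j1
                for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
                if tag != "equal"
            )
        self.latest[path] = content
        self.stored_bytes += new_bytes
        return new_bytes
```

Because the lookup key is the full path, identical content stored under a different directory is treated as entirely new in this sketch, which is the trade-off implied by relying on file names to find candidates.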

So there are three fundamental approaches and many different ways of implementing those approaches.

 

What are the different ways deduplication has been implemented?

One of the implementation differences in those approaches is timing. Either you receive all of the data, lay it down on disk and then at some later point read it back in to deduplicate it, or you process the data inline, in real time as it is received, to achieve the deduplication.
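The timing difference can be sketched as two ingest paths that end with the same deduplicated store but differ in how much disk they touch along the way. This is an illustration only: it assumes hash-based matching as the deduplication step (the timing choice applies to any of the approaches), models disk as in-memory collections, and uses hypothetical names throughout.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 hash (assumed matching method)."""
    return hashlib.sha256(chunk).hexdigest()


def stored_size(store: dict[str, bytes]) -> int:
    """Total bytes held in the deduplicated store."""
    return sum(len(chunk) for chunk in store.values())


def inline_ingest(chunks: list[bytes], store: dict[str, bytes]) -> int:
    """Deduplicate each chunk as it arrives; duplicates never reach disk.

    Returns the peak number of bytes held on disk.
    """
    peak = stored_size(store)
    for chunk in chunks:
        store.setdefault(chunk_id(chunk), chunk)
        peak = max(peak, stored_size(store))
    return peak


def post_process_ingest(chunks: list[bytes], store: dict[str, bytes]) -> int:
    """Land all data raw first, then read it back later and deduplicate.

    Returns the peak number of bytes held on disk, counting the raw
    landing area until the deduplication pass has finished.
    """
    landing = list(chunks)                       # step 1: write everything as received
    landing_bytes = sum(len(chunk) for chunk in landing)
    for chunk in landing:                        # step 2: later deduplication pass
        store.setdefault(chunk_id(chunk), chunk)
    peak = landing_bytes + stored_size(store)    # raw copy and store coexist
    landing.clear()                              # raw copy released afterwards
    return peak
```

With four 4-byte chunks of which three are identical, the inline path peaks at 8 bytes while the post-process path peaks at 24, even though both end with the same two chunks in the store.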

Comments (1)
Wow
By Anonymous on March 16, 2008, 11:36 pm
I guess people just totally forgot Avamar (now EMC). I know for certain that Avamar's "commonality factoring" is hash based, inline deduplication.
