- FBI warns Hit Man e-mail scammer back
- 20 tech habits to improve your life
- Industry mourns slain Cisco exec
- 10 Firefox add-ons for better browsing
- Wireless LANs face scaling challenges
Newsletters | Podcasts | Chats | Opinions | RSS Feeds | This Week In Print | IT Careers | Community | Reports | Downloads | Slideshows | New Data Center
Partner Sites:App Performance | On Demand Security | Networking Solution | SOA | Value of WDS
Diligent Technologies is among the pioneers of data deduplication technology, which helps enterprises reduce redundant copies of data and, in turn, shrink storage requirements and shorten backup times. Neville Yates, Diligent’s CTO, talked with Network World Senior Editor Deni Connor about the varying deduplication technologies used with today’s virtual tape libraries (VTL).
So what is deduplication?
Deduplication is a means by which data is examined and compared to existing data. If it is the same, it is filtered out and the existing data is referenced. Deduplication is very prominent in applications such as backup that cause a lot of duplication as a byproduct of how they work. These applications are prime targets for deduplication technology.
What forms of deduplication are there?
There are three ways deduplication can occur that are talked about today in the market. One of them is the offering from Diligent called HyperFactor, which takes a look at data in an agnostic form and searches the datastream for similarity. Once similarity is found, a computation difference is performed guaranteeing that what is to be filtered out is exactly the same as what is referenced. Only new data is stored.
Another one uses hash technology or hash algorithms whereby data is sliced into some digestible piece -- such as perhaps 8Kbytes in size -- and a hash is assigned to that data and the data is stored. If that signature or hash is recomputed on a new datastream, then that computation suggests that that data already exists and can be referenced. It doesn't need to consume more storage, thereby reducing the amount of storage consumed.
The third is one where the datastream is looked at inside for its logical content, assuming that a file of a particular name is most likely to be a good candidate when compared to the contents of a file of exactly the same name on a fully qualified basis, meaning directory, directory tree, etc., and then a computational difference is done between the two files.
So there are three fundamental approaches and many different ways of implementing those approaches.
What are the different ways deduplication has been implemented?
One of the implementation differences in those approaches is whether you receive all of the data and lay it down on disk and then sometime in the future read it back in from a deduplication perspective, or whether during the receipt of the data you process it inline and in real time to achieve the deduplication.

Aging network systems and old habits have dictated how businesses spend their IT budgets. As a...
Implementing HA at the Enterprise Data Center Edge to Connect to a Large Number of Branch OfficesThis paper reviews the problem of creating a network where the dynamic availability of services is...
Enterprise Data Center Network Reference ArchitectureUsing a High Performance Network Backbone to Meet the Requirements of the Modern Enterprise Data...

The standard for Power over Ethernet (PoE), IEEE Std. 802.3af(tm)-2003, advanced networking,...
Harnessing the power of communications to increase workplace performanceDue to the convergence of IT and telecommunications technologies, the business workplace has been...
Stay out of the headlines: Detecting and preventing network intrusionsHow do YOU stay out of the headlines? There is no denying that risk exists in our computer-driven...

Discover how Software as a Service is the economical alternative to expensive on-site software,...
IT Buyer's Guide To: Data backup and ReplicationLearn the latest on Data backup software tools that allow professionals to safekeep their data...
Bringing IT Operations Management to Open Source and BeyondLearn how to cost effectively and efficiently manage your open source environment in this...
Partner Content
Explore the Ultrium Edge
The powerful tape technology can address data security with tape encryption as well as long term data protection.
Find out more
Disk and Tape Square Off
Discover what disk and tape really cost -- and which solution provides lower total cost of ownership and optimizes energy use for your organization
Download the White Paper
Don't Fall For The Myths
The Clipper Group explores the truth behind the myths of tape, digging into the misconceptions in the disk vs. tape debate.
Download the White Paper
Will You Add Tape Too?
Over two thirds of disk-only users look to add tape back into storage infrastructure according to recent survey.
Download Survey Information
Comments (1)
WowBy Anonymous on March 16, 2008, 11:36 pmI guess people just totally forgot Avamar (now EMC). I know for certain that Avamar's "commonality factoring" is hash based, inline deduplication.
Reply | Read entire comment
View all comments