Network World

The down and dirty on data deduplication

By Logan Harbaugh, Network World, 06/04/2007

At its core, data deduplication is a simple concept. Stored data is parsed for duplicate sequences, and when duplicates are found, a pointer to the first instance is inserted in place of the duplicated data.
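
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: it cuts the data into fixed-size blocks and uses a SHA-256 digest as the "pointer" to the first stored instance. The names `deduplicate` and `restore` and the 8-byte block size are hypothetical choices made for the example.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 8):
    """Split data into fixed-size blocks and keep each distinct block once.

    Assumption for this sketch: blocks are fixed-size and identified by
    their SHA-256 digest. Real products may parse the data differently.
    """
    store = {}     # digest -> block bytes (the single stored instance)
    pointers = []  # one digest per original block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first instance is stored
        pointers.append(digest)          # later duplicates become pointers
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following each pointer into the store."""
    return b"".join(store[digest] for digest in pointers)


# Three identical blocks followed by one different block.
data = b"ABCDEFGH" * 3 + b"12345678"
store, pointers = deduplicate(data)
```

Here `pointers` still has one entry per original block, so `restore(store, pointers)` returns the input unchanged, while `store` holds only the distinct blocks.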

For example, using a product that supports data deduplication, a backup of an Exchange server in which 20 recipients have received the same attachment would store only the first instance of that attachment with all others pointing back to it.
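
The attachment example can be mimicked at whole-object granularity. The sketch below assumes a simplified backup in which each mailbox holds a single attachment as raw bytes; `backup_attachments`, the mailbox names, and the attachment contents are all made up for illustration and do not reflect how Exchange or any backup product stores data.

```python
import hashlib


def backup_attachments(mailboxes):
    """Store each distinct attachment once; map every recipient to a pointer.

    Hypothetical model: mailboxes is a dict of recipient -> attachment bytes.
    """
    store = {}  # digest -> attachment bytes (first instance only)
    index = {}  # recipient -> digest pointing into the store
    for recipient, attachment in mailboxes.items():
        digest = hashlib.sha256(attachment).hexdigest()
        store.setdefault(digest, attachment)
        index[recipient] = digest
    return store, index


# Twenty recipients who all received the same attachment.
attachment = b"quarterly report " * 1000
mailboxes = {f"user{i:02d}": attachment for i in range(20)}

store, index = backup_attachments(mailboxes)
raw_bytes = sum(len(a) for a in mailboxes.values())
stored_bytes = sum(len(b) for b in store.values())
```

In this toy case the store ends up with a single copy, so `raw_bytes` is twenty times `stored_bytes`, ignoring the space taken by the pointers themselves.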

Under this scheme, parts of different files that are identical need to be stored only once. For instance, if the first few lines of a document contain the path name of the document, that name will generally be the same for all the documents in a folder.

If the path name is 40 characters long and the first 29 characters are the same for every file, those 29 bytes are replaced with a pointer in every file after the first. Many types of files share structural elements from file to file, and PowerPoint or PDF documents may contain the same text as the original Word document, so the same strings of text recur in many documents.
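
The arithmetic behind the path-name example is easy to work through. The article does not say how large a pointer is, so the 8-byte pointer below is purely an assumption, as is the helper name `prefix_savings`. The net saving depends on the pointer being smaller than the sequence it replaces.

```python
def prefix_savings(num_files: int, shared_bytes: int = 29,
                   pointer_bytes: int = 8) -> int:
    """Net bytes saved when a shared prefix is replaced by a pointer.

    The first file keeps the full prefix; each later file swaps its
    shared_bytes for one pointer. pointer_bytes is an assumed value.
    """
    duplicates = max(num_files - 1, 0)
    return duplicates * (shared_bytes - pointer_bytes)


# 100 files in one folder sharing the first 29 bytes of the path name.
saved = prefix_savings(100)
```

With these assumed numbers, 99 files each give up 29 bytes and gain an 8-byte pointer, for a net saving of 99 × 21 bytes. A single path prefix saves little; the gains come from the many such repeated sequences across a large data set.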
