
In 2013, deduplication smartens up

By Paul Kruschwitz, director of product management, FalconStor Software, special to Network World
December 24, 2012 11:17 AM ET

Network World - This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Deduplication, a fresh idea only a few years ago, has become a commodity, with organizations of all sizes deploying it as just another feature in their data protection and backup solutions. This is progress: more data centers can eliminate redundant data in their backup and storage systems to save money and increase efficiency. However, the job is not done. With deduplication in place, IT leaders can move on to adopting intelligent capabilities that ensure data is properly stored and protected. In 2013, data center managers will push for global deduplication that provides flexibility, scalability, performance and high availability of data.

Simple deduplication capabilities don't inspire much awe these days, but that doesn't mean they aren't a major accomplishment. Less than a decade ago, enterprises were plagued by multiple copies of data in their tape-based systems. There was no cost-effective way to replicate all of those copies off site in a way that protected network bandwidth and the bottom line. Deduplication opened the door to cost-efficient data backup and replication.

CLEAR CHOICE TEST: Recoup with data dedupe

MORE: How to choose between scale-up vs. scale-out architectures for backup and recovery

By 2012, IT teams were using a variety of methods to identify duplicate data and to track all of the data under storage. That capability has become essential: with data growing at a rate of 50% to 60% annually, the need for effective data protection and storage solutions keeps increasing.

But is simple deduplication enough today? Most shops view deduplication as a basic feature, but in reality it is a complicated activity that involves a number of resources and processes and requires ongoing attention from the IT staff. Not all data deduplicates well, so IT still must monitor what is being stored in order to get the best utilization out of deduplicated storage. A database that had been backing up and replicating efficiently can suddenly fail to deduplicate well because compression or encryption was enabled at the database level.
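Why pre-compressed or encrypted data defeats deduplication is easy to demonstrate with block fingerprinting. The sketch below is purely illustrative, not any vendor's implementation; the fixed 4KB block size, the SHA-256 fingerprints and the dedupe_ratio helper are assumptions. It hashes fixed-size blocks of a file and counts how many are unique; encrypted or already-compressed files typically come back at close to 1:1.

import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems often use variable-length chunking

def dedupe_ratio(path):
    """Return (total_blocks, unique_blocks, ratio) for one file."""
    fingerprints = set()
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            total += 1
            fingerprints.add(hashlib.sha256(block).digest())
    unique = len(fingerprints)
    # Encrypted or pre-compressed data tends to yield total == unique,
    # a ratio near 1:1, which is why it is a poor dedupe candidate.
    return total, unique, (total / unique if unique else 0.0)

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        total, unique, ratio = dedupe_ratio(path)
        print(f"{path}: {total} blocks, {unique} unique, {ratio:.2f}:1")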

Intelligent deduplication addresses some of the issues now coming to the forefront as organizations master the more straightforward dedupe processes. In the coming year, IT leaders should look for deduplication capabilities that report on and detect data types. To adapt to those data types, IT will need to apply different policy options: inline deduplication, post-process/concurrent deduplication, and no deduplication at all.
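As a rough illustration of how data-type detection might feed policy selection, consider the hypothetical sketch below. The type labels, the POLICY_BY_TYPE table and the choose_policy helper are invented for illustration and do not describe any particular product's mechanism.

from enum import Enum

class DedupePolicy(Enum):
    INLINE = "inline"                  # deduplicate before data lands on disk
    POST_PROCESS = "post/concurrent"   # deduplicate after ingest, scheduled or concurrent
    NONE = "none"                      # skip deduplication entirely

# Assumed mapping from a detected data type to a default policy.
POLICY_BY_TYPE = {
    "database_dump": DedupePolicy.INLINE,
    "vm_image": DedupePolicy.POST_PROCESS,
    "encrypted": DedupePolicy.NONE,
    "pre_compressed": DedupePolicy.NONE,
    "image_media": DedupePolicy.NONE,
}

def choose_policy(data_type, small_config=False, needs_immediate_replication=False):
    """Pick a policy from the detected data type and environment hints."""
    policy = POLICY_BY_TYPE.get(data_type, DedupePolicy.POST_PROCESS)
    if policy is DedupePolicy.NONE:
        return policy
    # Small configurations or immediate replication needs favor inline dedupe.
    if small_config or needs_immediate_replication:
        return DedupePolicy.INLINE
    return policy

# Example: choose_policy("database_dump", needs_immediate_replication=True) -> DedupePolicy.INLINE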

The first policy, inline deduplication, makes the most sense for small storage configurations or environments with immediate replication needs. This option minimizes storage requirements and can deduplicate and replicate data more quickly. The post-process option runs independently of the backup stream and can be scheduled for any point in time, including concurrently with ingest. Postponing deduplication can make transfers to physical tape and frequent restore activities more efficient, and it lets the deduplication engine make full use of available processing power while minimizing the impact on the incoming data stream; this approach is particularly well suited to multi-node clustered solutions, where it can draw on all computing resources. Finally, some data types simply do not deduplicate effectively and should be excluded from deduplication policies, including image data and pre-compressed or encrypted data.
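To make the inline-versus-post-process distinction concrete, here is a minimal, hypothetical sketch against a toy block store. The BlockStore class and its methods are assumptions for illustration only; real backup appliances use far more sophisticated fingerprint indexes and staging mechanisms.

import hashlib

class BlockStore:
    def __init__(self):
        self.index = {}      # fingerprint -> block (the dedupe index)
        self.staging = []    # raw blocks awaiting post-process dedupe

    def write_inline(self, block):
        """Inline policy: fingerprint and dedupe before the block is stored."""
        fp = hashlib.sha256(block).hexdigest()
        self.index.setdefault(fp, block)   # keep only the first copy
        return fp

    def write_raw(self, block):
        """Ingest path for post-process dedupe: land the data first, dedupe later."""
        self.staging.append(block)

    def post_process(self):
        """Run later (scheduled or concurrent); returns the number of blocks reclaimed."""
        reclaimed = 0
        while self.staging:
            block = self.staging.pop()
            fp = hashlib.sha256(block).hexdigest()
            if fp in self.index:
                reclaimed += 1             # duplicate: only a reference survives
            else:
                self.index[fp] = block
        return reclaimed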
