Deduplication hits the big time

Last week was a big one for NetApp as it acquired data deduplication vendor Data Domain for $1.5 billion. Data deduplication, or single-instance storage as it is sometimes called, has attracted a lot of attention in the past year with the acquisition of Diligent by IBM and now Data Domain by NetApp.

NetApp already has its own deduplication capability – called A-SIS (Advanced Single Instance Storage) – which is incorporated into the ONTAP operating system for NetApp's FAS and NearStore filers. A-SIS is a form of post-processing data deduplication, in which data is deduplicated after it is stored on disk. This differs from inline deduplication, the approach Data Domain uses, in which data is deduplicated in flight, before it is stored on disk.

Pundits cite advantages and disadvantages to each form of deduplication: post-processing requires additional storage capacity to hold data before it is deduplicated; inline deduplication imposes a performance penalty because deduplication happens as the data is being backed up.
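To make the trade-off concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not how A-SIS or Data Domain's products actually work: the fixed 4-byte block size, the SHA-256 fingerprints and the function names (split_blocks, inline_dedup, post_process_dedup) are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small block size, chosen only for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the incoming data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_dedup(data: bytes) -> tuple[dict[str, bytes], int]:
    """Fingerprint each block before it is written.

    Returns (store, blocks_landed). Only unique blocks ever reach the store,
    but the hashing cost is paid on the write path.
    """
    store: dict[str, bytes] = {}
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # write only if the block is new
    return store, len(store)


def post_process_dedup(data: bytes) -> tuple[dict[str, bytes], int]:
    """Write every block first, then deduplicate in a later pass.

    Returns (store, blocks_landed). Writes are not slowed by hashing,
    but every block needs space until the later pass runs.
    """
    landing_area = split_blocks(data)   # all blocks land undeduplicated
    blocks_landed = len(landing_area)   # capacity needed up front
    store: dict[str, bytes] = {}
    for block in landing_area:          # e.g. a scheduled background job
        store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store, blocks_landed


if __name__ == "__main__":
    backup = b"AAAABBBBAAAAAAAA"  # four blocks, two of them unique
    for name, fn in (("inline", inline_dedup), ("post-process", post_process_dedup)):
        store, landed = fn(backup)
        print(f"{name}: {landed} blocks landed, {len(store)} unique blocks kept")
```

Both functions end up keeping the same unique blocks. The difference is where the cost falls: the inline version does the fingerprinting while the data is arriving, and the post-process version temporarily needs room for every block.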

NetApp's A-SIS is focused on deduplicating primary, secondary and tertiary storage on Unix or Windows data volumes. It can be scheduled to deduplicate data at off-peak times and runs as a background process.

NetApp estimates that full system backups can be deduplicated by a factor of 20:1 over time, that databases can be reduced by 30% to 50%, and that e-mail archives will see a 20% reduction in the amount of data.
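These estimates mix two units, a ratio for backups and percentages for databases and e-mail, which makes them hard to compare at a glance. The short Python sketch below converts between the two. The function names are invented for this example, and the arithmetic assumes a ratio means original size to stored size.

```python
def ratio_to_percent_saved(ratio: float) -> float:
    """Convert a dedup ratio (e.g. 20 for 20:1) to percent of space saved."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 100 * (ratio - 1) / ratio


def percent_saved_to_ratio(percent: float) -> float:
    """Convert percent of space saved to an equivalent N:1 ratio."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 100 / (100 - percent)


if __name__ == "__main__":
    print(f"20:1 backups -> {ratio_to_percent_saved(20):.0f}% saved")
    for pct in (30, 50, 20):
        print(f"{pct}% saved -> {percent_saved_to_ratio(pct):.2f}:1")
```

On that reading, a 20:1 ratio corresponds to a 95% reduction, while a 50% reduction is only 2:1 and a 20% reduction is 1.25:1.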

Other post-processing deduplication software is available from Sepaton, Quantum, Sun, FalconStor, EMC and HP.

Next: Inline deduplication Data Domain style
