Network World - This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
Correctly sizing a disk backup system with deduplication to meet your current and future needs is an important part of your data protection strategy. If you ask the right questions upfront and analyze each aspect of your environment that affects backup requirements, you can avoid the consequences of buying an undersized system that quickly runs out of capacity.
First and foremost, it's important to understand that this sizing exercise is different from sizing a primary storage system. With primary storage you can simply say, "I have 8TB to store, so I will buy 10TB." With disk-based backup and deduplication, sizing depends on a number of factors. Here's what to consider:
* Data types. The data types you have directly impact the deduplication ratio and therefore the system you need. If your mix of data types is conducive to deduplication and has high deduplication ratios (e.g., 50:1), then the deduplicated data will occupy less storage space and you need a smaller system. If you have a mix of data that does not deduplicate well (i.e., 10:1 or less data reduction), then you will need a much larger system. What matters is what deduplication ratio is achieved in a real-world environment with a real mix of data types.
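A first-order way to see how strongly the deduplication ratio drives system size is to divide the logical (pre-deduplication) data across the retention window by the ratio you expect to achieve. The sketch below is a simplified model, not a vendor sizing tool; the function name and the example figures (10TB protected, 16 weeks of retention) are illustrative assumptions:

```python
def raw_capacity_tb(protected_tb, retention_weeks, dedup_ratio):
    """Back-of-the-envelope estimate of raw disk needed: logical data
    across the retention window divided by the achieved dedup ratio."""
    return protected_tb * retention_weeks / dedup_ratio

# 10TB protected, 16 weeks of retention:
high = raw_capacity_tb(10, 16, 50)  # data that dedups well (50:1)
low = raw_capacity_tb(10, 16, 10)   # data that dedups poorly (10:1)
print(high, low)  # 3.2 vs 16.0 -- a 5x difference in raw capacity
```

The same protected data set can require several times more raw disk purely because of its mix of data types, which is why the real-world ratio matters more than the headline number.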
* Deduplication method. Not all deduplication approaches are created equal; the method a system uses has a significant impact on the ratio it achieves.
* Retention. The number of weeks of retention you keep also affects the deduplication ratio, because the longer the retention, the more repetitive data the deduplication system sees. The deduplication ratio therefore increases as retention increases. Most vendors will claim a deduplication ratio of 20:1, but when you do the math, that figure typically assumes a retention period of about 16 weeks. If you keep only two weeks of retention, you may only get about a 4:1 reduction.
Here is an example to highlight this: If you have 10TB of data and you keep four weeks of retention, then without deduplication you would store about 40TB of data. With deduplication, assuming a 2% weekly change rate, you would store about 5.6TB of data, so the deduplication ratio is about 7.1:1 (40TB ÷ 5.6TB = 7.1:1). However, if you have 10TB of data, and you keep 16 weeks of retention, then without deduplication you would store about 160TB of data (10TB x 16 weeks). With deduplication, assuming a 2% weekly change rate, you would store about 8TB of data, which is a deduplication ratio of 20:1 (160TB ÷ 8TB = 20:1).
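The arithmetic above can be reproduced with a simple model. The figures in the example are consistent with assuming the first full backup reduces roughly 2:1 and each subsequent weekly full adds only its 2% of changed data; that 2:1 first-full assumption is an inference to match the article's numbers, not a stated rule:

```python
def dedup_estimate(primary_tb, retention_weeks, weekly_change=0.02,
                   first_full_reduction=2.0):
    """Rough model: the first full backup reduces by
    first_full_reduction (assumed 2:1 here), and each later weekly
    full stores only its changed blocks. Returns (stored TB, ratio)."""
    logical_tb = primary_tb * retention_weeks  # size without dedup
    stored_tb = (primary_tb / first_full_reduction
                 + (retention_weeks - 1) * weekly_change * primary_tb)
    return stored_tb, logical_tb / stored_tb

print(dedup_estimate(10, 4))   # ~5.6TB stored, ~7.1:1 ratio
print(dedup_estimate(10, 16))  # ~8TB stored, 20:1 ratio
```

Running both cases shows how the same 10TB of primary data yields a 7.1:1 ratio at four weeks of retention but 20:1 at 16 weeks, which is why quoted ratios are meaningless without the retention period behind them.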