The onslaught of unstructured digital content -- video, audio and images -- is taxing storage systems and creating the need to store multiple petabytes. But current industry practices that rely on RAID and replication for data protection are expensive at this scale.
Dispersal, a new approach, is cost-effective for storing petabytes of digital content. Further, it provides extraordinary data protection, meaning digital assets will not be lost. Executives at enterprises with at least 50TB under management who make a strategic shift from RAID to dispersal can realize significant cost savings.
RAID schemes are based on parity and, at their root, even a dual-parity scheme such as RAID 6 cannot recover data if more than two drives fail simultaneously. The statistical likelihood of multiple drive failures has not been an issue in the past. However, as systems grow to hundreds of terabytes and petabytes, multiple drive failures become a real possibility.
Further, drives aren't perfect, and typical SATA drives have a published bit error rate (BER) of 1 in 10^14, meaning once every 100,000,000,000,000 bits, there will be a bit that is unrecoverable. Doesn't seem significant? In today's larger storage systems, it is.
Unfortunately, it is highly probable that one drive will fail and that a bit error will then be encountered while rebuilding from the remaining RAID set. To put this into perspective, when reading 10 terabytes, an unreadable bit is likely (a 56% probability), and when reading 100TB it is nearly certain (99.97%).
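The probabilities above can be approximated with a short calculation. The sketch below is illustrative only: it assumes bit errors are independent, that the error rate is exactly one unrecoverable bit per 10^14 bits read, and that a terabyte is 10^12 bytes. The function name `p_unreadable` is hypothetical. Under those assumptions it gives roughly 55% for 10TB and 99.97% for 100TB, in line with the figures quoted.

```python
import math

def p_unreadable(bytes_read: float, errors_per_bit: float = 1e-14) -> float:
    """Probability of hitting at least one unrecoverable bit while reading.

    Assumes independent bit errors at a fixed rate (a simplification).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-errors_per_bit))

TB = 1e12  # decimal terabyte, an assumption of this sketch

for size_tb in (1, 10, 100):
    print(f"{size_tb:>4} TB read: {p_unreadable(size_tb * TB):.2%}")
```

The same function shows why this was less of a concern for smaller arrays: the chance of an unreadable bit falls quickly as the amount of data read during a rebuild shrinks.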
As a result, enterprises address the data protection shortcomings of RAID by using replication, the technique of making additional copies of data to avoid unrecoverable errors and lost data. However, those copies add cost: typically 133% or more additional storage is needed for each additional copy, once the overhead of a typical RAID 6 configuration is included.
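To make the 133% figure concrete, here is a rough sketch of the arithmetic. It assumes a RAID 6 group of six data drives plus two parity drives, a layout chosen for illustration because it yields the 133% quoted; the article does not specify one. The function names are hypothetical.

```python
def raw_per_usable(data_units: int, redundancy_units: int) -> float:
    """Raw capacity consumed per unit of usable data in one protected copy."""
    return (data_units + redundancy_units) / data_units

def total_raw_percent(copies: int, data_units: int, redundancy_units: int) -> int:
    """Raw storage, as a percentage of the data size, for a number of copies."""
    return round(copies * raw_per_usable(data_units, redundancy_units) * 100)

# Assumed RAID 6 layout: 6 data drives + 2 parity drives per group
for copies in (1, 2, 3):
    print(f"{copies} RAID 6 copies: {total_raw_percent(copies, 6, 2)}% of data size")
```

Under this assumption, three replicated copies consume about four times the size of the data itself. The same ratio applies to any scheme that adds redundancy units to data units, so it can also be used to compare against a dispersal configuration once its parameters are known.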
Organizations also use replication to help with failure scenarios, such as a location failure, power outages, bandwidth unavailability and so forth. Having seamless access to data is key to keeping businesses running and profitable.
Executives should realize their storage approach has failed once they are replicating data three times, as it is clear that the replication band-aid is no longer solving the underlying problem associated with using RAID for data protection.
Dispersal can help organizations significantly reduce storage costs, reduce power consumption and the footprint of storage, as well as streamline IT management processes.
Here's how it works. Information Dispersal Algorithms (IDAs) separate data into unrecognizable slices of information, which are then distributed -- or dispersed -- by the dispersed storage protocol to disparate storage locations. These locations can be situated in the same city, the same region, the same country or around the world.
The dispersed storage protocol handles all of the slicing and reconstitution transparently, so users are presented with standard storage interfaces. The time to reconstitute data depends on the speed of the network.
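The slicing and reconstitution steps can be illustrated with a toy example. The sketch below is not the algorithm of any particular product; it assumes a simple polynomial-based scheme over the integers modulo 257, in which data is split into n slices and any k of them are enough to rebuild it. The names `disperse`, `reconstitute` and `solve_mod` are hypothetical, and a real system would add integrity checks, encryption and network transport.

```python
P = 257  # smallest prime above 255, so every byte value fits in the field

def disperse(data: bytes, k: int, n: int) -> dict:
    """Split data into n slices such that any k of them can rebuild it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = {x: [] for x in range(1, n + 1)}
    for i in range(0, len(padded), k):
        chunk = padded[i:i + k]  # k bytes act as polynomial coefficients
        for x in slices:
            # Slice x stores the polynomial evaluated at point x
            slices[x].append(sum(c * pow(x, j, P) for j, c in enumerate(chunk)) % P)
    return slices

def solve_mod(matrix, rhs):
    """Solve a k-by-k linear system modulo P by Gauss-Jordan elimination."""
    k = len(rhs)
    rows = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k] for row in rows]

def reconstitute(available: dict, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k available slices."""
    xs = sorted(available)[:k]
    if len(xs) < k:
        raise ValueError("not enough slices to reconstitute the data")
    vander = [[pow(x, j, P) for j in range(k)] for x in xs]
    out = bytearray()
    for pos in range(len(available[xs[0]])):
        out.extend(solve_mod(vander, [available[x][pos] for x in xs]))
    return bytes(out[:length])

# Usage: 5 slices, any 3 suffice; simulate losing two storage locations
original = b"digital assets"
slices = disperse(original, k=3, n=5)
surviving = {x: slices[x] for x in (2, 4, 5)}
print(reconstitute(surviving, k=3, length=len(original)) == original)
```

In this sketch each slice is one third the size of the data, so the five slices together occupy about 5/3 of the original size while tolerating the loss of any two locations.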