Storage makers like EMC, which have enjoyed enviable sales growth for more than a decade, may soon find that growth slowing for an unexpected reason: Riverbed is about to repurpose the acceleration technology that has served it well in reducing network traffic so that it also takes a load off storage servers. How will it do this? The company is developing a new box that applies dictionary compression (which Riverbed calls de-duplication) to permanently reduce the amount of storage space used. The new box, aptly named Atlas, will lift a burden from the server.

Here's how it works. Riverbed's dictionary compression is, in essence, caching on an arbitrary data segment size. The Atlas boxes will sit in front of the server, watching bytes go by and determining whether a chunk of data, referred to as a segment, can be tagged. The segments have no relationship to a file or file name: a file may consist of a few or many segments, and a single segment may span a few or many files.

The first time data comes through the Atlas, it detects patterns (segments) in the data and tags each segment with a reference number. The Atlas then saves the segment and its reference number in the storage system. It leverages all of the storage features that already exist, such as RAID redundancy, and does not replace the file structure. Files are still found and retrieved by the storage system; they just use fewer storage system resources. Every write to and read from the storage system goes through the Atlas box, since it knows how to reconstruct a file from the tags and stored segments.

The power of the approach comes from finding many common segments across multiple files. Riverbed reckons it can lower the cost of future storage growth (file data) by 30 to 90 percent. Given that the total amount of data stored worldwide is predicted to grow exponentially over the next three years, there's no need to feel sorry for EMC just yet.
But just as installing Steelhead devices forestalls bandwidth upgrades, installing Riverbed's Atlas devices should forestall storage upgrades, many of which would otherwise be lucrative forklift upgrades. EMC should not get complacent.
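To make the segment-and-reference idea concrete, here is a minimal sketch in Python. It is not Riverbed's algorithm, which the article does not describe in detail. It assumes fixed-size segments (the article says only that segment size is arbitrary) and uses a SHA-256 digest as the reference number. The names `DedupStore`, `SEGMENT_SIZE`, and the sample file contents are all hypothetical.

```python
import hashlib

SEGMENT_SIZE = 8  # hypothetical; tiny so the demo is easy to follow


class DedupStore:
    """Toy segment store: each unique segment is kept once, keyed by a tag."""

    def __init__(self):
        self.segments = {}  # tag -> segment bytes (stored only once)
        self.files = {}     # file name -> ordered list of tags

    def write(self, name, data):
        tags = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            # Assumption: a content hash stands in for the reference number.
            tag = hashlib.sha256(segment).hexdigest()
            self.segments.setdefault(tag, segment)  # keep first copy only
            tags.append(tag)
        self.files[name] = tags

    def read(self, name):
        # Rebuild the file by looking up each tag in order.
        return b"".join(self.segments[tag] for tag in self.files[name])

    def stored_bytes(self):
        return sum(len(s) for s in self.segments.values())


store = DedupStore()
report_a = b"HEADER__" + b"SHARED__" * 3 + b"TAIL_A__"
report_b = b"HEADER__" + b"SHARED__" * 3 + b"TAIL_B__"
store.write("a.doc", report_a)
store.write("b.doc", report_b)

logical = len(report_a) + len(report_b)
print(logical, store.stored_bytes())  # prints: 80 32
```

The two sample files total 80 bytes but share most of their segments, so only four unique 8-byte segments are stored. The sketch ignores the per-file tag lists when counting stored bytes, so real savings would be somewhat lower, and a real product would presumably pick segment boundaries from the data itself rather than at fixed offsets.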
Is this really a good idea?
This is confusing. Which storage system are you talking about? The Atlas, the server, or both?
Does this mean if the Atlas box goes down you no longer have access to the "original data"?
It seems like all local and remote data functions must go through the same Atlas device. If this is correct, then there is a serious single point of failure and possible corruption/loss of data.
In a WAN environment, you have only one path to the data, and there are boxes on each side that you cannot bypass. In a WAN/LAN environment there may be multiple paths to the data, and if that is the case then all traffic must pass through the SAME Atlas box.
Atlas Reliability
Once any compression scheme is used then you need that technology to reconstruct the original data. For example any time you use ZIP to compress you had better have the proper software around to go the other way.
The Atlas box sits in front of the storage system (EMC, NetApp, etc.). Each Atlas model will be tailored to work with and be properly integrated with the particular storage vendor’s technology. Once data is processed through the Atlas and stored in the EMC system, it will have to go through an Atlas box again on the way out. We understand that Riverbed is aware of the single point of failure risk and will in fact sell each Atlas as a dual-chassis redundant system.
However, the data does not have to pass through the SAME Atlas box. There is nothing stored in the Atlas you buy and use which is unique to your data. All that you really need is the ability to run the Riverbed algorithm. If one fails, then the data will be processed by the redundant chassis. If both of the Atlas boxes were to blow up, you could replace them. We assume you could even route the compressed data to your alternate data center that has an Atlas box for the decompression if necessary.
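A small Python sketch of that claim follows. It assumes, as the reply above does, that everything needed to rebuild a file (segments plus reference numbers) lives in the storage system and that a box carries only the algorithm. The `Box` class, the `storage` dictionary layout, and the fixed 8-byte segments are hypothetical illustrations, not Riverbed's design.

```python
import hashlib

SEGMENT_SIZE = 8  # hypothetical segment length


class Box:
    """Stand-in for one appliance: it holds the algorithm, not customer data."""

    def __init__(self, label):
        self.label = label  # only an identifier; no segments are kept here

    def write(self, storage, name, data):
        tags = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            tag = hashlib.sha256(segment).hexdigest()
            storage["segments"].setdefault(tag, segment)
            tags.append(tag)
        storage["files"][name] = tags

    def read(self, storage, name):
        return b"".join(storage["segments"][t] for t in storage["files"][name])


# The backing storage system holds both the segments and the tag lists.
storage = {"segments": {}, "files": {}}
original = b"payroll-2007-q1-payroll-2007-q2"

Box("primary").write(storage, "payroll.dat", original)

# A different box, with no shared state, reads the same data back.
recovered = Box("replacement").read(storage, "payroll.dat")
print(recovered == original)  # prints: True
```

If the real product kept any index or dictionary only inside the appliance, this property would not hold, which is exactly the risk the earlier comments raise.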
Virtualization?
Since they live between the storage and the servers, they are technically in-band virtualization. This type of product requires a serious amount of work in order to function: interop testing has to be done on all types of OSs and applications. They should have tried to license their engine to an existing in-band storage virtualization vendor like FalconStor or IBM's SAN Volume Controller. As it is, they're trying to compete with entrenched vendors with mature products using a single trick: deduplication.