De-duplication has become popular for backup data, but not for primary storage. Now U.S. start-up Ocarina Networks wants to change that with a data reduction technology that, it claims, can shrink live production data too, even when the files are already in compressed formats.
The technology has already been adopted by Photoways Group, which runs the British photo-sharing site Photobox. According to its CTO, the company expects to save millions of euros by deferring storage hardware purchases as a result.
In effect, Ocarina's technology uses an out-of-band hardware appliance to break stored files down into their constituent parts and compress them. The compressed files are then reconstructed on demand by a file system filter driver.
The problem with applying current de-dupe schemes to primary storage is that "You're much less likely to find duplicate blocks in an online subdirectory, say," explained Carter George, Ocarina's VP of products and co-founder. He pointed out that where duplicates do exist, they are often there deliberately rather than being redundant, for instance on replicated storage arrays.
However, that doesn't mean there's no redundancy within the files, he added: "For example, a PowerPoint, a PDF, a Word document and a JPEG all might contain the same picture, but it's re-scaled, or pasted in a different format, or whatever, and while a human would say 'It's the same picture', on disk there's no common bytes."
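Why conventional de-dupe finds so little in live data can be illustrated with a minimal sketch of fixed-size block de-duplication, assuming 4KB blocks identified by SHA-256 digests (the block size and helper names are illustrative assumptions, not any vendor's design). An exact copy de-duplicates completely, but inserting a single byte at the front moves every block boundary, so no aligned block matches. That is a much milder change than the re-scaling George describes, yet it already defeats byte-level matching.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count blocks of `b` whose digest also appears among the blocks of `a`."""
    seen = set(block_hashes(a))
    return sum(1 for h in block_hashes(b) if h in seen)


if __name__ == "__main__":
    original = random.Random(0).randbytes(16 * BLOCK_SIZE)  # stand-in for a stored file
    copy = bytes(original)          # exact duplicate, e.g. a replica
    shifted = b"\x00" + original    # same content with one byte inserted at the front

    print(shared_blocks(original, copy))     # 16: every block de-duplicates
    print(shared_blocks(original, shifted))  # no aligned block survives the shift
```

Content-defined chunking can tolerate shifts like this one, but neither approach recognizes a picture that has been re-encoded or re-scaled.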
In a process the company calls ECO (extract, correlate, optimize), Ocarina's storage optimizer appliance cracks open the file format and de-duplicates its constituent elements by looking for patterns at the information level, he said.
Using this method, even compressed image formats such as JPEG can be compressed still further, George claimed. That's because a set of photos of the same event will share image elements, and therefore some of their underlying mathematical properties, and those can be de-duplicated.
"The math to do this is really hard," George said. "Most companies concentrate on the D part of R&D. We have seven PhD mathematicians doing breakthrough mathematical research on how to find patterns."
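Ocarina's ECO mathematics is proprietary and isn't described in detail, but the general idea of opening up a file format before de-duplicating can be loosely illustrated with formats that are themselves containers. Office Open XML files such as .docx and .pptx are ZIP archives, so this minimal Python sketch uses the standard `zipfile` module to extract each archive's members and store them once per SHA-256 digest. The helper names and the two in-memory "documents" are hypothetical. Unlike the pattern matching George describes, this only catches byte-identical embedded elements, not a re-scaled copy of the same picture.

```python
import hashlib
import io
import zipfile


def make_container(members: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP archive, the container format used by .docx and .pptx."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def extract_elements(container: bytes) -> dict[str, bytes]:
    """'Extract' step: open the container and return its decompressed members."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def dedupe_elements(containers: list[bytes]) -> dict[str, bytes]:
    """Keep one copy of each distinct element, keyed by its SHA-256 digest."""
    store: dict[str, bytes] = {}
    for container in containers:
        for payload in extract_elements(container).values():
            store.setdefault(hashlib.sha256(payload).hexdigest(), payload)
    return store


if __name__ == "__main__":
    picture = bytes(range(256)) * 64  # hypothetical stand-in for an embedded image
    doc = make_container({"word/media/image1.png": picture, "word/document.xml": b"<doc/>"})
    deck = make_container({"ppt/media/image1.png": picture, "ppt/slides/slide1.xml": b"<slide/>"})

    # The two container files differ, so file-level de-dupe sees two unrelated files...
    print(doc == deck)                        # False
    # ...but after extraction the shared picture is stored once: 3 unique elements, not 4.
    print(len(dedupe_elements([doc, deck])))  # 3
```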
The ECO process is extremely processor-intensive, so the optimizer is a 16-core Linux appliance. It works out-of-band, pulling files off the NAS, compressing them and then writing them back in Ocarina format: a size-reduced shadow format, with bit-for-bit consistency checks.
File reconstruction is much faster and is handled by reader software, also Linux-based. You can install it as a filter on a web or application server, or on a workstation, or buy a complete Ocarina Reader appliance.
The reconstruction process adds around 4ms of latency, George said, and because customers can run multiple readers (Ocarina sells unlimited site licenses), the reader shouldn't become a single point of failure.
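The optimizer/reader split described above can be sketched in a few lines of Python. This is a hedged illustration, not Ocarina's implementation: zlib stands in for the proprietary ECO compression, and the `SHADOW1` marker and the `optimize` and `read` names are hypothetical. It shows the two properties the article mentions: the optimizer swaps in a shadow copy only after a bit-for-bit round-trip check, and the reader transparently reconstructs shadow data while passing ordinary files through unchanged.

```python
import zlib

# Hypothetical marker identifying the shadow format (Ocarina's real format is not shown here).
MAGIC = b"SHADOW1\n"


def read(stored: bytes) -> bytes:
    """Reader side: reconstruct shadow-format data, pass everything else through."""
    if stored.startswith(MAGIC):
        return zlib.decompress(stored[len(MAGIC):])
    return stored


def optimize(original: bytes) -> bytes:
    """Optimizer side: compress, verify the round trip bit for bit, then decide what to store."""
    candidate = MAGIC + zlib.compress(original, 9)
    if read(candidate) != original:  # bit-for-bit consistency check
        raise ValueError("round-trip check failed; keep the original file")
    # Keep the raw bytes only when compression doesn't help, unless they would
    # otherwise be mistaken for shadow-format data by the reader.
    if len(candidate) < len(original) or original.startswith(MAGIC):
        return candidate
    return original


if __name__ == "__main__":
    text = b"the same sentence, repeated. " * 1000
    stored = optimize(text)
    print(len(text), len(stored))  # the shadow copy is much smaller
    print(read(stored) == text)    # True
```

Because `read` is a pure function of the stored bytes, any number of reader instances can serve the same data, which is the property George relies on to avoid a single point of failure.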
He added that, as well as selling the technology in appliance form, Ocarina is working with other suppliers to develop integrated tier-2 storage subsystems.
Nevertheless, will the benefit of this kind of compression be enough to overcome users' reluctance to tamper with online data? That may depend on the market sector, suggested Forrester analyst Andrew Reichman.