De-duplication has become popular for backup data, but not for primary storage. Now, U.S. start-up company Ocarina Networks wants to change that, with a data reduction technology which it claims can shrink live production data too - even if the file formats are already compressed.
The technology has already been picked up by Photoways Group, which runs a British photo-sharing site, Photobox. It expects to save millions of euros in deferred storage hardware purchases as a result, according to its CTO.
In effect, the Ocarina technology disassembles stored files into their constituent parts in order to compress them, via an out-of-band hardware appliance. The compressed files are then restored when needed via a file system filter driver.
The problem with using current de-dupe schemes on primary storage is that "you're much less likely to find duplicate blocks in an online subdirectory, say," explained Carter George, Ocarina's products VP and co-founder. He pointed out that where there are duplicates, they are often not redundant - on replicated storage arrays, for instance.
However, that doesn't mean there's no redundancy within the files, he added: "For example, a PowerPoint, a PDF, a Word document and a Jpeg all might contain the same picture, but it's re-scaled, or pasted in a different format, or whatever, and while a human would say 'It's the same picture', on disk there's no common bytes."
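George's point about block-level de-dupe can be illustrated with a minimal sketch. It is not Ocarina's method: the 4KB block size, the `block_hashes` and `shared_blocks` helpers, the synthetic "picture" bytes and the use of zlib as a stand-in for a different file encoding are all assumptions made for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Return the set of SHA-256 digests of the fixed-size blocks in data."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count the distinct blocks that a block-level de-duper could share."""
    return len(block_hashes(a) & block_hashes(b))


# Synthetic stand-in for the bytes of a picture (hypothetical data).
picture = bytes((i * 31 + (i >> 8)) % 256 for i in range(64 * 1024))

# An exact copy shares every block with the original.
exact_copy = bytes(picture)

# The "same picture" stored in another encoding: zlib stands in here for
# whatever re-scaling or re-formatting an application might apply.
re_encoded = zlib.compress(picture)

print("blocks shared with exact copy:", shared_blocks(picture, exact_copy))
print("blocks shared with re-encoded copy:", shared_blocks(picture, re_encoded))
```

The exact copy matches block for block, while the re-encoded copy shares nothing at the byte level even though it carries the same information.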
So in a process the company calls ECO - for extract, correlate, optimize - Ocarina's storage optimizer appliance cracks open the file format and de-duplicates its constituent elements by looking for patterns at the information level, he claimed.
Using this method, even compressed image formats such as Jpeg can be compressed still further, George claimed. That's because a set of photos of the same event will share image elements - and therefore some of their underlying mathematical properties - and those can be de-duplicated.
"The math to do this is really hard," George said. "Most companies concentrate on the D part of R&D. We have seven PhD mathematicians doing breakthrough mathematical research on how to find patterns."
The ECO process is extremely processor-intensive, so the optimizer box is a 16-core Linux appliance. It works out-of-band, pulling files off your NAS system, compressing them and then putting them back in Ocarina format - a size-reduced shadow format, with bit-for-bit consistency checks.
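The out-of-band workflow described above (pull a file, compress it, verify it bit for bit, then put it back) might be sketched as follows. This is only an illustration under stated assumptions: zlib stands in for Ocarina's undisclosed shadow format, and the `optimize` function is hypothetical.

```python
import hashlib
import zlib
from typing import Optional


def optimize(original: bytes) -> Optional[bytes]:
    """Return a size-reduced copy of original, or None to leave it untouched.

    zlib is a placeholder for the real (unpublished) optimized format.
    """
    candidate = zlib.compress(original, 9)

    # Bit-for-bit consistency check: restore the candidate and compare
    # digests before it is allowed to replace the original.
    restored = zlib.decompress(candidate)
    if hashlib.sha256(restored).digest() != hashlib.sha256(original).digest():
        return None

    # Only write back if the optimized copy actually saves space.
    if len(candidate) >= len(original):
        return None
    return candidate


# Usage: a repetitive file shrinks, a tiny one is left as-is.
repetitive = b"holiday-photo-metadata;" * 1000
shadow = optimize(repetitive)
print(len(repetitive), "->", len(shadow) if shadow is not None else "unchanged")
print(optimize(b"x"))
```

Verifying the round trip before writing anything back means a failed or unprofitable optimization leaves the original file in place.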