- Google I/O 2013's Coolest Products and Services
- 10 Star Trek Technologies That are Almost Here
- 19 Generations of Computer Programmers
- 25 Must-Have Technologies for SMBs
CIO - When you hear the expression "big data," the next word you will often hear is "Hadoop." That's because the underlying technology that has made massive amounts of data accessible is based on the open source Apache Hadoop project.
From the outside looking in, you would rightly assume then that Hadoop is big data and vice versa; that without one the other cannot be. But there is a Hadoop competitor that in many ways is more mature and enterprise-ready: High Performance Computing Cluster.
Like Hadoop, HPCC is open-sourced under the Apache 2.0 license and is free to use. Both likewise leverage commodity hardware and local storage interconnected through IP networks, allowing for parallel data processing and/or querying across the architectures. This is where most of the similarities end, says Flavio Villanustre, vice president of information security and the lead for the HPCC Systems initiative within LexisNexis.
HPCC Older, Wiser Than Hadoop?
HPCC has been in production use for more than 12 years, though the HPCC open source version has been available for only a little more than a year. Hadoop, on the other hand, was originally part of the Nutch project that Google put together to parse and analyze log files and wasn't even its own Apache project until 2006. Since that time, though, it has become the de facto standard for big data projects, far outpacing HPCC's 60 or so enterprise users. Hadoop is also supported by an open source community in the millions and an entire ecosystem of start-ups springing up to take advantage of this leadership position.
That said, HPCC is a more mature enterprise-ready package that uses a higher-level programming language called enterprise control language ( ECL) based on C++, as opposed to Hadoop's Java. This, says Villanustre, gives HPCC advantages in terms of ease of use as well as backup and recovery of production. Speed is enhanced in HPCC because C++ runs natively on top of the operating system, while Java requires a Java virtual machine (JVM) to execute.
HPCC also possesses more mission-critical functionality, says Boris Evelson, vice president and principal analyst for Application Development and Delivery at Forrester Research. Because it's been in use for much longer, HPCC has layers-security, recovery, audit and compliance, for example-that Hadoop lacks. Lose data during a search and it's not gone forever, Evelson says. It can be recovered like a traditional data warehouse such as Teradata.
How-to: Secure Big Data in Hadoop
Rags Srinivasan, senior manager for big data products at Symantec, wrote about this shortcoming in a May 2012 blog post on issues with enterprise Hadoop: "No reliable backup solution for Hadoop cluster exists. Hadoops way of storing three copies of data is not the same as backup. It does not provide archiving or point in time recovery."