The Hadoop market is white hot right now. It seems everywhere we turn there is a new company throwing their hat into the Hadoop world. Is all of the attention justified? Is this a case of overhype causing overload? How can the Hadoop world possibly make room for all of these companies to survive and thrive? I guess we are going to see how this all plays out.
Yesterday word came down that Yahoo had spun off some of its top Hadoop engineers into a new open source company called Hortonworks, backed by Yahoo and Benchmark Capital. For those wondering, Horton is the elephant in the movie Horton Hears a Who starring Jim Carrey. Of course, Hadoop’s logo is a little baby elephant. I guess that might be the connection?
Anyway, Yahoo has been the biggest early supporter of Hadoop and has some 40,000 servers processing 5 billion jobs a month. Hadoop is actually an Apache project, so while Yahoo is a major supporter and was the early lead, many major web companies and others have been contributing.
Cloudera is one company that has become the “Hadoop company” supplying support and services around it. Fresh on the heels of the Hortonworks announcement, Cloudera made an announcement of their own. Their latest version will offer more configuration and management tools for Hadoop.
But it doesn’t end there. Platform Computing also announced yesterday that “it has signed the Apache Corporate Contributor License Agreement allowing the company to contribute to the Apache Software Foundation for developing Apache-based, open-source Hadoop Distributed File System (HDFS)”. Platform is focusing on its recently announced Platform MapReduce, its offering built on the MapReduce model, which Hadoop itself was originally based on back when MapReduce was strictly a Google tool.
Not to be outdone, MapR, which says it “allows more businesses to harness the power of big data analytics. MapR's innovations make Hadoop more reliable, more affordable, more manageable and significantly easier to use,” today announced an expansion of its partner program “to enable diverse organizations within the Hadoop community expand their reach, ultimately helps customers leverage big data analytics through integrated access to MapR’s next generation distribution for Apache Hadoop”.
Wait, there is more! Actuate also announced yesterday that it is adding Hadoop support to its BIRT framework for business intelligence. The company says:
“The combination of BIRT’s open source, flexible approach to business intelligence and Hadoop’s data scalability enables organizations to build information applications that give the full range of end users — including business analysts and non-technical users — valuable insight into data stored in Hadoop.”
One more company throwing its hat into the Hadoop ring is Pervasive, which also announced the release of “Pervasive TurboRush for Hive, new software that makes Hive queries run faster on less hardware”. In case you were wondering, “Hive is the data warehouse system built on top of Hadoop. Pervasive TurboRush for Hive accelerates Hive by using the Pervasive DataRush dataflow engine on the back end, providing faster execution of Hive programs without needing to modify any code”.
Last week I wrote about no less a company than LexisNexis offering HPCC, its own open source alternative to Hadoop.
So what is the gold rush about? Eric Baldeschwieler, Hortonworks' chief executive and former head of software engineering for the Hadoop team at Yahoo, said, “We anticipate that within five years, more than half the world's data will be stored in Apache Hadoop.” Well, if that is anywhere near true, you can see why the land grab is on.
What a great open source story: one Apache project giving rise to all of this. Now the question is whether they can all play together, and whether the “coopetition” will make big data easier, better and faster for all of us.
In the meantime, if you ask the real Hadoop company to stand up, you could wind up with a room full of elephants!