One of the cool things about the amazing growth of Hadoop over the last few years has been the remarkable community that has evolved around it, one where competitors large and small, old and new, have seemed to band together in a kind of "co-opetition". Each competitor has contributed to the common goodness of Hadoop while seeking to put its own special sauce on top of it. Now this solid community wall seems to be showing some cracks as the stakes grow higher. Just these past few days we have seen a "shootout at the Hadoop Corral" as Cloudera and Hortonworks have engaged in a "my code is bigger than your code" blog war over who is contributing more to the Apache Hadoop project.

The gunfire started a few weeks ago when Hortonworks co-founder Owen O'Malley wrote a blog post called "The Yahoo Effect". In the post, O'Malley seemed to want to make the point that Yahoo's contributions to Hadoop are beyond question: Yahoo has contributed the most code to the Hadoop project by far. Owen noted that even after taking out the contributions by former Yahoo employees who now work at Hortonworks (OK, he put in a plug for his company, but it is his blog), Yahoo is still the largest contributor to Hadoop. Owen also pointed out that, in addition to raw code, Yahoo's effect should be measured in people. The people Yahoo employed to work on Hadoop who have since left and are now working on Hadoop elsewhere (at Hortonworks, of course, but at other companies too) form the backbone of many of today's Hadoop contributions.

To be fair, O'Malley also mentioned that other companies like Facebook, LinkedIn and, pointedly, Cloudera have recently made significant contributions to Apache Hadoop. That seemed to give a competitor its due, but evidently not enough for Cloudera CEO Mike Olson.

Olson fired off his own blog post in response to Owen's. In it, Olson does not dispute Yahoo's early contributions to Hadoop. But he makes the point that the sign of a healthy community is the variety of organizations that contribute. In Olson's view, that is what separates Hadoop and Linux from single-contributor projects like MySQL, JBoss and BerkeleyDB. OK, no disagreement from me there. Olson then uses a lot of charts and graphs to show that recently, especially since Hortonworks was founded, Cloudera has contributed much more to Apache Hadoop than anyone else. While thanking Yahoo, Hortonworks and everyone else for making Hadoop what it is today, Olson clearly wants to stake Cloudera's claim as the "first among equals" and the big dog in the Hadoop kennel. Hey, as I said about O'Malley's post, it is his blog and he can write what he wants.

Well, this didn't sit well with Hortonworks CEO Eric Baldeschwieler. Eric and many of the Hortonworks team come from Yahoo and take their stewardship of Apache Hadoop very seriously (that came across loud and clear when I spoke with Eric a week or two ago). He did not like Olson calling O'Malley's story misleading. Eric also wanted to point out that not all code contributions are equal. There is a difference between submitting a patch that may fix a spelling mistake and a contribution that contains many lines of code. In Eric's estimation, lines of code are the real measuring stick, and by that stick Cloudera doesn't hold a candle to Hortonworks or Yahoo. So there you go, Olson: my code is bigger than yours.

Olson, for his part, says that another problem is that the Hortonworks analysis looks only at core Hadoop, not at the other projects that have sprung up around Hadoop and that Cloudera supports. He has a point there as well.

So whose code is bigger? Does size really count in determining who is the bigger supporter of Apache Hadoop? Is this sort of back and forth healthy and good for the community?

I think it is fair to say that both Cloudera and Hortonworks are supporting Apache Hadoop. They are both doing it as much to advance their own corporate goals as to be good community members. They are not the only ones, either. Many companies are contributing to Hadoop, and as both camps say, that is a good thing.

At this point I don't see anything that threatens to upset the good karma around Hadoop, but make no mistake: the stakes here are high, and more than one company wants to be "the Hadoop Company".