If Microsoft Succeeds In Search, It Will Be Thanks To Open Source

When Microsoft acquired Powerset last year to help rebuild its way into the search engine market, most people probably didn't realize that Powerset's use of the open source Hadoop software would end up in the underpinnings of the "new" Kumo engine. Microsoft has a history of ripping that kind of stuff out of acquired companies' software and replacing it with Microsoft's own proprietary technology. I suspect that Microsoft expected to combine Powerset's technology with assets from the to-be-acquired Yahoo, which of course never happened. But if you follow Yahoo's search engine work, you already know that Yahoo also relies heavily on Hadoop. Yahoo is very public about its use and support of Hadoop. Check out http://developer.yahoo.com/blogs/hadoop, and Yahoo's sponsorship of the Hadoop Summit 2009 conference, if you'd like to see more. The bottom line is that basing Microsoft's Kumo on open source software seems to have been unavoidable, or even inevitable.

Earlier this month, The Register shed a little more light on what exactly Powerset's been up to with Hadoop, which is used primarily to build the search engine index and is patterned after Google's MapReduce and GFS file system technologies. But apparently there's more to this Microsoft open source story. A big gap in the needed search engine technology was something akin to Google's very proprietary BigTable storage engine. Enter Powerset, which contributed to the development of Hadoop's HBase (a.k.a. an open source brother of BigTable) beginning back in 2007. (Credit to Matt Asay for finding that link.) The same Register article reports that two of Powerset's HBase contributors have rejoined the company, reportedly to continue their contributions to HBase.

Simply stated, Microsoft's reliance on open source software for Kumo has been inevitable. Microsoft tried going it alone by developing their own proprietary search technology and failed. Even if Microsoft had acquired some or all of Yahoo, Hadoop and HBase would still have ended up as the underpinnings of Kumo. Again, this train's headed to the open source station, whether or not Microsoft's acknowledged it.

Two fundamental questions remain: 1) Can a second company, after Yahoo, also make a go of the search market using Hadoop open source technology? and 2) Will Microsoft remain faithful (and thankful) to the Hadoop open source community by continuing to fully support its development, as Yahoo and Powerset have?

I'm not a search engine market expert, so it's difficult for me to say much about Microsoft's second try at this market, but I suspect any wins will come as much at the expense of Yahoo as of Google. As for the latter question, it seems to me inevitable that Microsoft will remain on the open source Hadoop path. They've already failed at trying to build their own engine, and there's no indication that building another proprietary one is viable, nor is there time for it.

My hope is that the Kumo/Powerset/Hadoop situation serves as an infectious learning experience for Microsoft, showing Redmond how to work productively with an open source project it relies on so heavily. Maybe this is the start of something bigger between Microsoft and the open source community. We can only hope.

Copyright © 2009 IDG Communications, Inc.