MapReduce sits at the heart of Google's data processing -- and Yahoo's, Facebook's and LinkedIn's as well. But it's been highly controversial, due to an apparent conflict with standard data warehousing common sense. Now two data warehouse DBMS vendors -- Greenplum and Aster Data -- have announced the integration of MapReduce into their SQL database managers. As I explained at length over on DBMS2, I think MapReduce could give a major boost to high-end analytics, specifically to applications in three areas:
- Text tokenization, indexing, and search
- Creation of other kinds of data structures (e.g., graphs)
- Data mining and machine learning
(Data transformation may belong on that list as well.)
All these areas could yield better results given better performance, and MapReduce offers the possibility of major processing speed-ups.
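To make the first item on that list concrete, here is a minimal single-process sketch of the map/shuffle/reduce pattern applied to tokenization and indexing. It is an illustration only, not how Greenplum or Aster Data expose MapReduce; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the sample documents are hypothetical, and a real system would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit one (token, doc_id) pair per whitespace-separated word."""
    for token in text.lower().split():
        yield token, doc_id

def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(token, doc_ids):
    """Reduce step: collapse grouped doc ids into a sorted, de-duplicated posting list."""
    return token, sorted(set(doc_ids))

def build_index(docs):
    """Run map, shuffle, and reduce in sequence to build an inverted index."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())

# Hypothetical sample documents, keyed by document id.
docs = {1: "MapReduce speeds up indexing", 2: "Indexing and search"}
index = build_index(docs)
```

Because each map call touches only one document and each reduce call touches only one token, both steps can be split across machines without coordination, which is where the potential speed-up comes from.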