Network World - VMware on Wednesday announced Project Serengeti, open source code that optimizes Hadoop for use in VMware virtualized environments.
Bringing cloud-like benefits to the leading big data analytics tool will make it faster and easier to deploy and manage a variety of Hadoop distributions on VMware machines, company officials say.
"VMware has been working on cloud computing and virtualization for quite some time, and big data is one of the hottest trends in IT. Now, we're bringing those worlds together," says Fausto Ibarra, senior director of product management for VMware. "With these announcements, Hadoop can become a first-class client in IT infrastructures."
Hadoop, which is an open source software framework for managing massive amounts of unstructured data, is used by some of the top IT shops in the world, such as Yahoo and Facebook, but is still in its early stages of adoption across most mid- to large-size enterprises. Experts say VMware's announcement today, along with other Hadoop-related news this week, further legitimizes the Hadoop market and could spur more companies to begin exploring the potential value of big data analytics.
With Project Serengeti, VMware has optimized Hadoop to run on virtualized infrastructure rather than on physical servers that do not run a hypervisor. While Hadoop clusters already run on virtual machines in some instances, VMware says supporting Hadoop clusters on its market-leading virtualization products will make Hadoop easier to deploy in enterprise settings. Using virtualized servers allows additional virtual machines to be deployed quickly and scaled elastically while ensuring high availability and optimal hardware utilization, the company says.
Hadoop support is initially designed to run on VMware vSphere virtualization products and is compatible with Hadoop distributions such as those from Cloudera, MapR, IBM and Greenplum. Ibarra says Project Serengeti will continue to advance, extending support to new Hadoop distributions and feature sets.
Making Project Serengeti available free through Apache also continues a trend by VMware to embrace open standards. Its platform-as-a-service (PaaS) offering, Cloud Foundry, for example, is also open source. Ibarra says VMware wants Project Serengeti to be widely adopted within the Hadoop community and compatible with all the various Hadoop distributions, so open source was the way to go.
Project Serengeti is an important move to make Hadoop enterprise-friendly, says Tony Baer, an analyst at Ovum. "This will help Hadoop become more mainstream," he says. There are a variety of use cases where Hadoop could benefit from running in a virtualized environment, for example when an enterprise wants to experiment with a new feature on a dataset without exposing the entire cluster.
Ibarra says VMware officials have seen three major use cases for Hadoop among customers: One is in companies that are testing the platform and have fewer than 20 nodes or so. These customers, he says, are ideal for virtualized distributions of Hadoop because it will not require large new capital expenses if Hadoop can run on legacy vSphere private clouds.