Two public cloud service providers have rolled out hosted Hadoop clusters: SkyTap has announced a partnership with Hadoop distributor Cloudera, and Joyent has teamed with Hortonworks on a cloud-based Hadoop service that's available today.
The moves reflect both a push by vendors to differentiate their products with the latest technology and the growing use of the cloud for big data analytics, where virtually unlimited resources are available under a pay-for-only-what-you-use pricing model.
Joyent and SkyTap are the newest entrants to the field of cloud-based Hadoop providers, a market that is sure to become increasingly crowded in 2013. SkyTap announced last week that its Cloudera-powered Hadoop offering is available as a three-node cluster that can be scaled up to 50 nodes without a Cloudera license. It does require a SkyTap subscription, which starts at $500 per month. Each node can be configured with up to eight CPUs and 32GB of RAM.
The SkyTap offering is meant for organizations that want to prototype Hadoop or to use SkyTap's Hadoop cloud as an extension of a Hadoop cluster running on their own premises. "If you have an on-premises Cloudera cluster and there's a period of extra capacity, you can scale out to these virtual nodes," says Brett Goodwin, vice president of marketing for SkyTap. Hadoop generally suffers some performance degradation when running on virtualized hardware, which is why most organizations run their full-scale production Hadoop clusters on their own premises. But for testing, or for scaling out during peak demand, the cloud is a logical choice, he says.
Joyent, meanwhile, has partnered with Hortonworks for its Hadoop offering, which it says delivers bare-metal performance in its cloud. Joyent claims it has the highest-performance hosted Hadoop offering, citing a study by Altoros Systems showing that its Hadoop clusters delivered disk input/output nearly three times faster than similarly sized infrastructure, at one-third the price. As with SkyTap's offering, Joyent customers pay only for the IaaS resources, not for a Hadoop license.
Joyent and SkyTap are not alone in offering cloud-based Hadoop products. Amazon Web Services, which many see as the market leader in public infrastructure as a service (IaaS), has Elastic MapReduce (EMR), which hosts a Hadoop framework on its Elastic Compute Cloud (EC2) and Simple Storage Service (S3). AWS offers high-storage virtual machine instance types for use with EMR, including one with 48TB of storage and 117GB of RAM across 16 virtual cores. AWS has partnered with MapR on its offering, and EMR also supports HBase.
A variety of other providers are in the early stages of rolling out cloud-based Hadoop products. Microsoft, for example, has HDInsight in preview; it runs either on Windows or in Microsoft's Azure cloud and can be managed via System Center 2012.
Rackspace and Hortonworks announced last fall that the two companies would jointly develop an OpenStack-powered offering for public and private clouds, but no official products have been announced. Google and MapR have worked together but don't have a product offering, while VMware is working on virtualizing Hadoop clusters through its Project Serengeti.