During the past few years Neustar, an $830 million publicly traded data analytics company, has undergone a dramatic business transformation, and it's been powered almost entirely by Hadoop.
The company has a history of providing real-time information to telecommunications and Internet providers - everything from number porting and domain name registries to supplying short codes, the basis of text messaging for many mobile providers.
In 2011 Neustar could track about 60 days' worth of historical data, but new executive leadership challenged the company to offer more comprehensive services for customers. "We weren't just going to throw money at the problem," remembers Michael Peterson, Neustar's vice president of platforms and data architecture. The natural choice was to go open source.
Instead of scaling proprietary Oracle and IBM Netezza platforms, Peterson and his team turned to Hadoop. Originally Neustar techies worked with Cloudera, which offers a packaged distribution of the open source Apache Hadoop project. But then the developers really got into working in the open source world. "One thing we were trying to get away from are prepackaged vendors with proprietary stuff," he says. Hortonworks, which was founded just months after Neustar embarked on its Hadoop journey, turned out to be what he calls the "perfect fit."
Hortonworks was born out of Yahoo in 2011 when some of the original engineers who built the search website's distributed architecture platform left to spin out a company to support the open source Hadoop project. Hortonworks stays close to the open source Apache Hadoop code base, and to Yahoo. Each new code set from the Apache project is tested by Hortonworks on Yahoo's massive 40,000-node cluster before it is released as a Hortonworks distribution. The company is garnering some attention in the tech market. Recently Hortonworks has signed on some big-name partners, including Microsoft, Rackspace and Teradata, and it even joined the OpenStack Foundation. The moves have legitimized not only this company, but the broader open source Hadoop movement, industry watchers say.
For Neustar, Hortonworks turned out to be a good fit. The company got prepackaged open source Hadoop code, but because it was true to the trunk, Neustar's developers could iterate on top of it and contribute back to the open source community. Today, Neustar has a 120-node Hadoop cluster managing more than 2 petabytes of data, including the past 18 months' worth of data it has collected, not just the 60 days' worth it kept previously. With the new platform, Neustar now offers customers longer-term data sets, trending visualizations and historical analytics, all powered by Hadoop.
It's not just the business offerings that have transformed at Neustar - the entire IT team's culture has shifted to an open source mindset, Peterson says. Engineers are now experimenting with an OpenStack private cloud deployment. "The whole process has fit directly into the agile way we want to do things, it's allowed us to take calculated risks and do things quickly in a way where we can see the results," he says.
Hortonworks executives say a new wave of data is fueling the need for new platforms like Hadoop. Web sessions, social media interactions and machine sensors generate massive amounts of data, but that data doesn't fit neatly into traditional relational enterprise databases, hence the rise of NoSQL database platforms.
In the past, most of the information these newer platforms now handle fell to the floor and was never picked up. Now, companies like Neustar realize they can actually do something with the data, if they can manage it. Hortonworks Data Platform (HDP) is the name of the company's distribution, and it's 100% open source Apache Hadoop code, compiled by Hortonworks and shipped as an enterprise software kit meant to run on top of commodity hardware.
Hortonworks deployments have so far focused mostly on supplementing existing data warehousing tools, usually SQL databases, says David McJannet, vice president of marketing for Hortonworks, who recently joined the company after stints at VMware and Microsoft. HDP can be used in conjunction with traditional platforms to manage new unstructured data that organizations usually don't have a good way of managing today. The data can be managed by Hadoop and then either fed directly into analytics tools that sit on top of Hadoop, or fed back into the more traditional SQL-style databases that the enterprise may already have.
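As a rough illustration of that second path, the sketch below aggregates unstructured log lines in a map-and-reduce style and loads the summary into a relational table. It is plain Python standing in for the pattern, not Hadoop or HDP code; the log format, the function names and the `page_views` table are all hypothetical, and an in-memory SQLite database plays the part of the enterprise's existing SQL store.

```python
import sqlite3
from collections import Counter

# Hypothetical raw web-session log lines (illustrative data, not from the article).
RAW_LOG_LINES = [
    "2013-01-05 visitor=a17 page=/home",
    "2013-01-05 visitor=b22 page=/pricing",
    "2013-01-06 visitor=a17 page=/pricing",
    "malformed line without fields",
    "2013-01-06 visitor=c09 page=/home",
]


def map_line(line):
    """Map step: emit a (page, 1) pair for each parsable line, else nothing."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    if "page" in fields:
        yield fields["page"], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def load_into_sql(totals):
    """Load the aggregated result into a relational table (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", sorted(totals.items()))
    conn.commit()
    return conn


# Run the three stages end to end and query the result with ordinary SQL.
pairs = (pair for line in RAW_LOG_LINES for pair in map_line(line))
page_totals = reduce_counts(pairs)
conn = load_into_sql(page_totals)
rows = conn.execute("SELECT page, views FROM page_views ORDER BY page").fetchall()
print(rows)
```

In a real deployment the map and reduce steps would run in parallel across the cluster's nodes, and only the much smaller aggregated result would travel to the SQL database.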
The data processed by Hadoop can be extremely valuable for companies. Retailers, from hardware and grocery stores to e-commerce sites, can log significantly more data about each individual visitor to their site, including their patterns and history, all in an effort to serve them better. Hadoop excels at scaling out horizontally to massive sizes without impacting performance.
And now some of the biggest names in technology are buying into the platform too. Hortonworks has been on something of a partnership spree during the past few months. First Teradata and Microsoft announced they would incorporate HDP into their analytics offerings. Then Rackspace announced that Hortonworks would be used to explore a Hadoop-as-a-Service type offering on its OpenStack-powered public cloud. Hortonworks has since joined the OpenStack Foundation, which oversees the open source cloud management platform.
"They approached us," McJannet says about how the Hortonworks-Microsoft partnership began. Microsoft has integrated HDP into its business intelligence products, specifically HDInsight Server. In doing so, Microsoft has begun contributing back to the open source community too. Microsoft was the first to enable Hadoop to run on Windows - it previously only worked on Linux - and a Microsoft engineer chairs the Apache Hadoop project now. Matthew Aslett, an analyst at the 451 Research Group, says Microsoft blessing Hadoop, and specifically Hortonworks, could expose the big data platform to the enterprise masses.
Hortonworks isn't the only Hadoop company forging partnerships, though. The company's biggest competitor is Cloudera, which has had quite a string of partnerships of its own during the past year. Oracle, Cisco, IBM, HP, Dell and NetApp are all listed partners of Cloudera. Cisco, for example, has a reference configuration architecture for Cloudera Hadoop deployments, as does IBM. Oracle has a key-value big data appliance.
"Partnerships are a key approach to both companies' strategies," says Aslett, who tracks the database and big data markets. The partnerships are win-win situations, he says. Hortonworks and Cloudera push Hadoop out to major IT vendors, which can evangelize the platform to their existing customers, while the big-name vendors ensure they have a play in next-generation database technologies.
Hadoop is a hot topic in IT right now. Last year's inaugural Hadoop Summit, led by Hortonworks, attracted more than 2,300 attendees, Hortonworks says, with significantly more expected this year. Peterson, the Neustar vice president, says engaging with the open source Hadoop community has been an invaluable resource. "Hortonworks is a company that will knit you into that community," he says. "If you're a company that's paying attention to the next generation of engineers and what types of teams they will build, going open source is what you need to do to energize that group." With open source, each individual developer's talents can be leveraged for the greater good. "It's an amazing trend to be a part of," he says.
NOTE: This article has been edited to take out a reference to Cloudera as a proprietary Hadoop distribution company. Cloudera has some proprietary management capabilities that complement Hadoop, but its distribution is still based on open source code.