This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
The challenges and promise of big data are front and center for CIOs and other business leaders. The initial applications that have leveraged big data have provided organizations with significant returns and given a glimpse into the power of big data and how it can be used to disrupt their competitive landscape.
These business leaders are beginning to recognize that by better leveraging a variety of data sources -- everything from transactional data to trading data, genomics, smart meter output and sensor data -- they can dramatically change their organization's market position and profitability.
For example, telecommunications companies are looking to exploit big data to target market content to set-top boxes, target rate plans and bundles, improve quality-of-service and offload expensive data warehouses. What's interesting about these examples is that, while big data is being used for new applications with new data, it is also possible to use big data to transform existing applications and use cases.
One of the drivers for big data solutions is the need to harness fast-growing data. In the past, a telecom billing application dealt with one phone per customer and something in the neighborhood of 100 calls a month. Today, telecom billing applications must deal with an explosion of devices, voice, text and data plans.
In his 1990 book, "Microcosm: The Quantum Revolution in Economics and Technology," George Gilder wrote about how technology was changing business, economics and even the very nature of how markets function through the introduction of ever smaller, more powerful yet affordable, microchips and computing systems.
Gilder followed up with the book "Telecosm," which discusses how the telecommunications revolution, including broadband, would connect people, computers and businesses in entirely new ways. Each of these books predicted how exploiting an increasingly abundant and "free" resource (microprocessors in "Microcosm" and bandwidth in "Telecosm") would be an engine of change and wealth creation.
We are now on the cusp of a third wave -- a "Datacosm" -- that will enable organizations to exploit abundant and fast-growing data. To be ready to leverage big data, enterprises need the best and latest tools in order to analyze and exploit all of their unstructured and structured data. Hadoop, a framework that enables the analysis of vast amounts of structured and unstructured data on a cluster of commodity servers, has emerged as the most important technology for the Datacosm.
When talking about big data, we are really talking about new architectures -- in essence, a paradigm shift that's needed to process all that information. In the Datacosm, it doesn't make sense to store data separately from the processing. Rather, users of big data need to simplify how they handle and scale data. So what's required is an architecture that can scale linearly and easily. That's really what Hadoop provides.
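The idea of keeping data and processing together can be sketched in a few lines of Python. This is only an illustration of the principle, not Hadoop's actual API: the node names, the call records and the function names (`local_summary`, `merge_summaries`, `minutes_per_customer`) are all hypothetical. Each "node" summarizes the records it already stores, and only the small summaries are combined.

```python
from collections import Counter

# Hypothetical call records, already partitioned across three "nodes".
# Each record is (customer_id, call_minutes); the data is invented for illustration.
NODE_PARTITIONS = {
    "node-1": [("alice", 5), ("bob", 12), ("alice", 3)],
    "node-2": [("bob", 7), ("carol", 20)],
    "node-3": [("alice", 10), ("carol", 1), ("bob", 4)],
}


def local_summary(records):
    """Runs where the data lives: total minutes per customer on one node."""
    totals = Counter()
    for customer, minutes in records:
        totals[customer] += minutes
    return totals


def merge_summaries(summaries):
    """Combines the small per-node summaries into a cluster-wide result."""
    merged = Counter()
    for summary in summaries:
        merged.update(summary)  # Counter.update adds counts together
    return merged


def minutes_per_customer(partitions):
    """Summarizes each partition locally, then merges the summaries."""
    per_node = (local_summary(records) for records in partitions.values())
    return dict(merge_summaries(per_node))


if __name__ == "__main__":
    print(minutes_per_customer(NODE_PARTITIONS))
```

In this sketch, adding capacity means adding another entry to `NODE_PARTITIONS`. The per-node work stays the same size, which is the intuition behind linear scaling.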
Hadoop is the software that stitches together these commodity servers into a big data platform. With Hadoop, a single node in the cluster can hold 16 3-terabyte (TB) disks, or 48TB of data per node. Instead of enterprise storage solutions costing between $10,000 and $125,000 per TB, Hadoop delivers an analytic and storage platform for a couple hundred dollars per TB. If more processing or data capacity is needed, simply add nodes to the cluster. The MapR distribution for Hadoop, for example, provides an enterprise-grade platform. Some examples of this power include the ability to:
- Create target marketing applications that leverage transaction data and customer interactions with content recommendations to develop significant new revenue opportunities
- Improve the accuracy and timeliness of fraud detection, operation analytics and quality management
- Scale operations with at least 10-to-1 cost efficiencies over traditional servers, NAS or SAN alternatives
Any one of these examples could be the basis of a competitive advantage and increased profitability. That's why many organizations have started to deploy Hadoop or are actively investigating its use.
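The per-node capacity and cost-per-terabyte figures quoted above can be checked with a little arithmetic. The sketch below uses the article's numbers where they exist. It reads "a couple hundred dollars" as $200, which is an assumption, and the function names are hypothetical. It also counts raw disk capacity only and ignores any replication overhead.

```python
import math

DISKS_PER_NODE = 16          # from the article
TB_PER_DISK = 3              # from the article
HADOOP_COST_PER_TB = 200     # assumption: "a couple hundred dollars" taken as $200
ENTERPRISE_COST_PER_TB = (10_000, 125_000)  # low and high figures from the article


def raw_tb_per_node(disks=DISKS_PER_NODE, tb_per_disk=TB_PER_DISK):
    """Raw disk capacity of one node, in TB."""
    return disks * tb_per_disk


def nodes_needed(dataset_tb):
    """Smallest number of nodes whose raw capacity covers the dataset."""
    return math.ceil(dataset_tb / raw_tb_per_node())


def cost_ratio(enterprise_cost_per_tb, hadoop_cost_per_tb=HADOOP_COST_PER_TB):
    """How many times more the enterprise storage costs per TB."""
    return enterprise_cost_per_tb / hadoop_cost_per_tb


if __name__ == "__main__":
    print("Raw TB per node:", raw_tb_per_node())
    print("Nodes for 480TB:", nodes_needed(480))
    for cost in ENTERPRISE_COST_PER_TB:
        print(f"${cost:,}/TB vs Hadoop: {cost_ratio(cost):.0f}-to-1")
```

Under these assumptions, even the low end of the quoted enterprise range comes out well above the 10-to-1 figure cited in the list, so the "at least" in that claim leaves room for hardware, replication and operating costs that this sketch leaves out.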
The year 2012 brought important big data advancements and demonstrated success in large enterprises and small organizations, both on-premises and in the cloud. But there is more to come. Hadoop is about to become even more powerful and capable. No longer do enterprises need to limit their Hadoop usage to batch processing. Now they'll be able to perform analytics using batch or real-time processing, which will expand use cases and applications.
We're going to see these kinds of real-time capabilities emerge across the whole Datacosm in 2013, as well as the ability to integrate NoSQL processing directly into the Hadoop framework to provide new capabilities for integrated data analysis. All of these advances will help businesses as they work to refine and improve their own uses of their massive stores of customer, transactional and machine-generated data.
The Datacosm has the power to transform the competitive dynamics of entire industries. Gilder has documented two successive waves that brought tremendous change and opportunity. Datacosm represents a third wave.
Are you ready?