
Hadoop + GPU: Boost performance of your big data project by 50x-200x?

By Vladimir Starostenkov, senior R&D engineer at Altoros Systems Inc., special to Network World
June 24, 2013 03:13 PM ET

Network World - Hadoop, an open source framework that enables distributed computing, has changed the way we deal with big data. Parallel processing with this set of tools can improve performance several times over. The question is, can we make it work even faster? What about offloading calculations from a CPU to a graphics processing unit (GPU) designed to perform complex 3D and mathematical tasks? In theory, if the process is optimized for parallel computing, a GPU could perform calculations 50-100 times faster than a CPU.

This article, written by the R&D team of Altoros Systems, a big data specialist and platform-as-a-service enabler, explores what is possible and how you can try this for your large-scale system.


The idea itself isn't new. For years, scientific projects have tried to combine the Hadoop or MapReduce approach with the capabilities of a GPU. Mars appears to have been the first successful MapReduce framework for graphics processors. The project achieved a 1.5x-16x performance increase when analyzing Web data (search/logs) and processing Web documents (including matrix multiplication).

Following Mars' groundwork, other scientific institutions developed similar tools to accelerate their data-intensive systems. Use cases included molecular dynamics, mathematical modeling (e.g., the Monte Carlo method), block-based matrix multiplication, financial analysis, image processing, etc.

On top of that, there is BOINC, a fast-evolving, volunteer-driven middleware system for grid computing. Although it does not use Hadoop, BOINC has already become a foundation for accelerating many scientific projects. For instance, GPUGRID relies on BOINC's GPU-based distributed computing to perform molecular simulations that help "to understand the function of proteins in health and disease." Most other BOINC projects related to medicine, physics, mathematics, biology, etc., could be implemented with Hadoop+GPU, too.

So, the demand for accelerating parallel computing systems with GPUs does exist. Institutions invest in supercomputers with GPUs or develop their own solutions. Hardware vendors, such as Cray, have released machines equipped with GPUs and pre-configured with Hadoop. Amazon has also launched Elastic MapReduce (Amazon EMR), which enables Hadoop on its cloud servers with GPUs.

But does one size fit all? Supercomputers provide the highest possible performance, yet they cost millions of dollars. Using Amazon EMR is cost-effective only for projects that last a few months; for longer scientific projects (two to three years), investing in your own hardware may be more economical. And even if you increase the speed of calculations with GPUs inside your Hadoop cluster, what about the performance bottlenecks related to data transfer? Let's explore this in detail.

How it works

Data processing involves moving data among the HDD, DRAM, CPU, and GPU. Figure 1 shows how data is transferred when a commodity machine performs computations with a CPU and a GPU.

[Figure 1. Data exchange between the components of a commodity computer when executing a task]
  • Arrow A: Transferring data from an HDD to DRAM (a common initial step for both CPU and GPU computing)
  • Arrow B: Processing data with a CPU (transferring data: DRAM → chipset → CPU)
  • Arrow C: Processing data with a GPU (transferring data: DRAM → chipset → CPU → chipset → GPU → GDRAM → GPU); a minimal CUDA sketch of this data path follows the list
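
As a rough illustration of the Arrow C path, the CUDA sketch below stages a buffer from host DRAM into GPU memory (GDRAM), runs a trivial kernel, and copies the result back. The buffer size (32 MB) and the placeholder scale kernel are our assumptions for illustration, not part of any benchmark cited here.

// Minimal CUDA sketch of the "Arrow C" data path:
// host DRAM -> (chipset/PCIe) -> GDRAM -> GPU cores, and back.
// The 32 MB buffer and the trivial kernel are illustrative assumptions.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;            // stand-in for real per-element work
}

int main(void) {
    const int n = 8 * 1024 * 1024;          // 8M floats = 32 MB
    const size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);   // buffer lives in DRAM
    for (int i = 0; i < n; i++) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, bytes);                // buffer lives in GDRAM

    // Arrow C, forward leg: DRAM -> chipset -> GDRAM
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev, n); // the GPU finally computes

    // Return leg: results must cross the bus again before the CPU sees them
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);      // expect 2.0
    cudaFree(dev);
    free(host);
    return 0;
}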

As a result, the total amount of time we need to complete any task includes:

  • the amount of time required for a CPU or a GPU to carry out computations
  • plus the amount of time spent on data transfer between all of the components (a timing sketch follows below)
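
A quick way to see which of the two terms dominates is to time them separately with CUDA events. The fragment below is a hedged sketch, not a benchmark; it assumes the host, dev, bytes, and n variables and the scale kernel set up in the earlier example.

// Hedged sketch: timing the transfer term vs. the compute term with CUDA events.
// Assumes host, dev, bytes, n, and the scale kernel from the sketch above.
cudaEvent_t t0, t1, t2;
cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

cudaEventRecord(t0);
cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // transfer term
cudaEventRecord(t1);
scale<<<(n + 255) / 256, 256>>>(dev, n);               // compute term
cudaEventRecord(t2);
cudaEventSynchronize(t2);

float msCopy = 0, msKernel = 0;
cudaEventElapsedTime(&msCopy, t0, t1);
cudaEventElapsedTime(&msKernel, t1, t2);
printf("copy: %.2f ms, kernel: %.2f ms\n", msCopy, msKernel);

For a simple per-element kernel like this one, the copy typically dominates: over a PCIe 2.0 x16 link (roughly 8 GB/s theoretical, usually 5-6 GB/s in practice), moving the data across the bus costs far more time than doubling it on the GPU.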

According to Tom's Hardware (CPU Charts 2012), the performance of an average CPU ranges from 15 to 130 GFLOPS, while Nvidia GPUs, for instance, deliver 100 to 3,000+ GFLOPS (2012 comparison). All of these measurements are approximate and depend heavily on the type of task and the algorithm. Still, in some scenarios a GPU can speed up computations by roughly five to 25 times per node. Some developers claim that if your cluster consists of several nodes, performance can be accelerated by 50x-200x; the creators of the MITHRA project, for example, achieved a 254x increase.
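
To see why per-node gains land closer to 5x-25x than to the raw GFLOPS ratio, consider a rough, illustrative calculation (the numbers are our assumptions, not measurements): suppose a job takes T seconds of pure compute on a CPU and the GPU's raw compute is 20 times faster, but shuttling the data to and from the GPU adds 0.5T of transfer time. The end-to-end speedup is then T / (0.5T + T/20) ≈ 1.8x. The transfer term, not the GFLOPS ratio, sets the ceiling, which is exactly the data-transfer bottleneck described above.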
