
Gushing over Linux

Apr 07, 2003 | 6 mins
Enterprise Applications | Linux

Petroleum companies rely on cluster computing for oil exploration.

The oil and gas industry was once the province of the world’s fastest supercomputers from makers such as Cray and IBM. But recently, industry heavyweights such as Amerada Hess, British Petroleum, Conoco and Shell discovered that large Linux clusters are capable of tackling the massive computational tasks involved with finding oil.

“Linux clusters are moving in and becoming very competitive in areas where large Unix clusters were used in the past,” says Bill Claybrook, an analyst with Aberdeen Group. That’s because Linux clusters cost five to 20 times less than proprietary high-performance computing systems, which require small fortunes to acquire and maintain.


“You can probably run 80% of the applications used in high-performance computing just as fast on a Linux cluster and at a much cheaper price,” Claybrook says.

Clusters cut costs

Hess migrated from IBM’s supercomputer Unix cluster, or SP system, to clusters of inexpensive Linux PCs over the last five years, as the company became more familiar with Linux and saw the financial benefits of making the switch.

The Houston petroleum company uses a cluster of 320 workstations running Red Hat Linux to process 3-D models of underground geological structures used for locating oil reservoirs. The cluster works by breaking up large amounts of mathematical data and distributing pieces of the problem to the nodes, a mix of Dell, HP and IBM machines, each with dual Pentium 4 processors and about a gigabyte of memory.

Each node works on its own part of the model, then returns data to a “master” Linux cluster node attached to a tape drive. The drive then writes the results to tapes, and Hess geological experts analyze the data to locate oil reservoirs.
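The scatter/gather workflow Hess describes can be illustrated in miniature with a short Python sketch. This is not Hess’s software; it is a minimal, hypothetical stand-in that uses a local process pool in place of cluster nodes, with `process_chunk` standing in for whatever per-node seismic computation each worker actually runs.

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Hypothetical stand-in for the per-node computation: each
    # worker transforms its own slice of the model independently.
    return sum(x * x for x in chunk)

def run_cluster_job(data, n_workers=4, chunk_size=1000):
    # The "master" splits the dataset into pieces, one per work unit.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with Pool(n_workers) as pool:
        # Workers process chunks in parallel; partial results are
        # gathered back at the master for final assembly.
        partials = pool.map(process_chunk, chunks)
    return sum(partials)

if __name__ == "__main__":
    print(run_cluster_job(list(range(10_000))))
```

On a real cluster the pool would be replaced by a job scheduler or message-passing layer (MPI, for example), and the final assembly step would correspond to the master node writing results out to tape.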

Jeff Davis, a systems programmer who manages the Linux cluster, says the change has let Hess acquire more computing power at a fraction of the cost of the IBM SP. The SP cost about $1.5 million per year to maintain and run, whereas the company purchased its first 100-node Linux cluster for around $150,000. Yearly maintenance costs for the cluster run about a quarter the cost of the equipment, Davis adds, noting that clusters now can be added for about $100,000.

“The SP was a first-class machine, but you paid for every bit of it,” Davis says. “For the most part, these are very reliable machines in the Linux cluster.”

The SP provided superior uptime; the system had run for two years straight before it was taken down. But Davis says the trade-off was acceptable.

“Most of the problems we do have are not due to Linux,” he says, referring to reliability issues with PC hardware components in the cluster. That was expected, he adds. “What we’re talking about here is going from top-of-the-line server platform to basically desktop machines,” he says.

Disclosing the drawbacks

Aberdeen Group expects Linux clusters to become the dominant platform for high-performance computing in research firms and private industry by next year, as more users of high-end systems replace older supercomputer infrastructure with Linux boxes.

While the price/performance upside to Linux clusters is huge, Claybrook says companies make some sacrifices when switching from a supercomputing platform to Linux.

One of those is speed. While Linux clusters break down problems quickly by distributing workloads, collecting data from many small machines can introduce latency not seen with larger supercomputers, Claybrook says. Also, Linux clusters are not tied together as tightly as a Unix equivalent, where clustering software is close to the operating system.

One company that is working to tighten Linux cluster operation is Linux Networx, which mixes Linux-based Intel clusters and proprietary software to create systems with more of a single-image appearance. Shell International Exploration & Production (Shell E&P) installed a cluster of 112 Linux nodes with the help of Linux Networx.

Since the mid-1980s, Shell E&P had used supercomputing platforms from Cray and clustered Unix systems to perform such tasks as geological simulations of underground oil reservoirs.

The firm ran into technical and financial problems with these approaches, says Jim Clippard, a senior research geophysicist who works for Shell E&P in the Netherlands. While powerful, the Cray platform was costly. And the Unix clusters used didn’t have very fast interconnects among machines, which limited the kinds of algorithms the company could run on the clusters.

Shell E&P went with a Linux Networx cluster with Gigabit Ethernet interfaces connecting all 112 nodes, allowing for ample interconnect speeds. Now Shell can scale its processing power beyond what it previously had, because it can add a new Linux-based processor for about one-tenth the cost of adding a new Unix clustered node, Clippard says.

This scaled-up processing power lets the company’s research programmers create geological-modeling algorithms that were not feasible to run before. Buying the amount of Unix or Cray processing power necessary to run some of Shell’s new programs would have been cost-prohibitive, he adds.

IBM also has been active in helping its petroleum customers migrate to Linux-based clusters and to Linux- and Unix-based hosted grid technology for seismic research computing. Earlier this year, IBM began a hosted supercomputing service in which research-focused customers can tap into a cluster of Intel- and PowerPC-based servers hosted at an IBM facility in Poughkeepsie, N.Y. Users pay to tap into a grid of more than 100 IBM eServer p655 Unix servers and Intel-based eServer x335 and x345 systems running Red Hat Linux. For oil companies with only periodic needs for supercomputing applications, the rent-a-cluster approach has proved useful.

PGS Data Processing, a petroleum research firm working on seismic imaging in the Gulf of Mexico, now scales its computing in real time to handle urgent supercomputing requests as they arise, says John Gillooly, vice president of Western Hemisphere Data Processing for the company. Much of the project work is dedicated to data collection on oil platforms rather than in a computer room. “On-demand supercomputing ideally suits our business requirements for emerging technologies that require short periods of intensive computing,” he says.


Worldwide oil revenue: $400 billion in 2002, according to Newcastle University.

Cost to produce a barrel of oil: $6.33 in 2001, according to research from John C. Herold Inc.

Oil and gas IT spending: $85 billion in 2002, Gartner reports.

Oil and gas company capital spending: $158 billion for 2001, according to John C. Herold Inc.