Los Alamos builds largest InfiniBand cluster

News
Nov 21, 2002
3 mins
Networking

Los Alamos National Laboratory (LANL) has turned to server maker Promicro Systems to build what it claims is the largest Beowulf cluster to date to use the fledgling InfiniBand interconnect, the lab announced this week.

LANL is building a 128-processor cluster composed of Promicro servers based on Intel’s Xeon chips and running the Linux operating system. The government lab has picked InfiniBand as the high-speed interconnect between the servers as it looks to test the technology for possible use in larger systems. The use of InfiniBand marks an effort by LANL to move away from proprietary interconnects from the likes of Myricom and Quadrics Supercomputers World, said Steve Tenbrink, group leader of the network engineering group at LANL.

“We are trying to push open standards for interconnects where we can,” Tenbrink said. “This will help us evaluate if InfiniBand is the right interconnect for a larger cluster.”

LANL made the announcement at the Supercomputing conference being held this week in Baltimore, Maryland.

This new system, equipped with the faster Xeon chips, joins a growing family of Beowulf Linux clusters at the laboratory. LANL previously used low-power Transmeta processors in an RLX Technologies-based server blade cluster, dubbed Green Destiny, to test ways of lowering cooling costs and improving the stability of large computers.

While LANL has yet to get the Xeon-based cluster up and running, it plans to use the system to run some of its nuclear simulation software, Tenbrink said. Whether it does so, however, will depend on how well InfiniBand works with a large number of computers, he said.

“You have to start somewhere,” Tenbrink said. “The problem is that as you scale higher and higher, interconnect performance tends to get worse and worse. You really have to be careful how you address that problem.”

InfiniBand, which is backed by many vendors, is just gaining traction in the general marketplace. It provides a high-bandwidth, low-latency interconnect between systems that is useful in high-performance computing. Other companies, such as Myricom and Quadrics, make proprietary interconnects that have enjoyed wide adoption but also pose challenges for end users, Tenbrink said. Los Alamos used Quadrics in its massive “Q” supercomputer, built by Hewlett-Packard.

“Quadrics is a good interconnect, but the problem is that it’s proprietary,” Tenbrink said. “When you have problems, you have to wonder if they will hold things like their source code close to their chest.”

LANL expects to have the new system up and running within a couple of months.

Thanks in part to the low-power Transmeta processors, which generate relatively little heat, LANL’s Transmeta-based cluster resided, for a time at least, in a hot, dusty warehouse in Los Alamos, New Mexico. The new cluster will join other systems at LANL in a specially cooled server room.