SC13: Experts debate thorny exascale memory issues

Architectures and base technologies under debate on the road to an exaflop machine.

An expert panel at SC13 Tuesday said that solving the issues posed by modern memory technology will prove key to achieving exascale performance in supercomputing – but there was little agreement on the road map needed.

Moderated by Richard Murphy, a senior memory systems architect at Micron, the panel featured experts from NVIDIA, ARM, IBM, Intel and Notre Dame.


Memory technology still has some distance to go before it is ready for an attempt on the exaflop mark. That is 1,000 petaflops, roughly 30 times faster than the current fastest supercomputer. Some panelists also noted that the trends aren't all positive. Peter Kogge, a professor of computer science at Notre Dame, said that growth in memory density per die has slowed sharply in recent years: density used to quadruple every three years, and now it only doubles over the same period.

For Intel Fellow Shekhar Borkar, achieving exascale performance will require the right memory technology, and he asserted that only DRAM and NAND flash are sufficiently mature.

More cutting-edge technologies, such as resistive memory and STT-RAM (spin-transfer torque RAM), offer insufficient density and performance, he said.

“The devil,” according to Borkar, “is in the details.”

For Troy Manning, Micron's director of advanced memory systems, the ever-increasing complexity of manufacturing memory is a further complication.

“The thing that strikes me, having been in the memory business for over 20 years, is when I started in memory, a fab cost a couple hundred million dollars and we sold our memory parts for two or three bucks,” he said. “Today, a state-of-the-art fab can cost $5 billion and we’re still selling our memory parts for two or three bucks.”

What's more, according to Manning, customers want to keep their own costs under control while simultaneously demanding more and better memory.

The panelists were split on whether memory should be integrated more closely with compute hardware. ARM staff research engineer Andreas Hansson argued that the most productive way forward is a more holistic approach to system design, one that integrates memory, interconnect and compute.

“I think the key here is that it’s not about the memory, it’s about the system. So, what really matters is … the overall solution. So there might be some big, fantastic memory solutions, or some really fantastic CPU solutions, but if you don’t design them together … you might wind up with something that is less comparable anyways,” he said.

Kogge said that during a recent collaboration with LexisNexis on a big data project, he had seen some evidence that closer integration could be the "silver bullet."

“The really big [performance] gains came about when we started with memory systems … that use stacked memories, particularly hybrid stacked memories where you had two stacks – a DRAM stack and a non-volatile stack – together,” he said.

IBM research staff scientist Doug Joseph agreed, saying that this type of integration is needed to build more advanced systems, particularly because, in his view, memory devices will need a degree of automated, local management.

“We need to allow host controllers to have much more ability to control refresh rates and embed local control in the device or in the stack that can adapt to these variances,” he said.

Others, however, were less enthused about that type of architecture.

"Two years ago, I was completely sold [on the idea]. A year ago, I was cautious. Now, I'm not sure," said Intel's Borkar, citing the difficulty of devising a generic way to integrate memory and compute.

NVIDIA chief scientist Bill Dally had his own ideas about architecture. He proposed building memory from very small units, either as tiny individual DRAM chips or as a single large chip divided into a grid of small blocks, each with its own connections, so that data never has to travel far.

“Moving a bit many millimeters across a DRAM chip hurts [in terms of efficiency],” he said. “Don’t do that – you either make really small DRAM chips, like a millimeter or two on a side, or you make a big DRAM chip … with a bunch of ‘chiclets’ on it, where each chiclet has its own set of pads.”

The arguments seem unlikely to die down anytime soon, but the march toward exascale continues.

Email Jon Gold at jgold@nww.com and follow him on Twitter at @NWWJonGold.
