High-Bandwidth Memory (HBM) delivers impressive performance gains

Chip-level design breakthrough provides bandwidth boost for supercomputers and artificial intelligence (AI) applications

IT vendors typically race to deliver incremental improvements to existing product lines, but occasionally a truly disruptive technology comes along. One of those disruptive technologies, which is beginning to find its way into enterprise data centers, is High-Bandwidth Memory (HBM).

HBM is significantly faster than incumbent memory chip technologies, uses less power and takes up less space. It is becoming particularly popular for resource-intensive applications such as high-performance computing (HPC) and artificial intelligence (AI).

However, mainstream adoption for running routine business applications is still a ways off because HBM is expensive, could create heat management issues, and might require that certain applications be rewritten.

How does HBM work?

HBM is the creation of US chipmaker AMD and SK Hynix, a South Korean supplier of memory chips. Development began in 2008, and in 2013 the companies turned the spec over to the JEDEC consortium, the standards body for the semiconductor industry. The HBM2 standard was approved in 2016, and HBM3 was officially announced in January 2022. The primary manufacturers of HBM chips today are South Korea’s Samsung and SK Hynix, and US-based Micron Technology.

HBM was designed to address the lagging performance and power of standard dynamic random-access memory (DRAM), compared to central processing unit (CPU) and graphics processing unit (GPU) performance. The original solution was to throw more DRAM at the problem and populate motherboards with more dual in-line memory module (DIMM) slots, also known as RAM slots.

The problem, however, was not the memory itself but the bus. The standard DRAM bus is 4 to 32 bits wide. The HBM bus is 1,024 bits wide, or up to 128 times wider, according to Joe Macri, corporate vice president and product CTO at AMD, as well as co-developer of HBM. To use an auto analogy, which can handle more cars, a one-lane road or a 16-lane road?

In addition to widening the bus to boost bandwidth, HBM technology shrinks the memory chips and stacks them in an elegant new design form. HBM chips are tiny compared to graphics double data rate (GDDR) memory, which HBM was originally designed to replace. One gigabyte of GDDR memory chips takes up 672 square millimeters, versus just 35 square millimeters for 1GB of HBM.

Rather than spreading the memory chips out, HBM stacks them up to 12 layers high and connects them with an interconnect technology called ‘through-silicon via’ (TSV). The TSVs run through the layers of HBM chips like an elevator runs through a building, greatly reducing the distance data bits need to travel.

With the HBM sitting on the substrate right next to the CPU or GPU, less power is required to move data between CPU/GPU and memory. The CPU and HBM talk directly to each other, eliminating the need for DIMM sticks.

“The whole idea that [we] had was instead of going very narrow and very fast, go very wide and very slow,” Macri said.
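The arithmetic behind "wide and slow" can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a vendor formula: it assumes peak bandwidth is simply bus width times per-pin data rate, and the 2 Gb/s per-pin rate for the wide bus is an illustrative assumption, not a figure from Macri. Only the 1,024-bit and 32-bit widths come from the article.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: every data pin moves pin_rate_gbps gigabits per second."""
    return bus_width_bits * pin_rate_gbps / 8  # 8 bits per byte


def pin_rate_to_match(target_gbytes: float, bus_width_bits: int) -> float:
    """Per-pin rate (Gb/s) a bus of the given width needs to reach target_gbytes."""
    return target_gbytes * 8 / bus_width_bits


WIDE_BITS = 1024      # HBM bus width quoted in the article
NARROW_BITS = 32      # widest standard DRAM bus quoted in the article
WIDE_PIN_RATE = 2.0   # Gb/s per pin; illustrative assumption only

# Bandwidth of the wide, slow bus.
wide_bw = peak_bandwidth_gbytes(WIDE_BITS, WIDE_PIN_RATE)

# How fast each pin of the narrow bus would have to run to keep up.
narrow_rate_needed = pin_rate_to_match(wide_bw, NARROW_BITS)

print(f"Wide bus: {wide_bw} GB/s at {WIDE_PIN_RATE} Gb/s per pin")
print(f"Narrow bus needs {narrow_rate_needed} Gb/s per pin to match")
```

With these numbers, the 32-bit bus would have to signal 32 times faster per pin than the 1,024-bit bus to deliver the same bandwidth, which is the trade-off Macri describes.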

Paresh Kharya, senior director of product management for accelerated computing at Nvidia, says standard DRAM is not well suited for HPC use. DDR memory can come close to the performance of HBM memory, but “you’ll have to have a lot of DIMMs, and it’s not going to be optimal” in terms of energy efficiency.

Where is HBM being used?

The first vendor to use HBM for HPC was Fujitsu, with its Arm-based A64FX processor designed for HPC tasks. The Fugaku supercomputer powered by the A64FX debuted at the top of the Top 500 list of supercomputers in 2020 and has remained there since.

Nvidia is using HBM3 on its forthcoming Hopper GPU, while the upcoming Grace CPU uses LPDDR5X technology, a DDR derivative. AMD uses HBM2E on its Instinct MI250X accelerator (based on its GPU technology) and Intel plans to use HBM on some of the Sapphire Rapids generation of Xeon server processors, as well as the Ponte Vecchio GPU accelerator for the enterprise.

Will HBM be used for mainstream applications?

Technologies have a history of starting at the bleeding edge and working their way into the mainstream. Liquid cooling started out as a fringe concept, mostly used by gamers trying to squeeze as much performance out of the CPU as possible. Now every server vendor offers liquid cooling for their processors, particularly AI processors.

So can HBM go mainstream? Macri estimates the price difference between HBM and DDR5 at the same capacity is more than 2 to 1. In other words, 1GB of HBM costs more than twice as much as 1GB of DDR5. So, he reasons, if you’re going to pay that premium for memory, you’re going to want a return on investment.

“In a TCO equation, performance is in the denominator, all the costs are in the numerator. So if you get double performance, you improve the TCO by double. So performance is what really is the best way to improve TCO,” he said. He adds that for simplicity of argument, this assumes costs are flat.
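Macri's ratio can be written out as a short Python sketch. It treats TCO as cost divided by performance, as he describes, and then asks how much extra performance is needed to pay back a memory premium. The cost figure of 100 and the 25% memory share of system cost are hypothetical values chosen for illustration; only the 2-to-1 price ratio echoes the article.

```python
def cost_per_performance(total_cost: float, performance: float) -> float:
    """Macri's framing: costs in the numerator, performance in the denominator."""
    if performance <= 0:
        raise ValueError("performance must be positive")
    return total_cost / performance


def break_even_speedup(memory_share: float, memory_price_ratio: float) -> float:
    """Speedup needed so cost per unit of performance does not get worse.

    memory_share: fraction of baseline system cost spent on memory (hypothetical).
    memory_price_ratio: price of the new memory relative to the old, per GB.
    """
    new_relative_cost = (1 - memory_share) + memory_share * memory_price_ratio
    return new_relative_cost


# Flat costs, double the performance: cost per unit of performance halves.
baseline = cost_per_performance(100.0, 1.0)
doubled = cost_per_performance(100.0, 2.0)

# Hypothetical system where memory is 25% of cost and the new memory costs 2x.
needed = break_even_speedup(memory_share=0.25, memory_price_ratio=2.0)

print(baseline, doubled, needed)
```

Under these assumptions, a system that spends a quarter of its budget on memory would need roughly a 1.25x speedup from HBM just to break even, and anything beyond that improves the ratio.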

Daniel Newman, principal analyst with Futurum Research, doesn’t expect HBM to go mainstream for two reasons, the first being the cost. “You’ve got a chicken and egg thing there that if it’s costly to build, then it’s not going to be widely used in a broad market. And so that’s going to reduce the volumes that ship,” he said.

The other problem is heat. Now, in addition to a CPU that needs to be cooled, you have five or more memory chips that share the same cooler. “That means that the processor is dissipating gobs of power, all in a little tiny package, so you’re going to have a heat problem. Every processor that uses HBM has to have extraordinary heat management,” said Newman.

The bottom line: if you deploy these accelerators for AI and HPC, expect strong results, and expect acquisition and operating costs to match.

Will HBM require that applications be rewritten?

With this new memory paradigm, the question becomes: do HPC and AI applications automatically take full advantage of HBM, or is re-architecting required? It all depends on how you built your applications the first time, say the experts.

“Often application developers would work around the limitations of what the system can offer. So sometimes you’ll have to redesign, or have to update your applications to account for the new capabilities that are available,” said Kharya.

Macri said if an application is memory-bandwidth bound, then it will just go faster with no rewrite needed. If it is memory-latency bound, then it will not go faster beyond the intrinsic latency difference between HBM and the memory it is being compared to. Such an application would need to be rewritten to remove the dependencies that are causing it to be latency bound.

Also, he said if the system is loaded down with many applications simultaneously, then the HBM system will likely have better performance even if the applications are latency bound, because the loaded latency will be lower for HBM.
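Macri's distinction between bandwidth-bound and latency-bound code can be illustrated with a deliberately simple Python model. Everything numeric here is hypothetical: the two memory systems, their bandwidth and latency values, and the workload sizes are made up for illustration, and the model ignores caches, prefetching and the loaded-latency effect he mentions.

```python
def streaming_time_s(bytes_moved: float, bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound work: run time is set by how fast bytes can be streamed."""
    return bytes_moved / bandwidth_bytes_per_s


def dependent_access_time_s(accesses: int, latency_s: float) -> float:
    """Latency-bound work: each access must wait for the previous one to finish."""
    return accesses * latency_s


# Hypothetical memory systems: 8x the bandwidth, same unloaded latency.
DDR_LIKE = {"bandwidth": 50e9, "latency": 100e-9}
HBM_LIKE = {"bandwidth": 400e9, "latency": 100e-9}

BYTES_STREAMED = 8e9            # e.g. one pass over a large array
DEPENDENT_ACCESSES = 1_000_000  # e.g. walking a linked list

streaming_speedup = (
    streaming_time_s(BYTES_STREAMED, DDR_LIKE["bandwidth"])
    / streaming_time_s(BYTES_STREAMED, HBM_LIKE["bandwidth"])
)
chasing_speedup = (
    dependent_access_time_s(DEPENDENT_ACCESSES, DDR_LIKE["latency"])
    / dependent_access_time_s(DEPENDENT_ACCESSES, HBM_LIKE["latency"])
)

print(f"Streaming workload speedup: {streaming_speedup:.1f}x")
print(f"Dependent-access workload speedup: {chasing_speedup:.1f}x")
```

In this toy model the streaming workload speeds up in proportion to bandwidth, while the chain of dependent accesses does not speed up at all. That is why the second kind of application would need to be restructured before it could benefit.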

Kharya agrees that it will depend on how apps were written. If the existing apps worked around various limitations, like memory or latency, then developers will have to “redesign or update their applications to account for the new capabilities that are available, which is usual for when any new computing architecture comes along,” he said.

Does HBM require a shift from CPUs to GPUs?

Another issue is processor architecture. Jim Handy, principal analyst with Objective Analysis, notes that HBM is used with single-instruction, multiple-data (SIMD) processors, which are programmed altogether differently than a normal server processor. x86 and Arm CPUs are not primarily SIMD designs, although both offer SIMD extensions, whereas GPUs are built around that model.

“Any program that already ran on a normal processor would have to be reconfigured and recompiled to take advantage of a SIMD architecture. It’s not the HBM that would change things, but the processor type,” he said.

HBM technology continues to advance

The current version of HBM on the market is HBM2E, but in January 2022, JEDEC released the final spec for HBM3. HBM3 runs at lower temperatures than HBM2E at the same level of operating voltage.

HBM3 also doubles the per-pin data rate over HBM2, to as much as 6.4Gb/s. It doubles the number of independent channels from eight to 16 as well, and there are other performance enhancements.
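To put those figures in per-stack terms, here is a short Python calculation. It assumes the 1,024-bit interface described earlier carries over unchanged to HBM3, and it takes the previous generation's per-pin rate to be half of 6.4Gb/s, as the "doubles" wording implies. Both are stated as assumptions rather than quotes from the spec.

```python
BUS_WIDTH_BITS = 1024  # HBM interface width cited earlier; assumed unchanged for HBM3


def stack_bandwidth_gbytes(pin_rate_gbps: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth of one HBM stack in GB/s (bus width x per-pin rate / 8)."""
    return bus_width_bits * pin_rate_gbps / 8


def bits_per_channel(channels: int, bus_width_bits: int = BUS_WIDTH_BITS) -> int:
    """Width of each independent channel when the bus is split evenly."""
    return bus_width_bits // channels


HBM3_PIN_RATE = 6.4                # Gb/s, from the article
PREV_PIN_RATE = HBM3_PIN_RATE / 2  # implied by "doubles"; an inference, not a quoted figure

hbm3_bw = stack_bandwidth_gbytes(HBM3_PIN_RATE)  # 819.2 GB/s
prev_bw = stack_bandwidth_gbytes(PREV_PIN_RATE)  # 409.6 GB/s

print(f"HBM3 stack: {hbm3_bw:.1f} GB/s, previous generation: {prev_bw:.1f} GB/s")
print(f"Channel width: {bits_per_channel(8)} bits x 8 -> {bits_per_channel(16)} bits x 16")
```

On these assumptions a single HBM3 stack peaks at about 819GB/s, and doubling the channel count means each channel is half as wide, giving the memory controller more independent requests to keep in flight.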

All of the major memory players—SK Hynix, Samsung, and Micron—are working on HBM3, and products will slowly start coming to market this year, beginning with Nvidia’s Hopper GPU. For now, HBM usage seems to be staying at the high end of performance use cases.

“There are a range of workloads that we’ve designed this CPU [Grace] for and it’s not designed to run Excel and Microsoft Office for example, but to shine in the data-center applications space,” said Kharya.

Copyright © 2022 IDG Communications, Inc.