IT vendors typically race to deliver incremental improvements to existing product lines, but occasionally a truly disruptive technology comes along. One of those disruptive technologies, which is beginning to find its way into enterprise data centers, is High-Bandwidth Memory (HBM).
HBM is significantly faster than incumbent memory chip technologies, uses less power and takes up less space. It is becoming particularly popular for resource-intensive applications such as high-performance computing (HPC) and artificial intelligence (AI).
However, mainstream adoption for running routine business applications is still some way off, because HBM is expensive, can create heat-management issues, and might require that certain applications be rewritten.
How does HBM work?
HBM is the creation of US chipmaker AMD and SK Hynix, a South Korean supplier of memory chips. Development began in 2008, and in 2013 the companies turned the spec over to the JEDEC consortium, the standards body for the semiconductor industry. The HBM2 standard was approved in 2016, and HBM3 was officially announced in January. The primary manufacturers of HBM memory chips today are South Korea's Samsung and SK Hynix, and US-based Micron Technology.
HBM was designed to address the lagging performance and power efficiency of standard dynamic random-access memory (DRAM) compared to central processing unit (CPU) and graphics processing unit (GPU) performance. The original solution was to throw more DRAM at the problem and populate motherboards with more dual in-line memory module (DIMM) slots, also known as RAM slots.
But the problem was not with the memory itself, but with the bus. The standard DRAM bus is 4 to 32 bits wide. The HBM bus is 1,024 bits wide, up to 128 times wider, according to Joe Macri, corporate vice president and product CTO at AMD and a co-developer of HBM.
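That "up to 128 times wider" figure, and the bandwidth it implies, can be checked with quick back-of-the-envelope arithmetic. The sketch below compares a conventional 64-bit DDR DIMM channel against a single 1,024-bit HBM stack; the per-pin transfer rates are illustrative round numbers, not specs for any particular part:

```python
# Rough, illustrative comparison of theoretical peak bandwidth:
# bandwidth (GB/s) = bus width (bits) x per-pin rate (Gb/s) / 8

def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s for one memory interface."""
    return bus_width_bits * pin_rate_gbps / 8

# One 64-bit DDR channel vs. one 1,024-bit HBM stack
# (per-pin rates are hypothetical round numbers).
ddr_channel = peak_bandwidth_gbs(64, 6.4)    # -> 51.2 GB/s
hbm_stack = peak_bandwidth_gbs(1024, 3.2)    # -> 409.6 GB/s

print(f"DDR channel: {ddr_channel:.0f} GB/s, HBM stack: {hbm_stack:.0f} GB/s")
```

Even at half the per-pin rate, the far wider bus wins by a large margin, which is the whole point of the design.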
To use an auto analogy, which can handle more cars: a one-lane road or a 16-lane highway?
In addition to widening the bus to boost bandwidth, HBM shrinks the memory chips and stacks them vertically. HBM chips are tiny compared to graphics double data rate (GDDR) memory, the technology HBM was originally designed to replace: 1GB of GDDR memory chips takes up 672 square millimeters, versus just 35 square millimeters for 1GB of HBM.
Rather than spreading out the transistors, HBM stacks the dies up to 12 layers high and connects them with an interconnect technology called through-silicon via (TSV). The TSV runs through the layers of HBM chips the way an elevator runs through a building, greatly reducing the distance data bits need to travel.
With the HBM sitting on the substrate right next to the CPU or GPU, less power is required to move data between processor and memory. The CPU and HBM talk directly to each other, eliminating the need for DIMM sticks.
"The whole idea that [we] had was instead of going very narrow and very fast, go very wide and very slow," Macri said.
Paresh Kharya, senior director of product management for accelerated computing at Nvidia, says standard DRAM is not well suited for HPC use. DDR memory can come close to the performance of HBM, but "you'll have to have a lot of DIMMs, and it's not going to be optimal" in terms of energy efficiency.
Where is HBM being used?
The first vendor to use HBM for HPC was Fujitsu, with its Arm-based A64FX processor designed for HPC tasks. The Fugaku supercomputer powered by the A64FX debuted at the top of the Top500 list of supercomputers in 2020 and has remained there since.
Nvidia is using HBM3 on its forthcoming Hopper GPU, while the upcoming Grace CPU uses LPDDR5X, a DDR derivative.
AMD uses HBM2E on its Instinct MI250X accelerator (based on its GPU technology), and Intel plans to use HBM on some of its Sapphire Rapids generation of Xeon server processors, as well as on the Ponte Vecchio GPU accelerator for the enterprise.
Will HBM be used for mainstream applications?
Technologies have a history of starting at the bleeding edge and working their way into the mainstream. Liquid cooling started out as a fringe concept, mostly used by gamers trying to squeeze as much performance out of the CPU as possible. Now every server vendor offers liquid cooling for its processors, particularly AI processors.
So can HBM go mainstream? Macri estimates the price difference between HBM and DDR5 at the same capacity at more than 2 to 1; in other words, 1GB of HBM costs more than twice as much as 1GB of DDR5. So, he reasons, if you're going to pay that premium for memory, you're going to want a return on investment.
"In a TCO equation, performance is in the denominator, all the costs are in the numerator. So if you get double performance, you improve the TCO by double. So performance is what really is the best way to improve TCO," he said. He adds that, for simplicity of argument, this assumes costs are flat.
Daniel Newman, principal analyst with Futurum Research, doesn't expect HBM to go mainstream, for two reasons, the first being cost. "You've got a chicken and egg thing there that if it's costly to build, then it's not going to be widely used in a broad market. And so that's going to reduce the volumes that ship," he said.
The other problem is heat. In addition to a CPU that needs to be cooled, you now have five or more memory chips sharing the same cooler. "That means that the processor is dissipating gobs of power, all in a little tiny package, so you're going to have a heat problem. 
Every processor that uses HBM has to have extraordinary heat management," said Newman.
The bottom line: if you deploy these accelerators for AI and HPC, expect top-end results, but also expect acquisition and operating costs to match.
Will HBM require that applications be rewritten?
With this new memory paradigm, the question becomes: do HPC and AI applications automatically take advantage of the full extent of HBM memory, or is a re-architecture required? It all depends on how you built your applications in the first place, say the experts.
"Often application developers would work around the limitations of what the system can offer. So sometimes you'll have to redesign, or have to update your applications to account for the new capabilities that are available," said Kharya.
Macri said that if an application is memory-bandwidth bound, it will simply run faster, with no rewrite needed. If it is memory-latency bound, it will not go faster, other than by the intrinsic latency delta between HBM and the memory you are comparing it to. Such an application would need to be rewritten to remove the dependencies that are causing it to be latency bound.
He also said that if the system is loaded down with many applications running simultaneously, the HBM system will likely perform better even if the applications are latency bound, because the loaded latency will be lower for HBM.
Kharya agrees that it will depend on how apps were written. If existing apps worked around various limitations, such as memory or latency, then developers will have to "redesign or update their applications to account for the new capabilities that are available, which is usual for when any new computing architecture comes along," he said.
Does HBM require a shift from CPUs to GPUs?
Another issue is processor architecture. 
Jim Handy, principal analyst with Objective Analysis, notes that HBM is used with single-instruction, multiple-data (SIMD) processors, which are programmed altogether differently from a normal server processor. x86 and Arm are not SIMD, but GPUs are.
"Any program that already ran on a normal processor would have to be reconfigured and recompiled to take advantage of a SIMD architecture. It's not the HBM that would change things, but the processor type," he said.
HBM technology continues to advance
The current version of HBM on the market is HBM2E, but in January JEDEC released the final spec for HBM3. HBM3 runs at lower temperatures than HBM2E at the same operating voltage.
HBM3 also doubles the per-pin data rate over HBM2, to up to 6.4Gb/s, and doubles the number of independent channels from eight to 16, along with other performance enhancements.
All of the major memory players (SK Hynix, Samsung, and Micron) are working on HBM3, and products will slowly start coming to market this year, beginning with Nvidia's Hopper GPU. For now, HBM usage seems to be staying at the high end of performance use cases.
"There are a range of workloads that we've designed this CPU [Grace] for, and it's not designed to run Excel and Microsoft Office, for example, but to shine in the data-center applications space," said Kharya.
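As a closing footnote, the HBM3 figures quoted above (a 6.4Gb/s per-pin rate across a 1,024-bit interface, split over 16 channels) imply a concrete peak number. A minimal sketch of the arithmetic; the result is a theoretical peak, not sustained bandwidth:

```python
# Theoretical peak bandwidth implied by the HBM3 spec figures above.
PIN_RATE_GBPS = 6.4      # per-pin data rate in Gb/s (double that of HBM2)
BUS_WIDTH_BITS = 1024    # width of the HBM interface per stack
CHANNELS = 16            # independent channels (doubled from eight)

peak_gb_per_s = BUS_WIDTH_BITS * PIN_RATE_GBPS / 8   # bits -> bytes
per_channel = peak_gb_per_s / CHANNELS

print(f"~{peak_gb_per_s:.1f} GB/s per stack ({per_channel:.1f} GB/s per channel)")
# roughly 819 GB/s per stack
```

That per-stack figure is why a handful of HBM3 stacks can feed a GPU that would otherwise need dozens of DDR DIMMs.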