At its GPU Technology Conference (GTC) last year, Nvidia announced it would come out with its own server chip called Grace based on the Arm Neoverse v9 server architecture. At the time, details were scant, but this week Nvidia filled in the specifics, and they are remarkable.

With Grace, customers have two options, both dubbed superchips by Nvidia. The first is the Grace Hopper Superchip, which was formally introduced last year but only broadly described. It consists of a 72-core CPU and a Hopper H100 GPU tightly connected by Nvidia’s new high-speed NVLink-C2C chip-to-chip interconnect, which transfers data at 900GB/s.

The second, announced this week, is the Grace CPU Superchip, which has no GPU. Instead, it has two 72-core CPU chips tied together via NVLink. Even without the H100 GPU, the Grace CPU Superchip posts some strong benchmark numbers. Nvidia claims SPECrate2017_int_base performance more than 1.5x that of the dual high-end AMD Epyc “Rome” generation processors already shipping with Nvidia’s DGX A100 server.

The two superchips will serve two different markets, according to Paresh Kharya, senior director of product management and marketing at Nvidia. The Grace Hopper Superchip is intended to address the giant scale of AI and HPC, with a focus on the bottleneck of CPU system memory, he said.

“Bandwidth is limited, and when you connect the CPU and GPU in a traditional server, the flow of data from the system memory to the GPU is bottlenecked by the PCIe slot,” he said. “So by putting the two chips together and interconnecting them with our NVLink interconnect, we can unblock that memory.”

Both the Grace CPU Superchip and Grace Hopper Superchip eschew standard DRAM memory sticks in favor of LPDDR5X, a newer low-power memory technology. The memory sits in the same package, physically right next to the chips themselves, rather than on memory sticks in DIMM slots.
This direct connection offers up to 1TB/s of bandwidth while supporting in-memory error correction. Kharya said that memory performance is up to 30 times faster than Nvidia’s current Ampere technology, which uses traditional DIMM memory.

With the Grace CPU Superchip, Nvidia has a different emphasis. First, it put both the CPUs and the LPDDR5X memory in a single package with a 500-watt power draw, which Kharya says is twice as energy efficient as leading CPUs. The advantage may be greater than that. A dual-socket x86 server will easily exceed 500 watts and have nowhere near as many cores, and that doesn’t take into account the power draw of the memory.

The memory bandwidth of the Grace CPU Superchip will benefit a range of applications that are not yet accelerated for GPUs.

Another potential market for the Grace CPU Superchip is AI inference. Some inference tasks require a lot of pre- and post-processing that needs to happen on the CPU, while other parts of the application are processed on the GPU. Kharya also cited data analytics as a big potential market.

“There’s a long tail of applications that have not yet been accelerated on GPUs. Those would immediately benefit. They will really like the high memory bandwidth to process faster as well as the speed of the CPU cores,” said Kharya.

Nvidia said the Grace CPU Superchip and Grace Hopper Superchip should ship by the end of this year or the beginning of next year.
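
For readers who want to sanity-check the power and bandwidth figures quoted above, here is a rough back-of-the-envelope sketch in Python. The core count, the 500-watt package power, and the 900GB/s NVLink-C2C figure come from the article. The PCIe number is an assumption added for illustration (roughly the aggregate bandwidth of a PCIe 5.0 x16 slot), and the helper names are hypothetical. Treat the output as idealized arithmetic, not as measured performance.

```python
# Figures quoted in the article.
GRACE_CORES_PER_CPU = 72
GRACE_CPUS_PER_SUPERCHIP = 2
GRACE_PACKAGE_WATTS = 500      # covers both CPUs plus the LPDDR5X memory
NVLINK_C2C_GB_PER_S = 900

# Assumption, NOT from the article: approximate aggregate (both directions)
# bandwidth of a PCIe 5.0 x16 slot, used only for a rough comparison.
ASSUMED_PCIE_GB_PER_S = 128


def watts_per_core(package_watts: float, total_cores: int) -> float:
    """Average package power per core (memory power is included in the total)."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return package_watts / total_cores


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move size_gb at a sustained bandwidth, ignoring overheads."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth_gb_per_s must be positive")
    return size_gb / bandwidth_gb_per_s


total_cores = GRACE_CORES_PER_CPU * GRACE_CPUS_PER_SUPERCHIP
grace_w_per_core = watts_per_core(GRACE_PACKAGE_WATTS, total_cores)
link_speedup = NVLINK_C2C_GB_PER_S / ASSUMED_PCIE_GB_PER_S

if __name__ == "__main__":
    print(f"Grace CPU Superchip: {total_cores} cores, "
          f"{grace_w_per_core:.2f} W per core (memory included)")
    print(f"NVLink-C2C vs. assumed PCIe slot: {link_speedup:.1f}x the bandwidth")
    # Hypothetical 450 GB working set, chosen only to make the arithmetic visible.
    print(f"450 GB over NVLink-C2C: {transfer_seconds(450, NVLINK_C2C_GB_PER_S):.2f} s")
    print(f"450 GB over assumed PCIe: {transfer_seconds(450, ASSUMED_PCIE_GB_PER_S):.2f} s")
```

To compare against a specific x86 server, pass its measured wall power and core count to `watts_per_core`. No such figures are given in the article, so none are filled in here.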