At its GPU Technology Conference this week, Nvidia took the wraps off a new DGX-2 system it claims is the first to offer multi-petaflop performance in a single server, greatly reducing the footprint needed for true high-performance computing (HPC).
DGX-2 comes just seven months after the DGX-1 was introduced, although it won’t ship until the third quarter. Nvidia claims it has 10 times the compute power of the previous generation, thanks to twice the number of GPUs, much more memory per GPU, faster memory, and a faster GPU interconnect.
The DGX-2 uses the Tesla V100 GPU, the top of the line for Nvidia’s HPC and artificial intelligence-based cards. With the DGX-2, Nvidia has doubled each GPU’s on-board memory to 32GB. Nvidia claims the DGX-2 is the world’s first single physical server with enough computing power to deliver two petaflops, a level of performance usually delivered by hundreds of servers networked into clusters.
How DGX-2 compares to Intel's Skylake Xeon
By way of comparison, Nvidia said, to get similar performance out of Intel’s latest Skylake Xeon generation, you would need a $3 million system consisting of 15 racks of servers and 300 CPUs. DGX-2 starts at $399,000 and is 60 times smaller and 18 times more power efficient than the Skylake setup, the company said.
(Of course, that’s never how it works with HPC, is it? Offer them the same performance in a quarter of the space, and HPC centers will simply fill the space with new equipment for four times the performance. HPC, and to a lesser degree AI, isn’t about energy efficiency; it’s primarily focused on performance. More performance in less space simply means cramming more performance into the same space.)
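For a sense of scale, the figures in Nvidia's Skylake comparison can be put side by side. The short sketch below uses only the numbers quoted above; the variable and function names are illustrative, and it covers just the price and rack figures, not the size or power-efficiency claims.

```python
# Figures quoted in Nvidia's comparison, as reported in this article.
skylake_price_usd = 3_000_000   # "$3 million system"
dgx2_price_usd = 399_000        # DGX-2 starting price
skylake_racks = 15
skylake_cpus = 300


def price_ratio(baseline_usd: float, candidate_usd: float) -> float:
    """Return how many times more the baseline costs than the candidate."""
    return baseline_usd / candidate_usd


ratio = price_ratio(skylake_price_usd, dgx2_price_usd)
cpus_per_rack = skylake_cpus / skylake_racks

print(f"Skylake setup costs about {ratio:.1f}x the DGX-2 starting price")
print(f"Skylake setup averages {cpus_per_rack:.0f} CPUs per rack")
```

On those numbers, the Skylake configuration costs roughly 7.5 times the DGX-2's starting price and averages 20 CPUs per rack.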
Nvidia's super switch
That new interconnect might be the real secret sauce. Fed up with the slow development pace of PCI Express, Nvidia came out with its own interconnect, called NVLink, in 2016. However, it was limited to linking just eight GPUs. If you wanted to connect any more, you had to go over InfiniBand, which was slower than NVLink and added latency.
So, Nvidia came up with the NVSwitch chip, which connects all 16 GPUs in the DGX-2 box with a fabric that has five times more bandwidth than the top PCIe switch on the market, the company said. A single switch has 18 full-bandwidth ports for an aggregate of 900GB/sec of bidirectional bandwidth.
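The per-port and per-GPU numbers implied by those totals are easy to back out. This is a rough sketch using only figures from the article (18 ports, 900GB/sec aggregate, 16 GPUs, two petaflops); the even split across ports, directions, and GPUs is an assumption made for illustration, not something Nvidia stated here.

```python
# Totals quoted in the article.
ports_per_switch = 18
aggregate_gb_per_s = 900        # bidirectional, per NVSwitch
gpus_in_dgx2 = 16
system_petaflops = 2

# Assumption: bandwidth is spread evenly across all ports.
per_port_bidirectional = aggregate_gb_per_s / ports_per_switch

# Assumption: each port is symmetric, so half the bandwidth goes each way.
per_port_each_direction = per_port_bidirectional / 2

# Assumption: every GPU contributes equally to the two-petaflop total.
teraflops_per_gpu = system_petaflops * 1000 / gpus_in_dgx2

print(f"Per port: {per_port_bidirectional:.0f} GB/sec bidirectional")
print(f"Per port, each direction: {per_port_each_direction:.0f} GB/sec")
print(f"Per GPU: {teraflops_per_gpu:.0f} teraflops")
```

Under those assumptions, each port carries 50GB/sec bidirectional (25GB/sec each way), and each of the 16 GPUs accounts for 125 teraflops of the two-petaflop total.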
IBM already uses NVLink in its POWER9 RISC-based servers and will likely license NVSwitch, as well. It will be very interesting to see who else comes on board as the industry tires of waiting for the PCI Express SIG to get into gear.
As it is, the industry is really rallying around the Tesla GPU for HPC and AI. At the show, Cray, HPE, IBM, Lenovo, Supermicro, and Tyan all announced they will begin rolling out new Tesla V100 32GB systems within the second quarter, and Oracle Cloud Infrastructure announced plans to offer Tesla V100 32GB in the cloud in the second half of the year.
Nvidia software updates
The news from Nvidia isn’t all silicon. The company also announced updates to its AI and machine learning software stack, including a new version of its TensorRT inference software that is integrated with Google’s TensorFlow framework. Nvidia claims deep learning inference up to 190 times faster than on CPUs.
Finally, Nvidia announced a partnership with ARM Holdings, maker of the dominant mobile processor design everyone uses, to combine Nvidia’s deep learning accelerator framework with ARM’s machine learning platform.
The goal is to make it easy for ARM licensees that are developing IoT apps to integrate AI into their designs and make AI and machine learning widely available in smart and connected devices.