At its GPU Technology Conference this week, Nvidia took the wraps off a new DGX-2 system that it claims is the first to offer multi-petaflop performance in a single server, greatly reducing the footprint needed for true high-performance computing (HPC).

DGX-2 comes just seven months after the DGX-1 was introduced, although it won’t ship until the third quarter. Nvidia claims it has 10 times the compute power of the previous generation, thanks to twice the number of GPUs, much more memory per GPU, faster memory, and a faster GPU interconnect.

The DGX-2 uses the Tesla V100 GPU, the top of the line among Nvidia’s HPC and artificial intelligence cards. With the DGX-2, Nvidia has doubled each GPU’s on-board memory to 32GB. Nvidia claims the DGX-2 is the world’s first single physical server with enough computing power to deliver two petaflops, a level of performance usually delivered by hundreds of servers networked into clusters.

How DGX-2 compares to Intel's Skylake Xeon

By way of comparison, Nvidia said, to get similar performance out of Intel’s latest Skylake Xeon generation, you would need a $3 million system consisting of 15 racks of servers and 300 CPUs. DGX-2 starts at $399,000 and is 60 times smaller and 18 times more power efficient than the Skylake setup, the company said.

(Of course, that’s never how it works with HPC, is it? Offer them the same performance in a quarter of the space, and HPC centers will simply fill the space with new equipment for four times the performance. HPC, and to a lesser degree AI, isn’t about energy efficiency; it’s primarily focused on performance. More performance in less space simply means cramming more performance into the same space.)

Nvidia's super switch

That new interconnect might be the real secret sauce. Fed up with the slow development pace of PCI Express, Nvidia came out with its own interconnect, called NVLink, in 2016. However, it was limited to linking just eight GPUs. If you wanted to connect any more, you had to go over InfiniBand, which was slower than NVLink and added latency.

So, Nvidia came up with the NVSwitch chip, which connects all 16 GPUs in the DGX-2 box with a fabric that has five times more bandwidth than the top PCIe switch on the market, the company said. A single switch has 18 full-bandwidth ports for an aggregate of 900GB/sec of bidirectional bandwidth.

IBM already uses NVLink in its POWER9 RISC-based servers and will likely license NVSwitch as well. It will be very interesting to see who else comes on board as the industry tires of waiting for the PCI Express SIG to get into gear.

As it is, the industry is rallying around the Tesla GPU for HPC and AI. At the show, Cray, HPE, IBM, Lenovo, Supermicro, and Tyan all announced they will begin rolling out new Tesla V100 32GB systems within the second quarter, and Oracle Cloud Infrastructure announced plans to offer Tesla V100 32GB in the cloud in the second half of the year.

Nvidia software updates

The news from Nvidia isn’t all silicon. The company also announced updates to its AI and machine learning software stack, including a new version of its TensorRT inference software that is integrated with Google’s TensorFlow framework. Nvidia claims deep learning inference that is up to 190 times faster than on CPUs.

Finally, Nvidia announced a partnership with ARM Holdings, designer of the dominant mobile processor architecture, to combine Nvidia’s deep learning accelerator framework with ARM’s machine learning platform.

The goal is to make it easy for ARM licensees that are developing IoT apps to integrate AI into their designs, and to make AI and machine learning widely available in smart and connected devices.
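
For readers who want to sanity-check the hardware numbers quoted earlier, here is a minimal Python sketch of the arithmetic. It uses only figures stated in this article, all of which are Nvidia's own claims and not independent measurements. The function and constant names are made up for illustration.

```python
# Figures as quoted in the article; these are Nvidia's claims, not benchmarks.
SKYLAKE_SYSTEM_PRICE_USD = 3_000_000   # 15 racks, 300 CPUs
DGX2_PRICE_USD = 399_000               # DGX-2 starting price
DGX2_GPU_COUNT = 16                    # GPUs connected by NVSwitch
GPU_MEMORY_GB = 32                     # Tesla V100 32GB
NVSWITCH_PORTS = 18                    # full-bandwidth ports per switch
NVSWITCH_AGGREGATE_GB_PER_SEC = 900    # bidirectional, per switch


def price_ratio(baseline_usd: float, candidate_usd: float) -> float:
    """How many times more the baseline system costs than the candidate."""
    return baseline_usd / candidate_usd


def per_port_bandwidth(aggregate_gb_per_sec: float, ports: int) -> float:
    """Bidirectional bandwidth per port implied by the aggregate figure."""
    return aggregate_gb_per_sec / ports


def total_gpu_memory(gpu_count: int, memory_per_gpu_gb: int) -> int:
    """Combined GPU memory across all GPUs in the box."""
    return gpu_count * memory_per_gpu_gb


if __name__ == "__main__":
    ratio = price_ratio(SKYLAKE_SYSTEM_PRICE_USD, DGX2_PRICE_USD)
    port_bw = per_port_bandwidth(NVSWITCH_AGGREGATE_GB_PER_SEC, NVSWITCH_PORTS)
    memory = total_gpu_memory(DGX2_GPU_COUNT, GPU_MEMORY_GB)
    print(f"Skylake setup costs about {ratio:.1f}x the DGX-2 starting price")
    print(f"Implied NVSwitch bandwidth per port: {port_bw:.0f} GB/sec")
    print(f"Total GPU memory in a DGX-2: {memory} GB")
```

The price gap works out to roughly 7.5 times, noticeably smaller than the 60-times space and 18-times power-efficiency claims, so the pitch rests more on density and power than on sticker price.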