Nvidia is raising its game in data centers, extending its reach across different types of AI workloads with the Tesla T4 GPU. Based on the company's new Turing architecture, the T4 and its accompanying software are designed for blazing-fast acceleration of applications for images, speech, translation and recommendation systems.
The T4, a small-form-factor accelerator card, is the essential component in Nvidia's new TensorRT Hyperscale Inference Platform and is expected to ship in data-center systems from major server makers in the fourth quarter.
The T4 features Turing Tensor Cores, which support different levels of compute precision for different AI applications, as well as the major software frameworks – including TensorFlow, PyTorch, MXNet, Chainer and Caffe2 – for so-called deep learning, machine learning involving multi-layered neural networks.
"The Tesla T4 is based on the Turing architecture, which I believe will revolutionize how AI is deployed in data centers," said Nvidia CEO Jensen Huang, unveiling the new GPU and platform at the company's GTC event in Tokyo Wednesday. "The Tensor Core GPU is a reinvention of our GPU – we decided to reinvent the GPU altogether."
The massively parallel architecture of GPUs makes them well suited for AI. Nvidia GPUs' parallel-computing capabilities, coupled with sheer processing horsepower, have made them the technology of choice for AI for a number of years now, particularly for training machine-learning models on massive data sets – essentially creating deep-learning neural-network models.
Multiprecision processing is an advantage for AI inferencing
The big step forward for the T4 GPUs and the new inference platform is the ability to process at more varied degrees of precision than the prior Nvidia P4 GPUs, which were based on the Pascal architecture.
Once neural-network models are trained on massive data sets, they are deployed into applications for inferencing – the classification of data to "infer" a result.
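Quantization – mapping high-precision values into a narrower numeric format – is the basic mechanism behind running inference at reduced precision. As a minimal conceptual sketch (plain NumPy, not Nvidia code; real stacks such as TensorRT calibrate scales per layer from representative data), here is symmetric INT8 quantization of FP32 weights:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of an FP32 array to INT8.

    Illustrative only: the scale maps the largest magnitude in x to 127,
    so every value lands in the signed 8-bit range [-127, 127].
    """
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 values back to approximate FP32."""
    return q.astype(np.float32) * scale

weights = np.array([0.02, -1.5, 0.7, 3.1], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# INT8 halves the memory of FP16 (and quarters FP32) at the cost of a
# small, bounded rounding error: at most half a quantization step.
max_error = np.max(np.abs(weights - approx))
assert max_error <= scale / 2 + 1e-7
```

The trade-off this sketch illustrates is exactly the one the article describes: layers that tolerate the rounding error can run in INT8 (or INT4) for higher throughput, while sensitive layers stay in FP16 or FP32.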
While training is compute intensive, inferencing in deployed real-world applications demands as much flexibility as possible from processors.
Ideally, each layer of a neural network should be processed with the least precision suitable for that layer, for the sake of application speed and power efficiency.
"By creating an architecture that can mix and match all of these mixed precisions, we can maximize accuracy as well as throughput, all at 75 watts," Huang said, adding that T4s are at least eight times faster than P4s and in some cases 40 times faster.
The need for inferencing is growing rapidly as data centers put into production a wide variety of applications handling billions of voice queries, translations, images and videos, recommendations and social-media interactions. Nvidia estimates that the AI inference industry is poised to grow into a $20 billion market in the next five years. Different applications require different levels of neural-network processing.
"You don't want to do 32-bit floating-point calculations if the application requires only 16-bit," said Patrick Moorhead, founder of analyst firm Moor Insights & Strategy. "Nvidia has totally raised the bar in the data center for AI with the new inferencing platform."
What is the TensorRT Hyperscale Inference Platform?
The components of the TensorRT Hyperscale Inference Platform, built around a small 75-watt PCIe card, include:

The Nvidia Tesla T4 GPU, featuring 320 Turing Tensor Cores and 2,560 CUDA (Compute Unified Device Architecture) cores. CUDA is Nvidia's parallel-computing platform and programming model. The T4's multiprecision capabilities range from FP32 (32-bit floating-point arithmetic) and FP16 down to INT8 (8-bit integer arithmetic) and INT4. The T4 is capable of 65 teraflops of peak performance at FP16, 130 trillion operations per second (TOPS) at INT8 and 260 TOPS at INT4.
TensorRT 5, an inference optimizer and runtime for deep learning.
It's designed for low-latency, high-throughput inference, quickly optimizing, validating and deploying trained neural networks for inference in hyperscale data centers and on embedded or automotive GPU platforms. It supports the TensorFlow, MXNet, Caffe2 and MATLAB frameworks, and other frameworks via ONNX (Open Neural Network Exchange).
The TensorRT Inference Server, which Nvidia is making available from its GPU Cloud as an inference server for data-center deployments. It's designed to scale up both training and inferencing deployments to multi-cloud GPU clusters, and it integrates with Kubernetes and Docker, letting developers automate the deployment, scheduling and operation of multiple GPU application containers across clusters of nodes.

Software support is key
"We are continuing to invest in and optimize our entire software stack from the bottom up, and we're doing so by leveraging the available frameworks so everyone can run their neural networks turnkey, out of the box, right away – they can take their training models and turn around and deploy them that very day," said Ian Buck, vice president of Nvidia's Accelerated Computing business unit.
In the area of AI inferencing, Nvidia has seen competition from makers of FPGAs (field-programmable gate arrays), particularly Xilinx. The programmability of FPGAs lets developers fine-tune the precision of the computation used for different layers of deep neural networks. But FPGAs have posed a steep learning curve for programmers: for years, customizing FPGAs meant using hardware description languages (HDLs) rather than the higher-level languages used to program other chips.
FPGAs offer competition for GPUs
In March, Xilinx unveiled what it calls a new product category – the Adaptive Compute Acceleration Platform (ACAP) – that will have more software support than traditional FPGAs.
The first ACAP version, code-named Everest, is due to ship next year, and Xilinx says software developers will be able to work with Everest using tools like C/C++, OpenCL and Python. Everest will also be programmable at the hardware register-transfer level (RTL) using HDL tools like Verilog and VHDL.
But the software support offered by the T4 GPUs, coupled with their multiprecision capabilities, seems destined to fortify Nvidia's position in both AI training and inferencing.
"We believe we have the most efficient inferencing platform," Buck said. "We measure ourselves on the real production workloads that we're seeing today and that are being seen by our customers – we work with all of them on our stack from top to bottom to make sure we're offering not just the best training but now also the best inferencing platform."
Virtually all the server makers currently using the P4 GPUs will be on the T4 by the end of the year, Buck said. At the Tokyo event, support for the T4 was voiced by data-center system makers including Cisco, Dell EMC, Fujitsu, HPE, IBM, Oracle and Supermicro.
In addition, Google said it would be using the new T4s.