At the MIT EmTech Digital conference, startup Nervana announced plans to design and build a custom ASIC processor for neural networks and machine learning applications that the company’s CEO, Naveen Rao, claims will run 10 times faster than graphics processing units (GPUs).
The news comes after Google last week announced it had secretly deployed its own processors, tailored for neural networks and machine learning, in its data centers about a year ago. Google reported that its custom processor had improved performance by an order of magnitude, and its approach and reported gains validate Nervana’s technical strategy.
GPUs have become synonymous with machine learning. Interest in machine learning exploded a few years ago when Alex Krizhevsky, a student of artificial intelligence (AI) luminary Geoff Hinton at the University of Toronto, proved that machine learning systems could be trained on economically priced GPU hardware. Krizhevsky programmed a massively parallel GPU board to solve deep learning problems after recognizing that GPUs could be repurposed to accelerate the vector mathematics at the heart of neural networks. The application of GPUs in the hyperscale mobile market has made these processors cheap and effective, but not optimized for machine learning, according to Rao.
Nervana’s custom ASIC, called the Nervana Engine, packages 32GB of memory with each processor module. Within each processing module, data moves between on-chip storage and memory at 8Tbps. The processor modules are interconnected in a supercomputer-like torus configuration and can transfer data memory-to-memory at 2.4Tbps.
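The appeal of a torus is that every module has the same small set of direct links, with connections wrapping around at the edges, so data can hop between modules’ memories without passing through a host. Nervana has not published the torus dimensionality or routing details, so the sketch below assumes a simple 2D torus purely to illustrate the wraparound addressing:

```python
# Illustrative sketch only: Nervana has not disclosed the torus dimensions
# or routing, so a 2D torus is assumed here for demonstration.

def torus_neighbors(x, y, width, height):
    """Return the four nearest neighbors of module (x, y) in a 2D torus.

    Coordinates wrap around at the edges, so every module has exactly
    four links and no module sits on a boundary -- the property that
    lets modules exchange data memory-to-memory without a central hub.
    """
    return [
        ((x - 1) % width, y),   # left neighbor (wraps to the far column)
        ((x + 1) % width, y),   # right neighbor
        (x, (y - 1) % height),  # neighbor above (wraps to the bottom row)
        (x, (y + 1) % height),  # neighbor below
    ]

# Example: in a 4x4 torus, the corner module (0, 0) still has four peers.
print(torus_neighbors(0, 0, 4, 4))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
```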
Rao said the training of machine learning systems is 10 times faster because much larger data models can be loaded into memory and processed in parallel. Explaining the Nervana Engine’s architecture, Rao quoted Antoine de Saint-Exupéry: “Perfection is attained not when there is nothing more to add, but when there is nothing more to remove.” The instruction set was reduced to a set of primitives optimized for machine learning, much as RISC processors were designed with fewer instructions. And because neural network programs prescribe their operations and memory accesses in advance, the managed memory cache hierarchy used in GPUs was eliminated, speeding execution and freeing die space.
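To make the RISC-like idea concrete, here is a toy sketch, not Nervana’s actual instruction set, showing how a handful of tensor primitives can express a whole neural network layer when the sequence of operations is known up front:

```python
# Toy illustration of a RISC-like primitive set for neural networks.
# These are not Nervana's real instructions; the point is only that a
# small, fixed set of tensor primitives suffices for a network layer.
import numpy as np

PRIMITIVES = {
    "matmul": lambda a, b: a @ b,          # dense matrix multiply
    "add":    lambda a, b: a + b,          # bias / elementwise add
    "relu":   lambda a: np.maximum(a, 0),  # nonlinearity
}

def run(program, tensors):
    """Execute a list of (op, dest, *sources) instructions over named tensors."""
    for op, dest, *srcs in program:
        tensors[dest] = PRIMITIVES[op](*(tensors[s] for s in srcs))
    return tensors

# A fully connected layer, y = relu(x @ W + b), as three primitives.
# Every operation and memory access is spelled out in the program itself,
# which is why a managed cache hierarchy adds little for such workloads.
out = run(
    [("matmul", "t0", "x", "W"),
     ("add",    "t1", "t0", "b"),
     ("relu",   "y",  "t1")],
    {"x": np.random.randn(4, 8),
     "W": np.random.randn(8, 3),
     "b": np.zeros(3)},
)
print(out["y"].shape)  # (4, 3)
```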
The Nervana Engine will be fabricated by TSMC on a 28nm process and is due for delivery in early 2017. Rao says a subsequent shrink to 16nm could double performance.
Neon gives Nervana engineers more control, higher performance
Nervana developed its own Python-based machine learning library, called Neon, optimized for neural network applications such as machine translation, image classification, object localization, text analysis and video indexing. Neon currently runs on Nvidia GPUs using Nervana’s own proprietary microcode. With engineering control over both the application layer and the microcode layer, Nervana engineers have optimized execution times. When the Nervana Engine is released, the company will be able to optimize all three layers: application, microcode and hardware.
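For a sense of what the library looks like, here is a minimal classifier in the style of neon’s documented 1.x Python API. Treat it as a sketch: the random training data is a stand-in for a real dataset, and layer names or defaults may differ across neon versions.

```python
# A minimal model definition in the style of neon's documented 1.x API.
# The random training data below is a placeholder, assumed purely so the
# sketch is self-contained; it is not from Nervana's examples.
import numpy as np
from neon.backends import gen_backend
from neon.data import ArrayIterator
from neon.initializers import Gaussian
from neon.layers import Affine, GeneralizedCost
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti

be = gen_backend(backend='gpu', batch_size=128)  # Nvidia GPU backend today

# Placeholder dataset: 1,000 random 784-dimensional samples, 10 classes.
X = np.random.rand(1000, 784)
y = np.random.randint(10, size=1000)
train_set = ArrayIterator(X, y, nclass=10)

# Two fully connected layers: a ReLU hidden layer and a softmax output.
mlp = Model(layers=[
    Affine(nout=100, init=Gaussian(scale=0.01), activation=Rectlin()),
    Affine(nout=10,  init=Gaussian(scale=0.01), activation=Softmax()),
])
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
opt = GradientDescentMomentum(0.1, momentum_coef=0.9)

mlp.fit(train_set, optimizer=opt, num_epochs=10, cost=cost)
```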
Nervana points to independent benchmarks of Neon running on Nvidia GPUs, maintained by Facebook researcher Soumith Chintala, to claim first place in machine learning performance. When Nervana Engines are delivered, Rao says, benchmark performance will improve by an order of magnitude.
Nervana offers Neon running on Nvidia GPUs as a cloud service to customers, including Monsanto. These workloads will be shifted to the Nervana Engine in 2017.
The last machine learning mile
The supply of machine learning experts and programmers does not come close to the number enterprises need to solve high-value problems with machine learning. The Economist recently published a report on the competition for AI experts between Silicon Valley’s marquee companies and academia. In that competition for technical talent, enterprises are a distant third choice for these sought-after experts.
Nervana is delivering that machine learning last mile with professional services that augment enterprise IT departments, data scientists and statisticians.