Data center workloads once handled by IBM mainframes and Sun servers were commoditized by Intel PC hardware, driven by cloud companies like Google. The tech industry's belief, held until recently by Andreessen Horowitz VC Ben Evans among others, that this would continue forever was shaken last week when Google released a detailed research paper on the performance and architecture of its Tensor Processing Unit (TPU).
An advertising, cloud services and software company breaking from its core business to design chips raises the question: Why are Intel, Qualcomm and Nvidia not meeting Google's data center needs?
The TPU is not a general-purpose device like an Intel CPU or Nvidia GPU. It is an application-specific integrated circuit (ASIC) designed for machine learning, a specialized subfield of artificial intelligence. In the past few years, machine learning has emerged to accurately translate language, recognize images and recommend everything from a consumer's next book to their next restaurant meal, along with other predictions that Google's large trove of data makes accurate.
There are two parts to machine learning: training and inference. Training is, in essence, programming a computer with data: a translation system is trained by feeding it millions of sentences in multiple languages until it learns to translate from one language to another. Inference is what happens when a user speaks or types a sentence into a smartphone and the trained system translates it. Both training and inference run on neural networks, a software layer optimized for machine learning that sits on top of the data center hardware.
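The split can be illustrated in a few lines of code. The sketch below is a toy example in Python with NumPy, nothing like Google's production systems: a training step adjusts the model's weights from example data, and an inference step applies the already-trained weights to new input.

```python
# Minimal sketch of the training/inference split, using a toy one-layer
# model in NumPy. Names, shapes and the model itself are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))          # model weights, learned during training

def train_step(W, x, target, lr=0.01):
    """Training: nudge the weights so the output moves toward the target."""
    pred = x @ W
    grad = x[:, None] * (pred - target)[None, :]   # gradient of squared error
    return W - lr * grad

def infer(W, x):
    """Inference: apply the already-trained weights to new input."""
    return x @ W

# Training pass: in production this loops over millions of examples;
# here a single random (input, target) pair stands in for the corpus.
x, target = rng.normal(size=8), rng.normal(size=4)
W = train_step(W, x, target)

# Inference pass: the request a user just made from their phone.
print(infer(W, rng.normal(size=8)))
```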
The TPU is designed for the application-specific task of inference, which is a large and increasing part of Google's workload. The machine learning workload grew as the formerly esoteric research performed by the Google Brain group was applied first to a few Google services, then to many. The load grew even more as Google Brain's know-how was distributed to developers in product groups who added AI to more services.
This increasing workload precipitated the TPU. Inference models such as language translation and image search are user-facing, so they must deliver low-cost, low-latency performance. That made inference the right place to start optimizing data center capacity for neural network workloads.
Here are some of the reasons why Google designed and built the TPU.
1. Performance
Google compared the TPU's performance to a server-class Intel Haswell CPU and an Nvidia K80 GPU running benchmark code representative of 95 percent of its inference workload. Running neural network inference, the TPU is 15 to 30 times faster than Nvidia's GPU and Intel's CPU.
2. Physical space
Cloud data centers are the equivalent of IT factories. Budgeting includes equipment, real estate, power and the cost of building the data center. The planning goal is to minimize all of these costs by packing as much processing power as possible into the smallest space, while drawing the least power and generating the least heat. The capital and operating expenses of Google's gargantuan information processing factories are a big and growing budget line item.
Six years ago, when users were first starting to use natural language recognition in place of their smartphone keyboards, Google engineers estimated that three minutes of natural language input per user per day would double the number of data centers needed if the workload ran on the Intel and Nvidia designs deployed at the time.
3. Power consumption
Faster chips that do not also reduce power consumption would only lower the cost of physical space. Reducing power consumption has a double impact: it cuts the electricity bill and it cuts the cost of cooling away the heat generated during processing. More telling than raw performance is the combination of the TPU with a CPU host processor. The chart below compares performance per watt and shows the TPU/CPU combination delivering a 30 to 80 times improvement over alternative CPU and GPU configurations under different workloads.
4. The TPU solves an application-specific problem
Intel's CPUs and Nvidia's GPUs are general-purpose processors designed for a wide range of applications, and in particular for floating-point operations that demand precise calculations. Machine learning models tolerate low-precision arithmetic, which eliminates the need for a floating point unit (FPU). Predictions made with the TPU's 8-bit mathematical operations during inference are effectively as accurate as those computed on Intel's and Nvidia's floating-point hardware.
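The idea behind low-precision inference can be sketched in a few lines. The Python example below uses a generic linear quantization scheme, not Google's actual implementation: weights and inputs are mapped to 8-bit integers, the multiply-accumulates run in integer arithmetic, and a single rescale at the end recovers a result close to the full floating-point answer.

```python
# Minimal sketch of low-precision inference: quantize float weights and
# activations to 8-bit integers, multiply-accumulate in integer arithmetic,
# then rescale. Generic linear quantization, not Google's exact scheme.
import numpy as np

def quantize(x, num_bits=8):
    """Map floats onto signed num_bits integers, returning (ints, scale)."""
    scale = np.abs(x).max() / (2 ** (num_bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(256,)).astype(np.float32)
inputs = rng.normal(size=(256,)).astype(np.float32)

qw, sw = quantize(weights)
qx, sx = quantize(inputs)

# Accumulate in a wide integer, rescale once at the end.
int8_result = int(np.dot(qw, qx)) * sw * sx
float_result = float(np.dot(weights, inputs))

# For typical inputs the two results agree closely.
print(float_result, int8_result)
```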
Matrix algebra makes up most neural network operations, and the Matrix Multiply Unit (MMU) is the heart of the TPU. It contains a 256x256 array of multiplier-accumulators (MACs) performing 8-bit multiply-and-adds, so the MMU can perform 65,536 multiply-accumulate operations per cycle. The TPU, clocked at 0.7GHz, achieved significant performance gains over the Intel chip, clocked at 2.3GHz, and the 1.5GHz Nvidia chip by optimizing low-precision matrix mathematics and moving data and results in and out of the MMU quickly. Google noted in its research paper that an increase in memory bandwidth in a future TPU redesign could double or triple performance.
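A back-of-the-envelope calculation from those figures shows where the speedup comes from. The arithmetic below simply multiplies out the numbers quoted above, and it lands near the roughly 92 trillion 8-bit operations per second peak figure in Google's paper.

```python
# Back-of-the-envelope peak throughput for the TPU's Matrix Multiply Unit,
# using the figures quoted above (256x256 MAC array, 0.7GHz clock).
macs_per_cycle = 256 * 256            # 65,536 multiply-accumulates per cycle
clock_hz = 0.7e9                      # 700 MHz

macs_per_second = macs_per_cycle * clock_hz
ops_per_second = macs_per_second * 2  # each MAC = one multiply + one add

print(f"{macs_per_second / 1e12:.1f} trillion MACs per second")   # ~45.9
print(f"{ops_per_second / 1e12:.1f} trillion ops per second peak") # ~91.8
```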
5. Leading and prodding chip makers to build a TPU
The authors of Google's research paper said: "Order-of-magnitude differences between commercial products are rare in computer architecture, which may lead to the TPU becoming an archetype for domain-specific architectures. We expect that many will build successors that will raise the bar even higher." The engineering team, led by luminary chip engineer Norman Jouppi, delivered the TPU in just 15 months. That is an impressive timetable, because ASICs are a big financial commitment to fabricate, and if an error is discovered once the chip is in data center production, the expensive fabrication process has to be repeated.
Intel could build a better TPU, though, with its greater resources: more architects, more design engineers and its own chip fabrication facilities. Nvidia has the same resources, except for owning fabs. Intel, Nvidia and other chip makers have been sitting on the fence, waiting for the AI/machine learning chip market to become large enough to deserve their investment and attention. Right now the market has just a few very big customers, including Amazon, Google, Facebook, IBM, Microsoft and a handful of other companies. While those customers are individually large, the market is still small compared to the market for general-purpose CPUs, so the chip makers remain on the sidelines.
The hardware business is not really part of Google's strategy; it is an advertising, cloud services and software company. But Google strategically builds early-edition hardware to make a point, as it did with Google Home, the Pixel and the Chromebook.
Google is a lead user that understands the machine learning problem well enough to build a solution, and after two years of running TPUs in its data centers, it has proved that the problem can be solved. Google released this research paper to signal to the machine learning community and the chip makers that it is time for an off-the-shelf merchant solution for running inference at scale.
6. Patents and intellectual property to trade
Searching the U.S. patent office's database for applications listing Jouppi as an inventor produces a number of TPU-related patents. Patents can be used offensively and defensively, as Samsung and Apple have proved. As a lead user holding patents, Google could use them as currency to incentivize a chip maker's entry into the business.
Leading machine learning users, such as Amazon, Facebook, Google, IBM and Microsoft, are waiting to welcome the merchant chip makers' salespeople and place orders for machine learning-specific chips. It is a chicken-and-egg problem for lead users: they need new and faster computing architectures to advance the whole industry and fuel the adoption of AI by mainstream enterprises, while the chip makers are waiting for more AI customers. Google's TPU could change this.