Schneider Electric is warning that the power and cooling demands of AI are beyond what standard data center designs can handle, and it says new designs are necessary.

That may be expected from a company like Schneider, which makes power and cooling systems used in data centers. But it doesn’t mean Schneider isn’t correct. AI is a different kind of workload than standard server-side applications, such as databases, and the old ways just don’t cut it anymore.

Schneider’s white paper notes that AI needs an ample supply of three things: power, cooling, and bandwidth. GPUs are the most popular AI processors and the most power intensive. Whereas CPUs from Intel and AMD draw about 300 to 400 watts, Nvidia’s newest GPUs draw 700 watts per processor, and they are often delivered in clusters of eight at a time.

This leads to greater rack density. In the past, rack density of around 10kW to 20kW was standard and easily addressed by air cooling (heatsinks and fans). But anything over 30kW per rack means that air cooling is no longer a viable option. At that point, liquid cooling has to be taken into consideration, and liquid cooling is not an easy retrofit.

“AI start-ups, enterprises, colocation providers, and internet giants must now consider the impact of these densities on the design and management of the data center physical infrastructure,” the authors of the paper wrote.

Schneider projects that total data center power consumption worldwide will be 54GW this year and hit 90GW by 2028. In that time, AI processing will go from accounting for 8% of all power use this year to 15% to 20% by 2028.

While power and cooling have been top of mind among data center builders, another consideration that is often overlooked is network throughput and connectivity. For AI training, each GPU needs its own network port with very high throughput.

However, GPUs have greatly outpaced network ports.
For example, using GPUs that process data from memory at 900 Gbps with a 100 Gbps compute fabric would slow the GPU down because it has to wait for the network to deliver all of the data. Alternatively, InfiniBand is much faster than traditional copper wires, but it’s also 10 times more expensive.

One approach to avoid heat density is to physically spread out the hardware. Don’t fill the racks, physically separate them, and so on. But doing so introduces latency given the many terabytes of data that have to be moved around, and latency is the enemy of performance.

Suggestions and solutions

Schneider offers a number of suggestions. The first calls for replacing 120/208V power distribution with 240/415V systems to reduce the number of circuits within high-density racks. It also recommends multiple power distribution units (PDUs) to deliver adequate power.

Setting a threshold of 20kW per rack for air cooling is another suggestion. Going beyond 20kW, Schneider recommends using liquid cooling. Given that air cooling maxes out at 30kW, I believe Schneider is being a bit conservative about the limits of air cooling. Or it is trying to sell liquid cooling hardware.

There are multiple forms of liquid cooling, but Schneider advocates direct liquid cooling. A copper plate is attached to the CPU just as with an air-cooled system, but it has two pipes: cool water comes in through one pipe, absorbs the heat, and exits via the other pipe, after which it is circulated and cooled down.

Schneider doesn’t seem to be a fan of immersion cooling, as the dielectric liquids used for immersion contain fluorocarbons, which may be polluting.

Schneider also warns that there is a general lack of standardization in liquid cooling, so a thorough infrastructure assessment, done by experts experienced with the equipment, is important. That’s assuming that a facility can even be retrofitted in the first place.
Most data centers using liquid cooling add the infrastructure when the center is being built, not afterwards.

The white paper includes a number of other recommendations and guidance.
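
For readers who want to check the numbers quoted above, here is a minimal back-of-the-envelope sketch in Python. The constants come from the article (700 watts per GPU, eight GPUs per server, the 20kW and 30kW cooling thresholds, 900 Gbps versus 100 Gbps, and the 54GW and 90GW projections). The function names, the servers-per-rack counts, and the simple ratio used for the network bottleneck are illustrative assumptions, not anything taken from Schneider's white paper, and the rack figure counts GPUs only.

```python
"""Back-of-the-envelope arithmetic for the figures quoted in the article."""

GPU_WATTS = 700              # per-GPU draw quoted for Nvidia's newest GPUs
GPUS_PER_SERVER = 8          # GPUs are often delivered in clusters of eight
AIR_RECOMMENDED_MAX_KW = 20  # Schneider's suggested ceiling for air cooling
AIR_HARD_MAX_KW = 30         # above this, air cooling is described as not viable


def rack_gpu_load_kw(servers_per_rack: int) -> float:
    """GPU-only load of one rack in kW (assumption: ignores CPUs, fans, networking)."""
    return servers_per_rack * GPUS_PER_SERVER * GPU_WATTS / 1000


def cooling_for(rack_kw: float) -> str:
    """Map a rack density onto the thresholds mentioned in the article."""
    if rack_kw <= AIR_RECOMMENDED_MAX_KW:
        return "air"
    if rack_kw <= AIR_HARD_MAX_KW:
        return "liquid recommended"
    return "liquid required"


def fabric_to_memory_ratio(memory_gbps: float, fabric_gbps: float) -> float:
    """Fraction of the GPU's memory rate the fabric can supply, capped at 1.0.

    Assumption: a crude proxy for the bottleneck; real training jobs overlap
    compute and communication, so this is not a utilization prediction.
    """
    return min(1.0, fabric_gbps / memory_gbps)


def ai_power_gw(total_gw: float, ai_share: float) -> float:
    """Power attributable to AI, given total demand and AI's share of it."""
    return total_gw * ai_share


if __name__ == "__main__":
    # Hypothetical rack fill levels, chosen only to cross both thresholds.
    for servers in (1, 2, 4, 6):
        kw = rack_gpu_load_kw(servers)
        print(f"{servers} server(s): {kw:.1f} kW of GPUs -> {cooling_for(kw)}")

    ratio = fabric_to_memory_ratio(memory_gbps=900, fabric_gbps=100)
    print(f"A 100 Gbps fabric supplies {ratio:.0%} of a 900 Gbps memory rate")

    print(f"AI this year: {ai_power_gw(54, 0.08):.2f} GW")
    low, high = ai_power_gw(90, 0.15), ai_power_gw(90, 0.20)
    print(f"AI in 2028: {low:.1f} to {high:.1f} GW")
```

Under these assumptions, a rack crosses Schneider's 20kW threshold with only four eight-GPU servers, before counting anything else in the rack, which is why the paper treats density as a design problem and not a tuning problem.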