Cisco is on a mission to make sure Ethernet is the chief underpinning for artificial intelligence networks now and in the future.

It has been a huge contributor to Ethernet development in the IEEE and other industry groups over the years, and now it’s one of the core vendors driving the Ultra Ethernet Consortium (UEC), a group that’s working to develop physical, link, transport and software layer advances for Ethernet to make it more capable of supporting AI infrastructures.

“Organizations are sitting on massive amounts of data that they are trying to make more accessible and gain value from faster, and they are looking at AI technology now,” said Thomas Scheibe, vice president of product management with Cisco’s cloud networking, Nexus & ACI product line.

“Customers want to know what they need to do now on the networking side to be able to run the huge clusters of GPUs they need and handle the volumes of data they create. And for most customers, it’s going to be Ethernet,” Scheibe said.

To that end, Cisco has put together a blueprint defining how organizations can use existing data center Ethernet networks to support AI workloads now.

Advancing Nexus 9000 features

A core component of Cisco’s AI blueprint is its Nexus 9000 data center switches, which support up to 25.6Tbps of bandwidth per ASIC and “have the hardware and software capabilities available today to provide the right latency, congestion management mechanisms, and telemetry to meet the requirements of AI/ML applications,” Cisco wrote in its Data Center Networking Blueprint for AI/ML Applications.
“Coupled with tools such as Cisco Nexus Dashboard Insights for visibility and Nexus Dashboard Fabric Controller for automation, Cisco Nexus 9000 switches become ideal platforms to build a high-performance AI/ML network fabric.”

Two technologies that enable Nexus AI-based networking are the NX-OS operating system’s support for Remote Direct Memory Access over Converged Ethernet version 2 (RoCEv2) and Explicit Congestion Notification (ECN), Scheibe said.

RoCEv2 is a high-performance network computing technology that lets data transfer directly between the memory of two devices without having to involve a server CPU. It allows multiple packets to be transferred or routed simultaneously over a single connection, reducing latency and complexity as well as boosting throughput.

ECN essentially enables a lossless Ethernet network by monitoring for network congestion or other situations where packets could get dropped and throttling back the network to ensure that doesn’t happen. Lossless Ethernet networks are a key requirement not only for AI networking but also for today’s VoIP or video environments, Scheibe noted.

Another tool, Priority Flow Control (PFC), can help control congestion in Layer 3-based networks and plays an important role in overall congestion management.

Taken together, these technologies can give an Ethernet network the ability to prioritize certain sets of workloads, such as AI workloads that cannot tolerate any dropped packets, so they always get network priority even when there’s congestion, Scheibe said.

“These technologies can be implemented in Nexus networks today, and customers can tune their environments to handle their workload mix,” Scheibe said.
“There is ongoing work to handle larger and more AI workloads, and there are other techniques that can be used to make sure customers can easily distribute them across available bandwidth.”

Cisco has also published scripts so customers can automate specific settings across the network to set up this fabric and simplify configurations, Scheibe said.

In addition, Nexus 9000 switches come with built-in telemetry capabilities that can be used to correlate issues in the network and help optimize it for RoCEv2 transport, Cisco stated.

“The Cisco Nexus 9000 family of switches provides hardware flow telemetry information through flow table and flow table events. With these features, every packet traversing the switch can be accounted for, observed, and correlated with behavior such as micro-bursts or packet drops,” Cisco wrote. Customers can export this data to the Cisco Nexus Dashboard Insights management package and show the data per-device, per-interface, down to per-flow level granularity, according to Cisco.

Beyond the Nexus 9000

Another element of Cisco’s AI network infrastructure is its new high-end programmable Silicon One processors, which are aimed at large-scale AI/ML infrastructures for enterprises and hyperscalers.

Cisco added the 5nm 51.2Tbps Silicon One G200 and 25.6Tbps G202 to its now 13-member Silicon One family. The processors can be customized for routing or switching from a single chipset, eliminating the need for different silicon architectures for each network function.
This is accomplished with a common operating system, P4 programmable forwarding code, and an SDK.

The new devices, positioned at the top of the Silicon One family, will bring networking enhancements that make them ideal for demanding AI/ML deployments or other highly distributed applications, Cisco said.

Core to the Silicon One system is its support for enhanced Ethernet features, such as improved flow control and congestion awareness and avoidance.

The system also includes advanced load-balancing capabilities and “packet-spraying” that spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link-failure recovery also helps ensure the network operates at peak efficiency, according to Cisco.

Combining these enhanced Ethernet technologies and taking them a step further ultimately lets customers set up what Cisco calls a Scheduled Fabric. In a Scheduled Fabric, the physical components – chips, optics, switches – are tied together like one big modular chassis and communicate with each other to provide optimal scheduling behavior and much higher bandwidth throughput, especially for flows like AI/ML, Cisco said.

Data-center sustainability focus

While AI seems all-encompassing these days, there are other topics that are challenging data center network operators.

For example, customers are looking to efficiently expand existing data center networks to handle larger workloads, so they want to find the best way to integrate 400G into the network, Scheibe said.

Two other major challenges are reducing data center power consumption and increasing sustainability practices, Scheibe said.

“Organizations are looking for help on getting a baseline on how much power they are using and learning what their current carbon footprint is so they can make informed decisions on how to move forward,” Scheibe said.

Cisco Nexus Cloud offers a Network Energy Utilization service that gives customers an idea of a data center’s environmental impact.

Recently, Cisco announced that the Nexus Dashboard will provide real-time and historical insights for power consumption of all IT equipment in the data center and estimate the energy footprint of data center operations.

Nexus Dashboard will also provide AI Data Center Blueprint for Networking, which will offer enterprises looking to develop AI-based applications a way to set up their networks to handle the additional transaction load. For example, it will detail how to implement InfiniBand-to-Ethernet network migrations and large-scale machine-learning fabrics.
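
As a rough illustration of the kind of energy-footprint estimate described above, the Python sketch below integrates timestamped power readings into kilowatt-hours and applies a grid emission factor. It is not Cisco's method and does not use any Nexus Dashboard or Nexus Cloud API: the `PowerSample` type, the `energy_kwh` and `carbon_kg` helpers, the sample readings, and the 0.4 kg CO2e/kWh emission factor are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerSample:
    """One hypothetical power reading for a device or rack."""
    hours: float  # time of the reading, in hours since the start of the window
    watts: float  # power draw measured at that time


def energy_kwh(samples: list[PowerSample]) -> float:
    """Integrate power over time with the trapezoidal rule and return kWh."""
    ordered = sorted(samples, key=lambda s: s.hours)
    watt_hours = 0.0
    # Each adjacent pair of readings contributes (average power) x (elapsed hours).
    for prev, cur in zip(ordered, ordered[1:]):
        watt_hours += (prev.watts + cur.watts) / 2 * (cur.hours - prev.hours)
    return watt_hours / 1000.0


def carbon_kg(kwh: float, kg_co2e_per_kwh: float) -> float:
    """Convert energy to an emissions estimate using an assumed grid factor."""
    return kwh * kg_co2e_per_kwh


if __name__ == "__main__":
    # Made-up readings for a single day; real telemetry would be far denser.
    readings = [
        PowerSample(hours=0.0, watts=1000.0),
        PowerSample(hours=12.0, watts=1500.0),
        PowerSample(hours=24.0, watts=1000.0),
    ]
    kwh = energy_kwh(readings)
    # 0.4 kg CO2e/kWh is an assumed emission factor, not a measured value.
    print(f"{kwh:.1f} kWh, {carbon_kg(kwh, 0.4):.1f} kg CO2e")
```

The trapezoidal rule is used here because it tolerates readings taken at uneven intervals. A real baseline would depend on the local grid's actual emission factor and on how completely the data center's equipment is metered.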