Facebook wrote the load-balancing software, Katran, because existing load balancers can't handle the size of the social media giant's systems.

Credit: Alan Carrera

Google is known to fiercely guard its data center secrets, but not Facebook. The social media giant has released as open-source code two significant tools it uses internally to operate its massive social network.

The company has released Katran, the load balancer that keeps the company's data centers from overloading, under the GNU General Public License v2.0 and made it available on GitHub. In addition to Katran, the company is offering details on its Zero Touch Provisioning tool, which it uses to help engineers automate much of the work required to build its backbone networks.

This isn’t Facebook’s first foray into open-sourcing the software that runs its network. Last month, the company open-sourced PyTorch, the software used for its artificial intelligence (AI) and machine learning projects. PyTorch is a Python-based package for writing tensor computation and deep neural networks using GPU acceleration.

Facebook has to develop these kinds of software packages because, while there are plenty of off-the-shelf software products out there, none of them is made for a global social media company that has 2 billion users.

Details of Facebook’s load-balancer tool

The news came from a blog post written by Facebook production engineer Nikita Shirokov and software engineer Ranjeeth Dasineni. The two said the company had previously built its own load-balancing software, primarily from open-source components. It served them well for four years, but it was beginning to show its age and limitations.

They wrote that a load balancer has to meet four criteria: it has to run on commodity Linux servers; coexist with other services on a given server, eliminating the need for dedicated load-balancing servers; allow low-disruption maintenance; and offer easy instrumentation and debugging.
Shirokov and Dasineni said their first software-defined load balancer, called Layer 4 Load Balancer, or L4LB for short, fell short on the criterion of coexisting with other services, specifically the backends.

“In the second iteration, we leveraged the eXpress Data Path (XDP) framework and the new BPF virtual machine (eBPF) to run the software load balancer together with the backends on a large number of machines,” they wrote.

Details of Facebook’s Zero Touch Provisioning tool

Details behind Facebook’s Zero Touch Provisioning tool also came in the form of a blog post, written about three weeks ago by a number of Facebook engineers. Zero touch provisioning (ZTP) allows you to provision new switches and routers in your network automatically, with no manual intervention required.

A company the size of Facebook has to build its own networks, which is why it needs a ZTP tool. And as with load balancing, it found existing ZTP tools inadequate for the scale at which Facebook operates.

“Ultimately, these challenges drove Facebook’s network engineers to develop a completely new approach for network deployment workflows,” the blog authors said.

So it created a new framework it called Vending Machine. Only here, instead of inserting a dollar and getting a can of soda, the input is a device role, location, and platform, and out pops a freshly provisioned network device, ready to deliver production traffic.

ZTP is still evolving, and Facebook is adding new features and functions to it. Facebook has not disclosed the license for ZTP. And like Katran, if you decide to use it, you’re on your own. A friendly Facebook engineer might be able to answer some questions, but this is not supported software.

So, I’m curious to see who might actually use this software, since this isn’t exactly SMB material. It’s meant for large-scale enterprises, and I would think most of them have their own platforms. But I could be wrong.
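
For readers who want the Vending Machine analogy made concrete, here is a minimal sketch in Python. It is purely illustrative and is not Facebook's code: the `DeviceRequest` type, the `vend` function, the template table, the vendor names, and the hostname scheme are all hypothetical, chosen only to show the idea of turning a device role, location, and platform into a ready-to-load configuration.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical per-(role, platform) config templates. A real provisioning
# system would keep far richer templates in a versioned store.
TEMPLATES = {
    ("backbone-router", "vendor-a"): Template(
        "hostname $hostname\nrole backbone-router\nsite $location\n"
    ),
    ("rack-switch", "vendor-b"): Template(
        "set system host-name $hostname\nset system location $location\n"
    ),
}


@dataclass(frozen=True)
class DeviceRequest:
    """The three inputs the article names: role, location, and platform."""
    role: str
    location: str
    platform: str


def vend(request: DeviceRequest, index: int = 1) -> str:
    """Return a rendered config for the request, or raise if unsupported."""
    key = (request.role, request.platform)
    if key not in TEMPLATES:
        raise ValueError(f"no template for role/platform {key}")
    # Hypothetical naming scheme: <role>-<location>-<two-digit index>.
    hostname = f"{request.role}-{request.location}-{index:02d}"
    return TEMPLATES[key].substitute(hostname=hostname, location=request.location)


if __name__ == "__main__":
    print(vend(DeviceRequest("rack-switch", "dc1", "vendor-b")))
```

The point of the sketch is the shape of the interface: the engineer supplies three facts about the device, and everything else is derived from templates rather than typed by hand. How Facebook's actual tool stores templates or pushes them to hardware is not described here.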