SANTA CLARA – Microsoft’s Azure couldn’t scale without SDN.
The Microsoft cloud, through which the company’s software products are delivered, has 22 hyper-scale regions around the world. Azure storage and compute usage is doubling every six months, and Azure lines up 90,000 new subscribers a month.
Fifty-seven percent of the Fortune 500 use Azure and the number of hosts quickly grew from 100,000 to millions, said CTO Mark Russinovich during his Open Network Summit keynote address here this week. Azure needs a virtualized, partitioned and scale-out design, delivered through software, in order to keep up with that kind of growth.
“When we started to build these networks and started to see these types of requirements, the scale we were operating at, you can’t have humans provisioning things,” Russinovich said. “You’ve got to have systems that are very flexible and also delivering functionality very quickly. This meant we couldn’t go to the Web and do an Internet search for a scalable cloud controller that supports this kind of functionality. It just didn’t exist.”
+MORE ON NETWORK WORLD: Microsoft taps partners to sell Azure and take on Amazon in the cloud+
Microsoft wrote all of the software code for Azure’s SDN. A description of it can be found here.
Microsoft uses virtual networks (Vnets) built from overlays and Network Functions Virtualization services running as software on commodity servers. Vnets are partitioned through Azure controllers established as a set of interconnected services, and each service is partitioned to scale and run protocols on multiple instances for high availability.
Controllers are established in regions where there could be 100,000 to 500,000 hosts. Within those regions are smaller clustered controllers which act as stateless caches for up to 1,000 hosts.
Microsoft builds these controllers using an internally developed Service Fabric for Azure. Service Fabric has what Microsoft calls a microservices-based architecture that allows customers to update individual application components without having to update the entire application.
Microsoft makes the Azure Service Fabric SDK available here.
Much of the programmability of the Azure SDN is performed on the host server with hardware assist. A Virtual Filtering Platform (VFP) in Hyper-V hosts enable Azure’s data plane to act as a Hyper-V virtual network programmable switch for network agents that work on behalf of controllers for Vnet and other functions, like load balancing.
Packet processing is done at the host where a NIC with a Field Programmable Gate Array offloads network processing from the host CPU to scale the Azure data plane from 1Gbps to 40Gbps and beyond. That helps retain host CPU cycles for processing customer VMs, Microsoft says.
Remote Direct Memory Access is employed for the high-performance storage back-end to Azure.
Though SDNs and open source go hand-in-hand, there’s no open source software content in the Azure SDN. That’s because the functionality required for Azure was not offered through open source communities, Russinovich says.
“As these requirements were hitting us, there was no open source out there able to meet them,” he says. “And once you start on a path where you’re starting to build out infrastructure and system, even if there’s something else that comes along and addresses those requirements the switching cost is pretty huge. It’s not an aversion to it; it’s that we haven’t seen open source out there that really meets our needs, and there’s a switching cost that we have to take into account, which will slow us down.”
Microsoft is, however, considering contributing the Azure Service Fabric architecture to the open source community, Russinovich said. But there has to be some symbiosis.
“What’s secret sauce, what’s not; what’s the cost of contributing to open source, what’s the benefit to customers of open source, what’s the benefit to us penetrating markets,” he says. “It’s a constant evaluation.”
Some of the challenges in constructing the Azure SDN were retrofitting existing controllers into the Service Fabric, Russinovich says. That resulted in some scaling issues.
“Some of the original controllers were written not using Service Fabric so they were not microservice oriented,” he says. “We immediately started to run into scale challenges with that. Existing ones are being (rewritten) onto Service Fabric.
“Another one is this evolution of the VFP and how it does packet processing. That is not something that we sat down initially and said, ‘it’s connections, not flows.’ We need to make sure that packet processing on every packet after the connection is set up needs to be highly efficient. It’s been the challenge of being able to operate efficiently, scale it up quickly, being able to deliver features into it quickly, and being able to take the load off the server so we can run VMs on it.”
What’s next for the Azure SDN? Preparing for more explosive growth of the Microsoft cloud, Russinovich says.
“It’s a constant evolution in terms of functionality and features,” he says. “You’re going to see us get more richer and powerful abstractions at the network level from a customer API perspective. We’re going to see 10X scale in a few years.”