Martin Casado is not your average executive, nor is he your run-of-the-mill computer scientist. First, his graduate research led to the creation of OpenFlow and the transformational change it is driving across every domain of networking. Not content with one major revolution, Casado led his team to reinvent networking in the hypervisor with Open vSwitch.
While networking is not as center-stage as cloud computing, from my view Mr. Casado looks a lot like the Larry Page or Sergey Brin of the networking industry. As 'the cloud' takes shape, we have seen a new guard of technology-savvy executives and business-savvy technologists who 'get' the cloud and are laying the groundwork for the new era. Equal parts strategist and scientist, Casado has shaped the direction the networking industry will take in the cloud era more than anyone else I can think of.
While there may have been some debate before, VMware's Nicira acquisition solidifies the new prominence of hypervisor networking. Long a neglected area, this fall it will become the centerpiece of enterprise infrastructure as Microsoft gears up to launch a major attack on VMware with Windows Server 2012.
And Microsoft ain't joking with the network virtualization stack it will include in the upcoming launch. I had the chance to preview it and have been duly impressed with its ability to virtualize complex topologies and provide a robust framework for virtual service integration. Both of these software titans are now poised to give networking software and automation a LONG-overdue facelift. By this time next year we will all be up to our necks in advanced hypervisor networking ... it's going to hit the industry like a bat outta hell.
But how will the hypervisor network take shape? How will it affect the future of L4-7 services in the Data Center? How will it affect the future of the physical network? These are themes of my conversation with Martin below.
Art Fewell: In many enterprises today, hypervisor networking hasn't been a central point of focus. VMware has seemed hesitant to take on Cisco in providing the rich network services that the network access layer is uniquely positioned to deliver. But the way I see it, if hypervisor vendors want it, the hypervisor network is their space to take. And when I saw this acquisition I thought ... 'okay, VMware is really serious about this space.'
Martin Casado: I think it's very clear that we're entering a world with two types of networks. You've got the physical network, which is really solving the problem of how you move packets between point A and point B in a complex graph, whether that graph is a fabric or it's a backbone. And that's a physical networking problem that requires boxes and wires and routing protocols. And this is something that the traditional vendors are fantastic at.
But now we've got this virtual network, which is its own layer. It's a layer that sits at the edge of the network and provides what looks like a physical network, but it has all the operational properties of a VM. You can create them dynamically, you can put them anywhere, you can snapshot them, and you can rewind them.
And Nicira is a leader, of course, in the virtual networking space, and VMware has been pioneering virtual networking in its own environments. So a marriage like this allows us to provide unified virtual networking solutions across multiple hypervisors. But of course it's not solving the physical networking problem. You still have to build out physical networks, but it is introducing this new virtual concept.
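To make Casado's point concrete, here is a minimal sketch in Python of what "a network with the operational properties of a VM" implies. It is purely illustrative - the class and method names are invented, not any real Nicira or VMware API - but it captures the create/snapshot/rewind lifecycle he describes:

```python
# A minimal sketch, assuming a virtual network is just software state at
# the edge: all class and method names here are hypothetical, not any
# real Nicira or VMware API.
import copy


class VirtualNetwork:
    def __init__(self, name):
        self.name = name
        self.ports = {}          # logical port -> attached VM
        self.snapshots = []      # saved copies of network state

    def attach(self, port, vm):
        """Attach a VM to a logical port; no physical rewiring involved."""
        self.ports[port] = vm

    def snapshot(self):
        """Capture the full logical state, as you would snapshot a VM."""
        self.snapshots.append(copy.deepcopy(self.ports))

    def rewind(self):
        """Roll the network back to the last snapshot."""
        if self.snapshots:
            self.ports = self.snapshots.pop()


# Create a virtual network dynamically, snapshot it, change it, rewind it.
vnet = VirtualNetwork("tenant-a")
vnet.attach("p1", "web-vm")
vnet.snapshot()
vnet.attach("p2", "db-vm")
vnet.rewind()                    # back to the single-port state
print(vnet.ports)                # {'p1': 'web-vm'}
```

Once the network is just software state at the edge, operations that are unthinkable for boxes and wires - copying a whole network, rolling it back - become one-liners.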
Art Fewell: It definitely seems now that it makes the most sense to execute this at the hypervisor layer, and OVS has already been critical there. I think David Ward pointed this out really well in his presentation at the first ONS, talking about how for years application developers have been circumventing the network and using different tricks to get around it. I always think of things like Microsoft Lync's codecs and their ability to dynamically adjust to network conditions. They're out there constantly trying to anticipate and guess what the behavior of the network is going to be. That's one of many examples of how applications have almost been forced into a bubble-gum-and-Band-Aid patchwork because the network hasn't been providing these services. Not that it isn't a good technique, but in an ideal world the applications could just say, "Hey, Network, can you tell me what the condition of the network is?" or "Can you reserve resources for me?"
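A hypothetical sketch of what that "ask the network" interface could look like follows; the NetworkService class and its methods are invented for illustration - no such standard API existed at the time:

```python
# A hypothetical sketch of an application querying the network instead
# of probing and guessing. The class and methods are invented, not a
# real API; the measurements are faked with random values.
import random


class NetworkService:
    """Stand-in for a network control plane that applications could query."""

    def path_conditions(self, src, dst):
        # In reality this would come from the controller's view of the
        # topology; here we fabricate a measurement.
        return {"latency_ms": random.uniform(1, 20),
                "available_mbps": random.choice([10, 100, 1000])}

    def reserve(self, src, dst, mbps):
        """Grant a bandwidth reservation if the path can carry it."""
        return self.path_conditions(src, dst)["available_mbps"] >= mbps


net = NetworkService()
# Instead of guessing network behavior (as Lync's adaptive codecs must),
# the application asks the network directly:
if net.reserve("app-vm", "media-gw", mbps=50):
    print("reservation granted; pick the high-bitrate codec")
else:
    print("fall back to a lower-bitrate codec")
```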
Martin Casado: Yeah, I actually think the real Nirvana, the Shangri-La, is for the applications to be totally oblivious to the network. If I could have my wish, I would be like, applications want to communicate and they'll pop up and they'll start communicating. And if there is available bandwidth, it will be consumed.
Today we kind of have the worst of both worlds. The network is often partitioned or has bottlenecks - and a lot of these are imposed by choke points that are put in place because we have to configure the networks by hand, and we configure them at these choke points. Then we filter traffic through these choke points with the configuration we've applied. So because we have this substandard fabric, we have networks that are over-subscribed; we have issues with them; and the information isn't available, so the applications have to guess at it. So I think we will see the industry moving from worst toward best.
The worst is where we are right now, where an application just has to guess by probing. I think a little bit better than that is to get more information from the network so the application actually has some real visibility. But I think the best is when the applications don't worry about it at all. They would just worry about communicating and not somehow degrading their performance.
I think the way we get to this perfect place is if you remove all of the manual configuration state and all of the policy state in the networks, and you actually build good fabrics. You've seen me write about this, and I know you have written about this as well. You look at the problem, you build up your physical network in a way that is redundant, that doesn't have choke points, and then you have far fewer problems to worry about in the physical network. And you should have pretty much full cross-sectional bandwidth no matter where the communication goes.
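The "good fabric" Casado describes can be quantified with simple arithmetic. Here is a back-of-the-envelope sketch - the port counts and speeds are illustrative, not from the interview - of how oversubscription is computed in a leaf-spine design, and what a non-blocking leaf with full cross-sectional bandwidth looks like:

```python
# A back-of-the-envelope sketch: in a leaf-spine (Clos) fabric, a leaf's
# oversubscription ratio is its host-facing capacity divided by its
# fabric-facing capacity. All port counts and speeds are illustrative.

def oversubscription(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """Ratio of host-facing bandwidth to fabric-facing bandwidth."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


# 48 x 10G server ports and 4 x 40G uplinks: 3:1 oversubscribed,
# i.e. a potential choke point under load.
print(oversubscription(48, 10, 4, 40))   # 3.0

# Match uplink capacity to downlink capacity and the leaf is 1:1,
# non-blocking -- "full cross-sectional bandwidth".
print(oversubscription(16, 10, 4, 40))   # 1.0
```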
Art Fewell: I think it makes a tremendous amount of sense, and I really enjoyed your paper on the separation of the virtual and physical networks. It's very apparent, and I think it's going to really help the physical network evolve where it needs to. When it was trying to grapple with every application/network challenge, especially through the cookie-cutter approach the whole traditional industry moves in, it just doesn't seem like the traditional approach will ever get where it needs to. Another challenge is that the demands of the private cloud are going to increase network demands by orders of magnitude. We've seen the networking industry's one-application-at-a-time approach to QoS, and it's about time our approach to quality of service gets modernized. And when I look at what's happening with private cloud, we have a cloud controller that wants to move and optimize workloads to create the maximum possible resource utilization. To deliver that type of cloud elasticity, your controlling platform has to have insight into what each application's network and performance needs are, and also what each physical host's I/O utilization is -- so the controller can optimize resource utilization.
In the new private cloud we may end up with a tier-1 application on the same physical server as a tier-4 application. Maybe the priority-one application has very low sensitivity to network latency, and maybe the priority-four application is very sensitive - you have all these combinations in these environments, and it really tells me that we need something like a QoS model for every single application. And I really don't see how that would even remotely be operationally feasible with the legacy approach to network services.
Martin Casado: This is exactly right. If you look again at the way things are done today, it makes it impossible to build an efficient cloud. In the physical network, because of things like VLAN placement, you are limited in where you can place workloads. So even without thinking about the application at all, there are limits on where you can place a VM because of capacity issues or because of VLAN placement issues. And then on top of that, if you add constraints based on the application - for example, if you've got something that is tightly clustered and you want low latency, requiring physical proximity, and/or high bandwidth - you're solving a very difficult constraint satisfaction problem.
You've got all these very difficult constraints when you're doing placement. So one of two things happens ... either you have a very inefficient cluster, where you're building out a bunch of physical networks that are grossly under-utilized, or you're not going to be able to sufficiently satisfy the constraints. You're not going to be able to actually get optimality within the application. And the great thing about network virtualization is that, with an optimal physical network, it's one big fabric. You can place these things where you want, and you distribute the intelligence to the edge to ensure that the application can be optimized for whatever it needs to do. And so that's what we're trying to do: we're trying to move from this very Balkanized view of networking, where you do have placement constraints and you do have configuration requirements, to one where you can actually treat the physical network as a pool of capacity.
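Here is a toy illustration of the constraint satisfaction problem Casado describes - a brute-force search over an invented three-host, three-VM scenario, showing how VLAN placement constraints shrink the set of legal placements, and how removing them (as network virtualization effectively does) opens the search space back up:

```python
# A toy sketch of VM placement as constraint satisfaction. The hosts,
# VMs, and VLAN assignments are invented for illustration.
from itertools import product

hosts = {"h1": {"cpu": 4, "vlans": {10}},
         "h2": {"cpu": 4, "vlans": {10, 20}},
         "h3": {"cpu": 4, "vlans": {20}}}
vms = {"web": {"cpu": 2, "vlan": 10},
       "db":  {"cpu": 3, "vlan": 20},
       "app": {"cpu": 2, "vlan": 20}}


def feasible(assign):
    """Check VLAN reachability and per-host CPU capacity."""
    used = {h: 0 for h in hosts}
    for vm, h in assign.items():
        if vms[vm]["vlan"] not in hosts[h]["vlans"]:
            return False                      # VLAN placement constraint
        used[h] += vms[vm]["cpu"]
    return all(used[h] <= hosts[h]["cpu"] for h in hosts)


def count_placements():
    return sum(feasible(dict(zip(vms, combo)))
               for combo in product(hosts, repeat=len(vms)))


print(count_placements())   # 3: VLAN constraints leave few legal placements

# Virtualize the network (every segment reachable everywhere) and the
# same capacity constraints leave far more room to optimize:
for h in hosts.values():
    h["vlans"] = {10, 20}
print(count_placements())   # 12: the fabric becomes a pool of capacity
```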
Art Fewell: Let me ask you about one of the impacts that I see becoming increasingly apparent. If you look at hypervisors in the enterprise, what I typically see is that VMware came in and brought the ability to do agile deployment. In the Cisco world, the best practice was always: I have a new application, so I'm going to send the networking team in. They're going to go through the manual and figure out what ports it needs to operate on, what IP addresses and hosts it needs to communicate with, and they're going to write access lists for security and performance - best practices applied on an application-by-application basis. Now, I don't know how many enterprises ever actually did that for the majority of applications, but it seems like the message today is pretty much the same thing, slightly modernized ... instead of doing the same old thing for each application, now they do it for templates. Either way, the old 'best practices' do not seem like they could ever empower the future vision of the software-defined data center. It's really going to have to evolve to a state where applications dynamically interact with the orchestration tools and communicate their requirements through an API. So there's the application saying, "Here, Network, here are my security requirements and my performance requirements. Can I reserve these resources?" Instead of a very static and manual thing, it becomes a really dynamic, application-driven interaction.
Martin Casado: Exactly right.
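A hypothetical sketch of that application-driven interaction might look like the following - the manifest schema and Orchestrator class are invented for illustration, not any shipping API:

```python
# A hypothetical sketch: instead of a change ticket and hand-written
# ACLs, the application declares its requirements and the orchestration
# layer turns them into policy. Schema and class names are invented.

app_manifest = {
    "name": "payroll-web",
    "security": {"allow_inbound": [{"port": 443, "from": "any"}],
                 "allow_outbound": [{"port": 5432, "to": "payroll-db"}]},
    "performance": {"min_mbps": 200, "max_latency_ms": 5},
}


class Orchestrator:
    def __init__(self):
        self.policies = []

    def request(self, manifest):
        """Turn declared requirements into network policy -- the dynamic
        equivalent of the old per-application best practices."""
        self.policies.append(manifest)
        return {"granted": True, "policy_id": len(self.policies)}


print(Orchestrator().request(app_manifest))
# {'granted': True, 'policy_id': 1}
```

The design point is that the manifest is declarative: the application states what it needs, and the access lists and QoS settings become derived artifacts rather than hand-maintained ones.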
Art Fewell: In my experience, in most cases you have VMware customers that are using the standard vSwitch and not upgrading to the full vDS. A lot of them haven't been that eager to upgrade to Enterprise Plus licensing in the past. You've had Cisco coming in pushing the traditional Cisco best practices for network services, but from what I've seen, VMware administrators typically keep it as light as possible to avoid having to open network change tickets. And then you've got Cisco saying, 'Hey, you have to preserve the traditional access layer. You need VN-Tag, or you need the Nexus 1000v.' But I haven't seen 'preserve the traditional access layer' being very popular with hypervisor admins. With the introduction of VXLAN, and now the acquisition of Nicira, the hypervisor market really seems to be emerging as the new darling of data center networking. The industry has long been speculating about operational silos and what roles will emerge in the new data center. Cisco has tried to raise the influence and control of networking teams, but it seems clear that the centerpiece is really the server and the application. I see server and hypervisor administrators as the team that is going to largely take control of that functionality, as a way to get past the slowness that came from having to incorporate a lot of different silos in application development, deployment and maintenance. Do you see the market evolving similarly, with the hypervisor becoming very distinct from the traditional networking field, with different players, consumers, administrators and so on?