Canadian airline company WestJet is one of the earliest customers of VMware’s NSX network virtualization tools, which it initially reached for to address a security issue. Network World Editor in Chief John Dix recently sat down with WestJet technologist Richard Sillito to find out what the company is learning about network virtualization.
Let’s start with a thumbnail description of your environment.
We have two geographically dispersed data centers: a main data center with about 2,000 servers, 80% of them virtualized, and a second center with around 500 servers for disaster recovery. We also have a third, colocated data center that we’re shutting down.
And what pushed you toward SDN?
Our environment was initially designed for north-south traffic. You come in and hit the DMZ, you maybe hit an internal server, and maybe a security internal server, and then you’re back out again. So that path is very simple. But once we started integrating other systems we introduced a lot more east-west traffic.
For example, we have separate Internet connections for eCom and for corporate use, with the idea that the two never meet. Then we brought in identity management and said identity would be used to authenticate and provide services to everyone in WestJet’s world, both employees and guests.
So people will log in to Westjet.com and the guest portal will show up and they will have certain services available, but if you log in as an employee you’ll get all those services plus extra corporate services. So all of a sudden the segmentation of those two services doesn’t make any sense. You’re coming into eCom and going over to the corporate DMZ to connect to that service, and then going inside for corporate services. And it’s that kind of multiple pathing that started putting huge stress on our network.
What’s more, we’ve added many other services. Our rewards program was originally its own website, but then we integrated that into the main website, and vacations is a separate site and we’re going to integrate that too, because guests don’t want to log into one site for this and another site for that. They want to log in and be able to access all their services. So as we integrate these into one portal, we’re increasing that east-west traffic even more.
So you recognized you had problems brewing. How did you get to SDN or network virtualization as a solution?
This east-west traffic problem proliferated to the firewalls. As you increase the amount of connectivity, your firewall rules increase almost exponentially. So we saw this rapid growth in firewall rules. But it also got worse as the services got larger because, when you stand up a new server, you create all the firewall rules for that server. So that means as you scale vertically you take on workload.
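The growth described above can be sketched with a simple counting model. This is an illustration only: the tier names, tier sizes and allowed flows are hypothetical, and it assumes every rule names exactly one source server and one destination server. Under that assumption the count grows with the product of the tier sizes, so doubling the servers roughly quadruples the rules.

```python
def per_server_rules(flows, tier_sizes):
    """Count rules when each rule names one source and one destination server."""
    return sum(tier_sizes[src] * tier_sizes[dst] for src, dst in flows)


# Hypothetical tiers and permitted tier-to-tier flows (not WestJet's real layout).
flows = [("web", "app"), ("app", "db"), ("web", "identity"), ("app", "identity")]

small = {"web": 4, "app": 4, "db": 2, "identity": 2}
large = {"web": 8, "app": 8, "db": 4, "identity": 4}  # every tier doubled

print(per_server_rules(flows, small))  # 4*4 + 4*2 + 4*2 + 4*2 = 40
print(per_server_rules(flows, large))  # 8*8 + 8*4 + 8*4 + 8*4 = 160
```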
So it was really a security problem. How we were handling security was forcing the network to do things that are, as my boss would say, unnatural. That’s when I saw that it was a segmentation problem. The way we were segmenting didn’t make sense.
I looked at the different segmentation models, starting with the idea of segmenting based on data classification. But with that model you still end up with a huge rule set, because these sensitive data systems still need to talk to the non-sensitive data systems. You still have a lot to maintain.
Then I looked at it from a compliance perspective, and the same problem occurs. Our PCI systems still need to talk to a lot of other systems in our data center. So that didn’t solve the segmentation problem.
And then I looked at the network, and said, “Well, the network is all about services. That’s what it does. Services are usually associated with the port, and if you have a well-organized data center, they’re usually associated with a place on the network.” So it became very simple, once I started looking at it from a service perspective.
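One way to picture segmentation by service is a rule set keyed on services and ports rather than on individual servers. The sketch below is a guess at the general shape, not WestJet's model: the service names, ports, server names and the `allowed` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRule:
    src_service: str
    dst_service: str
    port: int


# Hypothetical rule set: one rule per service-to-service relationship.
RULES = {
    ServiceRule("web", "app", 8443),
    ServiceRule("app", "token-broker", 443),
}

# Hypothetical membership table: which service each server belongs to.
MEMBERSHIP = {"web-01": "web", "app-01": "app", "tok-01": "token-broker"}


def allowed(src_server, dst_server, port):
    """Permit a flow only if a rule exists between the two servers' services."""
    src = MEMBERSHIP.get(src_server)
    dst = MEMBERSHIP.get(dst_server)
    if src is None or dst is None:
        return False  # unknown servers match no rule
    return ServiceRule(src, dst, port) in RULES


# Standing up another web server changes membership, not the rule set.
MEMBERSHIP["web-02"] = "web"
print(allowed("web-02", "app-01", 8443))  # True
print(allowed("web-02", "tok-01", 443))   # False: no web -> token-broker rule
print(len(RULES))                         # 2
```

In this arrangement a new server inherits its service's rules by joining the membership table, so the rule count tracks the number of service relationships rather than the number of servers.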
So I developed this model and we took it to the technology council and we got approval to start looking at technology. The first idea was to do Layer 2 transparent firewalls paired up with our core router. I call that the big iron solution because the firewall operates in a bridged Layer 2 mode and you just insert it in the middle of a traffic stream. So basically we’d have the core router and just hairpin traffic off the core router through the firewall. So there’s no Layer 3 interface on it, per se.
And you didn’t like that approach why?
It was expensive and it didn’t optimize east-west traffic. What it did was take that east-west traffic problem and put it in one place in the network where we could then throw enough resources at it to deal with it. But we were looking at a number of options from a range of companies when VMware came out with this new thing called NSX. We had them come down and present and immediately it started to gel for us.
The big thing that started driving us towards the approach is the concept of bringing the physical devices into the virtual world; the fact that we could create a network segment that consisted of both physical and virtual devices. You can do that with a VLAN but it’s not pretty. The solution here was much more elegant.
The second thing was the concept of being able to optimize east-west traffic by keeping the traffic on the host, and that was something that you couldn’t do in a physical world. Our plan involved all these different service bubbles, and each bubble would have to send traffic up to the core in order to talk to another bubble. So, like I said, if we centralized that problem we could throw more hardware at it, but what would be even better is if those workloads were on the same host so we could have that traffic inspected on the host without sending it up to the core and back.
You do that with a virtual firewall on the host?
That’s correct. They call it the logically distributed firewall.
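The difference between hairpinning through a central firewall and inspecting on the host can be sketched by counting which flows have to leave their host at all. The host names, VM names, placement table and the `flows_leaving_host` function are hypothetical, and the model is deliberately crude: it assumes a central firewall forces every flow off the host, while a distributed firewall lets flows between co-located workloads stay local.

```python
def flows_leaving_host(flows, placement, distributed):
    """Count flows that must leave their source host to be inspected or delivered."""
    count = 0
    for src, dst in flows:
        same_host = placement[src] == placement[dst]
        # With host-level inspection, co-located workloads never touch the wire.
        if not (distributed and same_host):
            count += 1
    return count


# Hypothetical VM-to-host placement and east-west flows.
placement = {"web-01": "esx-a", "app-01": "esx-a", "db-01": "esx-b"}
flows = [("web-01", "app-01"), ("app-01", "db-01")]

print(flows_leaving_host(flows, placement, distributed=False))  # 2
print(flows_leaving_host(flows, placement, distributed=True))   # 1
```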
Do you use VMware’s virtual firewall for that?
Yes. Right now we’re focusing on Layer 1-4 for their firewalling. Your higher level inspection is generally your north-south inspection, so we still have that hardware firewall at the edge, and we have other security devices that do the inspection before it reaches the data center. But we know we’d like to add some Layer 5-7 on the east-west traffic as well, so we’re looking at vendors for that. And that’s where this interesting philosophy called service chaining comes in.
With service chaining I can insert devices into traffic flows, but without the typical network limitations. Routing is not too specific, so today I’ve got to take all the traffic from this workload and route it to the security device and then filter it and send it. But why would I want to send my backup traffic to something like a web application firewall? So we tend to overload security devices because we have to have all the traffic go through them.
But with service chaining, the controller will be smart enough to say, “Oh, if that’s Port 80 or 443, then shunt that traffic over through the web application firewall, but Port 7777 backup traffic, there’s no sense sending you through there.” So we can be much more selective about the traffic. And the whole idea is to decrease the amount of capacity we need to buy for those devices.
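The port-based steering described here can be sketched as a small classifier. This is an illustrative model, not NSX's API: the function name, the service labels and the traffic volumes are made up, and only the port numbers (80, 443 and 7777) come from the interview.

```python
WAF_PORTS = {80, 443}  # web traffic worth sending through the web application firewall


def service_chain(dst_port):
    """Return the ordered services a flow traverses before delivery."""
    chain = ["distributed-firewall"]  # Layer 4 filtering applies to every flow
    if dst_port in WAF_PORTS:
        chain.append("web-application-firewall")
    return chain


# Hypothetical flows as (destination port, megabytes transferred).
flows = [(443, 200), (80, 50), (7777, 900)]

waf_mb = sum(mb for port, mb in flows
             if "web-application-firewall" in service_chain(port))
total_mb = sum(mb for _, mb in flows)

print(service_chain(443))   # ['distributed-firewall', 'web-application-firewall']
print(service_chain(7777))  # ['distributed-firewall']
print(waf_mb, total_mb)     # 250 1150
```

With these invented volumes the web application firewall sees 250 MB instead of all 1,150 MB, which is the capacity saving the answer is pointing at.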
Coming back to the idea of bringing physical devices into the virtual world, can you expand on that?
We have some XML firewalls and some load balancers and credit card tokenization boxes, so from my app perspective, if I need to tokenize a credit card number I send it over there and get it back, and the same goes for other services.
Now if I create virtual interfaces on the hardware boxes that provide those services, then I can map those virtual interfaces to the overlay networks and make each box appear in several overlays. So now it’s very simple for the service owners. They think they’re talking to their token broker. “It’s the same IP address space, so it must be my token broker.” We’ve just obfuscated that and given them a presence inside their bubble.
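A rough way to model one physical box appearing in several overlays is a table of virtual interfaces, one per overlay, each with an address inside that overlay's own subnet. The bubble names, subnets, addresses and the `local_presence` helper below are hypothetical and exist only to illustrate the idea.

```python
import ipaddress

# Hypothetical overlay networks ("bubbles") and their address space.
OVERLAYS = {
    "web-bubble": ipaddress.ip_network("10.10.1.0/24"),
    "rewards-bubble": ipaddress.ip_network("10.10.2.0/24"),
}

# One physical token broker, one virtual interface per overlay.
TOKEN_BROKER_VIFS = {
    "web-bubble": ipaddress.ip_address("10.10.1.250"),
    "rewards-bubble": ipaddress.ip_address("10.10.2.250"),
}


def local_presence(overlay):
    """Return the token broker's address as seen from inside one overlay."""
    addr = TOKEN_BROKER_VIFS[overlay]
    if addr not in OVERLAYS[overlay]:
        raise ValueError(f"{addr} is outside {overlay}'s address space")
    return addr


print(local_presence("web-bubble"))      # 10.10.1.250
print(local_presence("rewards-bubble"))  # 10.10.2.250
```

Each service owner only ever sees the address inside their own bubble, even though both addresses lead to the same hardware.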
Will you use OpenFlow to tie in some of this hardware?
The goal of the hardware vendors is to participate in the SDN network. And they have to because the whole thing is strung together with tunnels, if you look at it. Even with OpenFlow, it’s still strung together with tunnels. So there has to be some way for the packet to know that it’s got to go over to that switch and then deliver it to that switch with the VNI and have the VNI convert it to a VXLAN. Where does it get that information? The best place to get it from is the controller.
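To make the tunnel lookup concrete, here is a minimal sketch of VXLAN encapsulation driven by a controller-supplied table. It uses the terms from the VXLAN specification (RFC 7348), where the VNI is the 24-bit network identifier carried in the header and the device that terminates a tunnel is a VTEP. The header layout follows that specification; the forwarding table, MAC address, IP address and function names are hypothetical, and the outer Ethernet, IP and UDP headers are left out.

```python
import struct


def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags, reserved bits, 24-bit VNI, reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First word: I flag (0x08) in the top byte. Second word: VNI in the top 24 bits.
    return struct.pack("!II", 0x08 << 24, vni << 8)


# Hypothetical state pushed by a controller: (VNI, destination MAC) -> remote VTEP IP.
FORWARDING = {(5001, "00:50:56:aa:bb:01"): "192.0.2.11"}


def encapsulate(vni, dst_mac, inner_frame):
    """Look up the remote tunnel endpoint and prepend the VXLAN header."""
    vtep_ip = FORWARDING[(vni, dst_mac)]
    return vtep_ip, vxlan_header(vni) + inner_frame


vtep_ip, payload = encapsulate(5001, "00:50:56:aa:bb:01", b"inner-ethernet-frame")
print(vtep_ip)             # 192.0.2.11
print(payload[:8].hex())   # 0800000000138900
```

The point of the sketch is the lookup: without the table, the sending switch has no way to know which tunnel endpoint holds the destination.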
But the big question is the physical versus the virtual space. It’s almost like two different SDN camps, and how is that world going to come together? My personal feeling is that we’re going to see, and we’re already starting to see, SDN controllers sharing state.
But you haven’t really started to integrate your network hardware components into this network virtualization world?
We have, and there are challenges. We have ways of overcoming the challenges, so we will have physical in the virtual world, but we won’t have it the way we ultimately want it. But the vendors are growing into that space.
So you decided to go with NSX. Where do you stand in that effort?
We’re in the design phase and we’ll be implementing it very shortly. We’ve done a lot of work in the lab. We’re actually lucky enough to be the first beta customer. So we’ve had NSX in our lab for over a year now.
Are you a purely VMware shop? No other hypervisors to contend with?
No. And the real challenge if you have other hypervisors, is you don’t get that optimized east-west traffic. Because it’s really the logically distributed router and the logically distributed firewall that will allow you to keep traffic on the host. So it’s that old thing, if you want all the functionality you end up getting locked in.
How long will this take to roll out? What is the process for that?
How long is a piece of string? The idea is to start with just our website and our hope is to have that up by December. Certainly we want to see that up and operational by the end of the year. That’s roughly about 200 servers in size.
Will the shift to network virtualization require you to change anything organization-wise?
It’s truly interesting to see that evolution. There’s just so many facets. Some people are wondering, “What does my job look like after virtualization?” And then there’s the whole idea of, “Do we create a separate team that is the cloud team? Or is this more of a community-based approach?” We actually have architects looking at what is the operational model that is best for WestJet.
How I rationalized it in the early stages -- because I needed to get people to start working together on this -- I said, let’s just create a cloud team; pull resources out of IT and bring them together. Well, one challenge is we’re not that big. We’re only about 230 IT people. You start pulling one or two people out of certain groups and you’ve reduced their capacity by one-fifth.
And if you were to pull those guys into a team, then they become isolated and segmented from their world. So now you’re decoupling virtual networking from physical networking. You’re decoupling virtual security from physical security. You’re decoupling all these, and is that a healthy thing to do? Sure, it makes running the cloud easier, but is it creating continuity in your IT? I would argue no.
So ultimately I thought, building a software-defined data center is no different from building a data center, and when we built our last data center we didn’t say, “OK, we’re pulling people out of IT and putting them in the Eastern Data Center Team.” We had a representative from each group involved and they would be the bridge to bring knowledge in and take knowledge back out.
So I fundamentally looked at it the same way, and said, “Let’s just pretend we’re building a data center.” So I pulled together network guys, server ops guys, security, a bunch of architects, and we started a group called the V Team. We meet once a week and work out designs, tackle problems, listen to presentations, whatever is on the agenda that week.
How does NSX exist in the lab today? Do you model out every little aspect?
It’s still relatively contained, but that is one of the beauties of software-defined networking. You can make it very complex, but it doesn’t affect the physical layer. So as long as you’ve got a couple of hosts that are sending traffic back and forth, you’re validating everything you need to validate. It’s neat because it does scale so well. That’s the beauty of it. Now all things fail at scale, so it will fail at some point, right? The beauty for us is, with a relatively small data center compared to the other guys who are running NSX, we go to the techno advisory boards and hear the kind of numbers they’re pushing and we’re like, “Yeah, we’re probably not going to have to worry about that.”
Anything else that we didn’t hit that you think is important to get across about this journey?
If I were really to boil it down, it’s really that we found a way to put a network security policy in that isn’t dependent on how networking works. I remember we were having a whiteboarding session for the web stack and someone said we had to think about routing traffic to the firewall. And I said, “No we don’t. You just let the traffic flow. I’ll set the policy, and the policy will be applied as it comes and goes from the virtual machine. I don’t care how you route it. It doesn’t matter to me anymore, because I can just set the policy around that virtual machine.” And that ability, I think, is just going to be huge.
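The idea of a policy that follows the virtual machine rather than the route can be sketched as a verdict that depends only on the VM's own settings. Everything here is hypothetical (the class, field names, host names and ports); the sketch just shows that moving the VM to another host leaves the verdicts unchanged because neither the host nor the path is an input.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    host: str
    allowed_inbound_ports: set = field(default_factory=set)

    def admit(self, dst_port):
        """Verdict applied at the VM's virtual NIC; host and route play no part."""
        return dst_port in self.allowed_inbound_ports


web = VirtualMachine("web-01", "esx-a", {80, 443})
before = (web.admit(443), web.admit(22))

web.host = "esx-b"  # the VM moves; the policy travels with it
after = (web.admit(443), web.admit(22))

print(before)  # (True, False)
print(after)   # (True, False)
```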