Juniper switching boss talks technology challenges, Cisco Nexus 6000

Juniper's Jonathan Davidson says Virtual Chassis QFX ToRs and Microfabric pods interconnected with the new EX9200 will drive simplicity, automation

Jonathan Davidson

Jonathan Davidson took over Juniper campus and data center switching when the two previously separate business units were combined following the departure of founding engineer R.K. Anand. Davidson has a service provider routing background at Juniper and Cisco, which is no coincidence -- after five years in switching, Juniper has been unable to mirror the success it had in its first five years in service provider routing. But it did start from zero and surpassed at least six other incumbent vendors to attain the No. 3 position in the market. The company has more than 20,000 switching customers cultivated through organic development, Davidson notes. And as Juniper moves forward amid a forklift upgrade facing its EX core switch base and after an initial misfire on the QFabric data center switch, it's focusing on customer demands for simplification, agility and automation. Davidson discussed some recent and future developments in Juniper enterprise switching.

Why did Juniper combine the data center and campus units?

When you're fundamentally trying to change an industry that hadn't changed in over 15 years or longer, you need to make sure you have a high performance team together, you need to make sure they're not distracted. So we created a business unit that was targeted for fundamentally disrupting the data center space, and that was our QFabric solution. But once you actually have that product out into the market, you actually get to a point where you want to find more synergies between these different organizations. We wanted to make sure that we were able to leverage the best of the EX product portfolio as well as the innovation we saw and continue to see in the QFabric portfolio. In bringing them together, we are able to leverage the best from both, and really enable our customers to have more choice.

But aren't the needs of the campus and data center drastically different?

If you look at the fundamental building blocks for technology and how we view things, I'm going to have to switch a Layer 2 packet whether I am in the data center or campus environment. So why have two different stacks of technology that are going to do almost the same thing? You're right in that there are unique requirements to both; that's how you actually package the systems together. Whether traffic runs East/West or North/South depends more on the construct of the system rather than the underlying technology. Many customers use the same core switching platform for both their data center and campus environment. That's why customers have embraced our Virtual Chassis technology. They'll use the same Virtual Chassis in the campus and data center.

So will EX and QFabric eventually share the same ASIC and code base?

Whether you use an EX platform or a QFabric platform, it's running Junos. It's about simplifying the operations for our customers and that can happen across either one of the architectures or platforms or products our customers decide to go with. We fundamentally believe that if you look out five to 10 years from now, we call it the Path to Flat. We truly believe almost every data center is going to be a flat data center. We've translated flat to mean fabric. Any-to-any connectivity in the data center is important. If you truly have a flat network, you can have deterministic latency. In simplifying the Path to Flat, one of the things we're going to start to do is actually bring these two technologies together. So one of the things we're going to start talking to our customers about here pretty shortly -- we haven't gone broadly with this yet -- we're going to take that Virtual Chassis technology that tens of thousands of customers have deployed and put that onto our QFX top-of-rack switches.

What it means is, I can start with a QFX ToR, have QFX at the top-of-rack and aggregation layer, and run that entire thing in a Virtual Chassis-based network. If I decide that I want to go even more flat, I don't need to throw any boxes out, I don't need to re-cable; I simply need to change the software and the configuration and actually add the QFabric Director and then I have a completely flat network and a centralized point of management, and I am able to grow from a few dozen 10G ports up to 6,000 10G ports without having to rip and replace any portions of my network, or re-cable.

If you have our Microfabric [the QFabric 3000-M Interconnect], you are able to go from zero to 768 10G ports, the QFX can act as the interconnect as well, and you can grow with that. We think that the interconnect is something that's a critical component and the [Broadcom] silicon family that we're using today will be able to continue into the future. We will use the most advantageous silicon for our customers. What's important to them is simplicity. But at the end of the day, 98% of our customers don't care what silicon is in the platform. They want to make sure that we're meeting their requirements or making sure that it's simple for them to use, and that they get the right price and performance.

What about Virtual Chassis for the QFabric Interconnect?

We'll be talking more about that at the end of the year.

What's selling more or in greater demand: the QFabric 3000-G Interconnect or the 3000-M?

One of the things that we have found is Juniper always tackles the hardest problems first. And I think it doesn't always get the credit for doing that. Solving the hardest problems isn't necessarily solving the sexy problems. When we went out to fundamentally change the way data center networks have been built for the past two decades, we came out with our QFabric single tier solution. And we decided to come out with a solution that scaled to over 6,000 10G ports in a single fabric. We could have easily come out with the smaller fabric first. But when you start to look at the logical scale issues, the issues that have to do with keeping 128 nodes all in sync at the same time ... if you solved for the small problem first you would have run into scaling incrementalism over time, and it would have taken us a much, much longer time to get to the scale that's necessary. That's one of the fundamental reasons we haven't seen any other vendor in our space come out with anything that looks remotely like this. The problem that we solved was a hard one.

Multichassis is pretty hard to do. Think of QFabric as a 128-node multichassis system that acts as a common, single fabric. That's the scale of the problem that we solved, and when you look at what QFabric actually did, all of the components and what it looks like, that's what I'll call SDN Version 1. You have an external director controlling the various nodes; you have an interconnect that it can control as well; and you can provision everything through a single point of management, with an out-of-band control plane. When we started building this there was no term called SDN. We solved the problem internally with all open, standards-based protocols. We use BGP to communicate inside of the fabric. SDN Version 2 from Juniper is going to be a combination of SDN Version 1 plus some of the things Bob Muglia mentioned around 6-4-1, and obviously the Contrail controller is going to be a big portion of how all of this fits together into what I call SDN Version 2.

What about OpenDaylight?

We also realize it's an ecosystem of players and customers are going to want to have choice. It's important that we work closely with industry leaders like VMware, not only on their hypervisors and virtualization technologies, but also where they're going. It's important that we work with Daylight. And it's also critically important that we work with other third parties to actually make sure that we have the right ecosystem partners around that. We are big believers that the data center space is an ecosystem play. And if we try and go it by ourselves, we will not be as successful as we would be if we partner very tightly. The market has clearly told us that OpenStack is important; that VMware is important; that Daylight is important; and I think there will be a few other players that come out and that customers tell us are important.

Which way do you point an SDN customer when you have SDN Version 1, Version 2 and OpenDaylight?

From a customer perspective there's not a whole lot of confusion. If I am a customer that is a VMware shop, more than likely I'm going to want to stick with the VMware path. As a networking vendor, all of our components must seamlessly integrate into that environment. Because I want to make sure that my applications are resilient, I want to make sure they're secure, I want to make sure that my applications can communicate with each other. So I don't think there's any confusion from that customer perspective. Infrastructure as a service is a different model. Many of them are VMware customers but a lot of them are looking to go down the OpenStack path. Through OpenStack, they can go down the Juniper/Contrail path, or OpenDaylight ... that's going to be a customer-by-customer decision. The key thing for us is to make sure that they understand what their options are and what they have available to them.

Why would a customer opt for Daylight over Contrail, and vice versa?

Once we know what Daylight is from a product perspective, I'll be able to answer that question. But it's still early and I think the targeted customer for Daylight is still new. Once the product exists there'll be inherent benefits to each one.

What's the SDN strategy for the rest of the EX portfolio?

We can put any of the EX9200 control plane protocols across our entire portfolio. You can change the data plane protocol as well [on the EX9200] because of how programmable the chip itself is. We will support OpenFlow on that product as well as other protocols I'm not ready to discuss today. So from a control plane perspective, we're set. From a data plane perspective some of [the EX switches] simply require new chipsets in order for us to go and do that. But over time you're going to see more and more consistency across the switching portfolio as we continue to leverage the best of both worlds, both the EX9200 with Virtual Chassis on the EX side, as well as across the QFabric portfolio. A good example of that is Puppet, which is an automation tool predominantly used from a server admin, sys admin perspective. But the biggest pain point many of our network operations people feel in the data center is that, because the server people have virtualization, sys admins are able to fire up new compute in seconds or minutes, a new application in seconds ... but then they have to go and file a network trouble ticket to get the network VLAN created. We've been able to put Puppet on QFabric, QFX, EX -- the entire portfolio -- and that's a powerful thing.

In introducing data plane programmability on the EX line, does that imply sharing the same silicon stream as the EX9200?

That's something we're not prepared to talk about today. But you can rest assured that we are going to make sure that our fundamental goals of simplicity and automation are things we are going to continue to focus on for our key customers.

How's the reaction been among your EX8200 base to the EX9200?

It's actually been quite positive. They love the Virtual Chassis aspects of things. They love the fact that they are able to have one common core from the campus to the data center. Manageability ... doesn't change in any way, shape or form for them with the EX9200.

What's the migration or trade-in program you have for those customers?

We are going through and reaching out to our EX8200 customers and making sure they understand where we are going, make sure they understand what the platform is and what they're going to get out of the EX8200 over the next five to 10 years. And with that comes the conversation of, is there even a need to make a transition. Most of the time there isn't a need to make a transition. Most of them are quite happy with what they have today. But for those customers who would like to transfer to something that is newer and more programmable, we'll certainly make sure that the transition is a seamless one for them.

Do you expect to retain all of your EX8200 base as they make that transition, or do you expect to lose some of your customers to your competitors?

For those customers who do want to make any transition we certainly expect to make sure that they stick with the EX portfolio. We think that it offers significant benefits for them; they think that it offers significant benefits for them. And the growth that we've experienced in that market we expect to continue as well with the 8200 and the 9200.

Are you offering an even-up trade-in program for the 9200?

We're not sharing any of the incentives publicly. We are sharing them with our customer base, but not outside of that customer base.

Does the EX9200 exclude QFabric from any opportunities?

If we look at how customers have evolved over time, in talking through where we were with solving the biggest problems first, we came out with a 128-node system, and then last summer we launched the 16-node Microfabric. What we have found is that customers' thinking about what they call failure domains has evolved over the past five years. If you went to many customers five years ago they would say, just give me a bigger and bigger and bigger switch. Many customers are still comfortable with the 6,000 10G ports in a single domain. But there are certain customers who want a smaller failure domain. And so they will go and purchase multiple Microfabrics for a single data center and then they will go and connect those Microfabrics together with the next layer of switching. Before the 9200 was available, one customer had multiple Microfabrics connected together through a [Juniper] MX [router]. They decided to collapse the core and data center edge together into one environment. Now we expect the 9200 to sit at that layer and offer that interconnect between multiple Microfabric pods. It wouldn't be an Interconnect per se but it would be a switching layer between the two pods. You could connect the Microfabrics together as well, if you wanted to. But we think most customers would likely have a second layer of switching on top.
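
[Editor's note: The node and port counts cited in the interview are consistent with a fixed number of 10G ports per node. The quick check below is only a back-of-the-envelope sketch. It assumes 48 10G ports per node, a figure inferred here from 768 ports across 16 Microfabric nodes and not stated by Davidson; the function name is illustrative.]

```python
# Assumption: ports per node is inferred from the Microfabric figures
# quoted in the interview (768 ports / 16 nodes). It is not stated directly.
MICROFABRIC_NODES = 16
MICROFABRIC_PORTS = 768
QFABRIC_G_NODES = 128

ports_per_node = MICROFABRIC_PORTS // MICROFABRIC_NODES


def fabric_ports(nodes: int, per_node: int = ports_per_node) -> int:
    """Total 10G ports in a fabric built from `nodes` identical nodes."""
    return nodes * per_node


print(ports_per_node)                 # 48
print(fabric_ports(QFABRIC_G_NODES))  # 6144
```

[Under that assumption, a 128-node fabric works out to 6,144 ports, which lines up with the "over 6,000 10G ports" figure quoted for the larger system.]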

Why wouldn't a 3000-G play that role?

It comes back to failure domains. Some customers just simply want to have pod sizes up to 768 10G ports. That's about their comfort level with a single failure domain. In the traditional two-tier architecture, that would be the aggregation box [supporting] up to 16 40G links going down. With Microfabric, it's all one level. But still, their comfort level is around how many 10G ports. So it comes down to how many applications can I risk losing connectivity to in a given point in time? It just comes down to their belief structure. Not any technical reasons why, it just comes down to their belief structure.

You can use the G fabric as the interconnect and go all the way up to 6,000 ports of 10G. I can go from port 1 to port 5,560 with the same latency that I can go from port 1 to port 3. That's something that is really compelling for them because if I have multiple Microfabrics and I go through that second level of switching hierarchy, my latency's going to change. If I'm the network operations team I can't guarantee the latency between all applications inside of my data center. That's really what the customers ask themselves in determining whether they want the Microfabric or the G.

We solved the biggest problem first and since we launched the Microfabric we've seen significant traction in that particular space. The Microfabric actually fits the majority of sizes of most customers' complete data centers. The majority of data centers today are less than 1,500 gig ports. You might imagine then, do I need to buy a 6,000 port thing that I know I'll never scale to? Or am I OK with one or two Microfabrics?
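
[Editor's note: The "one or two Microfabrics" estimate follows from the figures quoted in the interview. The sketch below is a rough sizing check only. It treats 1,500 ports as an upper bound and ignores any ports consumed by links between pods, which is a simplifying assumption; the function name is illustrative.]

```python
import math

MICROFABRIC_PORTS = 768   # per-pod 10G port ceiling cited in the interview
TYPICAL_DC_PORTS = 1500   # "less than 1,500" ports, used here as an upper bound


def pods_needed(total_ports: int, pod_ports: int = MICROFABRIC_PORTS) -> int:
    """Smallest number of pods whose combined port count covers total_ports.

    Simplification: inter-pod uplink ports are not subtracted.
    """
    return math.ceil(total_ports / pod_ports)


print(pods_needed(TYPICAL_DC_PORTS))  # 2
print(pods_needed(700))               # 1
```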

So at first release, QFabric was a solution looking for a problem.

No, it was a solution for the largest of customers who really wanted to have any-to-any connectivity between a very large number of ports. The traction in G continues to do very, very well.

How's demand for single-tier?

I would say that demand for a single-tier solution and a fabric-based solution ... Customers don't think from a single-tier perspective, they think from an attributes perspective. What are the attributes I care about? I care about simplicity. Can you give me investment protection? I might want to go to a virtualized infrastructure in a year or two. I may want to go to an overlay infrastructure in a year or two, or three, or five. We want to make sure our fabric technologies give the customers the ability to be the best underlay for the overlay, and the best underlay for a virtualized environment. We have to make sure our customers are able to have the greatest experience from an attribute perspective. So it's all selling. It depends upon which attributes the customer cares about more. As we have a simplified approach to our architectures and our building blocks -- Virtual Chassis on the QFX 3500 and 3600 -- you're going to be able to have a clear and consistent path to more flat as time goes on.

On the Path to Flat, is single-tier ever applicable in the campus?

What we hear from our customers around campus is specifically around similar types of issues. They're not saying "I have 1000% growth in East/West traffic every three months." That's not the problem. But they do care about simplicity. And they do care about automation. When you start to see some of the similar things that you're hearing, I do think that some of them will start to move over. Hence, the EX9200's applicability in the campus as well. So it's about being able to take applications and services and run them on a common core platform. And if you think about an access point, the enterprise already has a wireless LAN SDN-type of solution. So what we want to do over time is actually bring those two elements together, which we talked about in our launch a few weeks ago. We see that as the first step toward making the campus environment a simpler place to do networking and network automation.

Where does your "Simply Connected" EX portfolio fit into all of this?

All EX platforms run Junos. Wherever we can go out and put OpenFlow on all of these platforms, we absolutely will. The reason I give it a qualifier of "wherever we can" is simply because we want to make sure we restrict our messaging to our customers appropriately. That said, we have publicly come out and stated which platforms will have OpenFlow by the end of this year. We've had OpenFlow out in demo version for well over a year. We have OpenFlow in a production network on our MXes that's running 100G through the MXes. That same OpenFlow code, because it's Junos, will run across the EX portfolio as well as the QFX portfolio at the same time. The team is hard at work at making that happen and it's simply a matter of time, not of will.

Why not converge the programmability and logical scale of the EX9200 with the low latency, single-tier characteristics of QFabric in one platform?

All of [the programmability] of Junos Virtual Control is applicable to both. Over time, you shouldn't be surprised if you start to see a simplification of how things are going to go, a simplification of building blocks, a simplification of architectures, and a simplification of where we're heading. So, simplification is key.

The EX9200 is targeted at Cisco's Nexus 7000 "M," QFabric at the Nexus 7000 "F" -- what's targeted at the Nexus 6000?

We believe that it's focused primarily on a very specific market in the financial sector. They predominantly care about latency. When you look at customers who care more about simplicity, automation -- what can I see inside of the network? -- then you have to make other trade-offs inside of the silicon. I can do on-chip memory or put those tables outside the chip. My tables can be much, much bigger -- offer logical scale, number of VLANs, number of routes, number of other things. But it means my chip's going to be a little bit [slower] because I have to go off-chip, get what I need and then come back onto the chip. In order to go down that low latency path at the aggregation layer, you've basically said, "I am not going to care about large logical scale." There are trade-offs that have to be made from a visibility and reporting perspective because you're not going off-chip and everything is on-chip. So knowing what I know from their data sheets, and knowing what I know from what they're doing from a latency perspective, it's all on-chip, which means they've had to make some pretty tough choices around how much logical scale that box is going to be able to do. So for customers who are in very large virtualized environments, they are going to run out of logical scale. And I'm not saying that that's the case with that platform; but I'm telling you the trade-offs you have to make from a silicon perspective. We fundamentally believe that, in the kinds of environments that the majority of data centers have today, they want large amounts of logical scale because of how VLANs are deployed today; and because of the tight packing of virtual machines on servers. So the fundamental belief is that aggregation box, the 6000, will be targeted to customers who care only about latency. There are other trade-offs they had to make to go into that market.

Is there any concern that your MX router customers will demand EX9200 prices since that switch is based on the MX?

These are two fundamentally different products. They certainly have some common components -- power supplies, fans, some of the other technology is similar from a DNA perspective -- but if you were to go and fire up an MX and look at the features and functions that are on one of them versus the other, they are vastly different. So it's not a one-to-one replacement of products. The MX does not have a lot of the Layer 2 features that are on the EX9200. If you fire up an EX9200, it's a switch. There are a number of features that are on the EX9200 that are just not on [the MX] because it is not a switch.

Will QFabric eventually be based on custom silicon, either new or re-purposed?

We are continuing to invest in both hardware and software for QFabric. And I'm looking forward to talking to you again later this year about all of the things we have coming on that platform.
