
Juniper exec gives inside look at QFabric

By John Dix, Network World
February 15, 2012 05:32 PM ET

Network World - R.K. Anand, executive vice president and general manager of Juniper Networks' Data Center Business Unit, was employee No. 12 of the network startup back in 1996, leaving a job as a microprocessor designer at Sun Microsystems. Years later he left Juniper for a brief stint at another startup, but came back to help finalize the company's QFabric product and get it out the door. QFabric began shipping in September 2011. Network World Editor in Chief John Dix recently caught up with Anand at the company's headquarters in Sunnyvale, Calif., for a deep dive on the company's answer to high-end data center demands.

Why does the world need QFabric?

If you go back four, four and a half years ago, there were a few mega-trends emerging. Data centers were being consolidated and networks were becoming good enough to enable the push to the cloud. That is, the enterprise could say, my network bandwidth to far away places is sufficient, I have reasonable latency, I have diversity in paths, so I could have my computing elements and storage elements detached.

But it was becoming apparent that data centers would be facing a scale challenge because of the tiered models employed. The tiers have two dimensions -- one is the tier hierarchy of the switching model, with access, aggregation and core switches, and the other is the workload tiers -- the Web tier, the app tier and the database tier. Together this represented a networking problem at scale that demanded a true any-to-any solution.

So we looked at the problem and said, OK, how does one address that? And we realized we could not just approach it the way switching has been done for the last 25 years. When you build a standard switch, you hit physics limitations. If I have a half-rack box, there is only so much power you can bring into it, only so much cooling you can push through it, and only so many square inches of faceplate real estate you can use for connectivity.

That exercise required us to think about taking the single half-rack switch and exploding it -- that is, breaking away from the bonds of the physical metal frame the box sits in. When you break those bonds, you see that a half-rack switch is basically a set of line cards connected by a fabric. That fabric allows any-to-any port connectivity with fixed latencies, and the box works at its own scale. So if you could break that metallic bond and make a fabric technology that connected those line cards in a much more scalable fashion, then you would have solved the problem.


Let's step back, though. There are many ways to build switches, but typically they have line cards in the front and horizontal fabric cards in the back. And usually the fabric cards don't talk to each other, so packets come into an ingress port and then the packet forwarding engine in the line card sprays the packets across the fabric and the packets come out the egress port.

A typical line card in a chassis system, this edge piece, is a rich component. It does a lot of the processing -- the heavy lifting, the buffering, the lookups -- while the core fabric is the simple component. It does very little processing work; its job is to move stuff around. So we took all of the line cards, pulled them out and made each one a top of rack switch: a 1U switch with 48 10G ports (and eventually we'll see 40G and 100G).

Since the line cards talk to fabrics, and the fabric cards really don't talk to each other, we also took the liberty of putting the fabrics in a different chassis, what we call an Interconnect chassis. You can connect 128 of those top of rack switches, what we call nodes, with four redundant Interconnects, meaning we can support up to 6,144 10G ports.
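
To put the scale he describes into numbers, here is a quick back-of-the-envelope sketch in Python (my own illustration, not Juniper tooling; it assumes one uplink per node per Interconnect, which the interview does not specify):

    # Rough sketch of the QFabric scale arithmetic described above.
    # The figures are those quoted in the interview; the script is illustrative only.
    NODES = 128           # top of rack "node" switches
    PORTS_PER_NODE = 48   # 10G server-facing ports per 1U node
    INTERCONNECTS = 4     # redundant Interconnect chassis, each meshed to every node

    total_10g_ports = NODES * PORTS_PER_NODE   # 128 * 48 = 6,144 ports
    fabric_links = NODES * INTERCONNECTS       # assumes one uplink per node per Interconnect

    print(f"Server-facing 10G ports: {total_10g_ports}")   # 6144
    print(f"Node-to-Interconnect links: {fabric_links}")   # 512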

Now, there is a power element to this too. Typically the front line cards that do all the Ethernet packet processing consume most of the energy. The silicon in the core does very little work, which means the power problem is really at the edge. This is the power problem that keeps going up, year after year. You know, 5,000 watts, 10,000 watts, 15,000 watts, 18,000 watts as you want more 1G ports or 10G or 40G or 100G ports.

By distributing the work to the top of each rack instead of using end of row switches, you distribute the power, which makes for a much more elegant story. A top of rack switch is only 350 watts, well within the realm of a server, right? So that means now I can interconnect a whole 6,144 10G port data center with 40,000 watts.
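
As a rough check on that power claim, the same figures can be multiplied out (my arithmetic, not Juniper's; 128 nodes at roughly 350 watts each lands in the same ballpark as the 40,000-watt figure he cites, with the exact number depending on load and configuration):

    # Back-of-the-envelope power estimate using the figures quoted above.
    # Assumes roughly 350 W per top of rack node; real-world draw varies.
    NODES = 128
    WATTS_PER_NODE = 350

    edge_power_watts = NODES * WATTS_PER_NODE   # about 44,800 W of distributed edge switching
    print(f"Approximate edge power: {edge_power_watts / 1000:.1f} kW")   # ~44.8 kW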

In the three-tier switching model, Ethernet processing is done at every tier -- the access layer, the aggregation layer and the core layer. Every one of those boxes does work, occupies space and consumes power.

Isn't there a third component in this architecture?

We need a mechanism to propagate state, so that when I add a server all the other nodes know about it. So we built these boxes called Directors. They operate as a clustered system -- Director A and Director B -- and they connect through a separate control plane network. So when the MAC address of a virtual machine appears, we propagate that state and the VLAN state to all the other nodes.
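
A toy model of that state-propagation idea may make the Director role concrete (this is purely illustrative and is not Juniper's protocol or code; the class and method names are my own):

    # Toy model of the Director role described above: state learned on one node
    # is pushed to every other node over the separate control plane network.
    class Node:
        def __init__(self, name):
            self.name = name
            self.mac_table = {}   # MAC address -> (VLAN, node it was learned on)

        def install_state(self, mac, vlan, origin):
            self.mac_table[mac] = (vlan, origin)

    class DirectorCluster:
        """Stand-in for the clustered Director pair (Director A and Director B)."""
        def __init__(self, nodes):
            self.nodes = nodes

        def on_mac_learned(self, origin_node, mac, vlan):
            # Propagate the new MAC/VLAN state to all nodes so any port can reach it.
            for node in self.nodes:
                node.install_state(mac, vlan, origin_node.name)

    nodes = [Node(f"node-{i}") for i in range(4)]   # small example; QFabric scales to 128 nodes
    directors = DirectorCluster(nodes)
    directors.on_mac_learned(nodes[0], "00:11:22:33:44:55", vlan=100)
    print(nodes[3].mac_table)   # the VM's MAC and VLAN are now known fabric-wide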

Do the Directors only talk to the top of rack switches?

They also talk to the Interconnects for state and health and other monitoring things. Remember, the Interconnects are not connected to each other. That's the elegance of the story because now, if I lose an Interconnect, I have degradation of bandwidth, but not degradation of connectivity.

Does every top of rack have to be connected to each Interconnect?

Yes. It is a complete mesh. Everybody talks to everybody.

And that's what enables you to say that every port is just one hop away?

It goes back to the point I made about the tiered data centers. Tiered data centers are all about creating the HR pod, the Finance pod, the Sales pod, and then you establish the Web pod, the app pod and the database pod. Now with QFabric, you have suddenly freed yourself from those bonds. Because in this story there is an equality property; any one of these ports is equidistant from any other port. Any port can be configured as a Layer 2 or Layer 3 port that allows it to be a member of anything. So suddenly you are not constrained by an application, a server or a virtual machine, or where your data sits.

When you spin up a virtual machine all you have to do is ask, is there any capacity on the server? You don't have to worry that it needs to be in an HR pod, or an IT pod or a Finance pod. You just say you are a member of this group, this VLAN, so you can access the following storage. So suddenly any port can be connected to any other port, at scale, in less than five microseconds, which means the majority of the applications you need in clouds and data centers will work well.
