Giant switch wins high marks for uptime, resiliency; throughput hindered by current line cards
Building a big data center and looking for a switch to match? How do 256 10G Ethernet ports and nearly 1.7 terabits of capacity sound?
That's what Cisco is offering with its brand new Nexus 7000 Series data center switches. Intending these boxes to be a data-center mainstay for the next decade, Cisco has constructed the Nexus switches to be far larger than its current high-end offerings.
Indeed, this exclusive Network World Clear Choice Test was the biggest we've ever conducted. Cisco's engineers told us they too had never before tested at this scale. Besides performance, we also assessed the Nexus for features, usability, high availability and resiliency (see "How we did it").
Performance turned out to be only fair, in part because current line cards tap just a fraction of the switch's 1.691Tbps capacity. Resiliency, useful features and a modular design are what really make the Nexus switch an interesting contender in data-center switching.
The layered look
While modularity has long been a part of chassis-based switches, the Nexus extends the idea with layered redundancy in both hardware and software. The switch uses a mid-plane design with up to five 230Gbps fabric cards and, in the Nexus 7010 version we tested, up to eight line cards and two management cards. A larger 7018 chassis, due to ship by year's end, will support up to 16 line cards and up to 512 10G Ethernet ports. Targeted squarely at data-center use, Nexus switches also support Fibre Channel over Ethernet (FCoE) cards, but we did not test these.
The management cards are beefier than those on current high-end Catalyst 6500s, featuring dual-core Xeon processors and 4GB of memory. A new operating system, dubbed NX OS, takes advantage of the extra horsepower, as do the system's larger routing tables and virtualization features.
On the software side, NX OS's modular design differs from Cisco's venerable and monolithic IOS. With the Linux-based NX OS, each layer-2 and layer-3 protocol runs as a separate process. If there's a problem with one process, it won't affect other parts of the system – something our test results demonstrated. The switch still supports the familiar IOS command-line interface (CLI), but it too is just another process.
In many ways, the Nexus CLI is a better IOS than IOS. Longtime Cisco users will appreciate that NX OS finally supports IPv4 addressing using classless inter-domain routing (CIDR) notation, saving many keystrokes. NX OS also allows inline configuration editing with the Unix sed (stream editor) command. The sed command enables search-and-replace editing of a configuration file from the command line, a great timesaver.
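To make those two conveniences concrete, here is a minimal Python sketch (not NX OS code) of what they amount to: converting an address and dotted-decimal mask into CIDR notation, and applying a sed-style search-and-replace across a configuration. The sample configuration lines and the helper names `to_cidr` and `rewrite_config` are our own illustrative assumptions, not output from the switch we tested.

```python
import ipaddress
import re


def to_cidr(address: str, netmask: str) -> str:
    """Turn an address plus dotted-decimal mask into address/prefix form."""
    return ipaddress.IPv4Interface(f"{address}/{netmask}").with_prefixlen


# Matches mask-style lines such as "ip address 192.0.2.1 255.255.255.0".
MASK_LINE = re.compile(r"ip address (\d+\.\d+\.\d+\.\d+) (\d+\.\d+\.\d+\.\d+)")


def rewrite_config(text: str) -> str:
    """Search-and-replace over a whole config, in the spirit of sed."""
    return MASK_LINE.sub(
        lambda m: "ip address " + to_cidr(m.group(1), m.group(2)), text
    )


if __name__ == "__main__":
    # Hypothetical configuration fragment, for illustration only.
    sample = (
        "interface Ethernet1/1\n"
        "  ip address 192.0.2.1 255.255.255.0\n"
        "interface Ethernet1/2\n"
        "  ip address 198.51.100.5 255.255.255.252\n"
    )
    print(rewrite_config(sample))
```

The shorter CIDR form carries the same information as the mask form, which is where the saved keystrokes come from.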
Another useful improvement is the inclusion of a packet capture and decode facility. The CLI has commands to read traffic headed to and from the management cards, a helpful tool in troubleshooting. A tcpdump-like decoder is available from the command line, and users can also save captures for decoding in Wireshark.
NX OS also supports virtualization through the use of virtual device contexts (VDCs), allowing up to four complete virtual switches to be defined on a single platform. As with process separation, the VDCs operate independently of one another. (See "How to set up VDCs" in the online blog.)
All about uptime
All this modularity should result in greater uptime and resiliency – something most network managers prize even above high performance. Accordingly, in our tests we gave the greatest weight to assessments of high availability and resiliency.
We reviewed high availability with two tests of software and another involving hardware. The first software test focused on the Nexus switch's process restart capability. We configured the Spirent TestCenter traffic generator/analyzer to bring up Open Shortest Path First (OSPF) adjacencies on all 256 Nexus 10G Ethernet ports, advertise routes to more than 50,000 networks and offer traffic to all networks.
While traffic was flowing, we deliberately killed the Nexus' OSPF process and then watched as the switch automatically restarted the process. Not a single packet was lost, and no change was visible to the hundreds of other OSPF routers emulated by Spirent TestCenter.
This is a different mechanism from OSPF graceful restart, where routes must be recalculated. Process restart occurs much faster (typically in less than a second), so no change in routing topology is visible to other routers.
Our second set of software resiliency tests involved upgrading and then downgrading system software while continuously forwarding traffic, a key capability in situations where no downtime is acceptable. In both upgrade and downgrade tests, we changed the software image on the first management card, watched as it handed over responsibilities to a second management card and then upgraded all line cards. A complete upgrade took nearly 45 minutes, during which the Nexus maintained all routing table entries and forwarded all traffic with no packet loss.
It's just as important to support seamless downgrades as upgrades. Indeed, prior experience with many vendors' routers and switches suggests the downgrade path is a lot bumpier than the upgrade one. That was not a concern with the Nexus switch; as in the previous tests, we saw no changes in routing and no packet loss during a downgrade.
Cisco claims Nexus offers N+1 redundancy with as few as two fabric cards in place for gigabit line cards or as few as three cards in place for 10G Ethernet cards. To validate those claims, our final resiliency test involved pulling four out of Nexus' five fabric cards one by one while continuing to offer traffic to all 256 10G Ethernet ports.
Fabric utilization rose as we removed the cards, but there was no packet loss with just two out of five fabric cards left. With only one fabric card in place, the system dropped about 47% of traffic, but only because our traffic load oversubscribed the remaining fabric. These results validate Cisco's redundancy claims; in addition, the single-fabric result became very significant in our performance tests.
Throughput and delay
Beyond slick features and high availability, performance – moving packets to their destinations as fast as possible – is often the main event when it comes to routing and switching. While it's tempting to think 256 10G Ethernet ports will offer virtually unlimited capacity, our results suggest that, at least with the line cards we tested, Cisco still has work to do when it comes to removing bandwidth bottlenecks.
We measured Nexus' performance with separate tests of throughput and delay for layer-2 unicast, layer-3 unicast, and layer-3 multicast traffic. As usual with such tests, we configured Spirent TestCenter to offer traffic in a fully meshed pattern among all 256 ports to find the throughput level.
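For readers unfamiliar with the term, a fully meshed pattern means every port sends traffic to every other port. The short Python sketch below counts the directed port pairs that implies; the port numbering and the function name `full_mesh_pairs` are chosen purely for illustration and are not part of the Spirent configuration.

```python
from itertools import permutations


def full_mesh_pairs(port_count: int) -> list[tuple[int, int]]:
    """Return every (source, destination) pair of distinct ports."""
    ports = range(1, port_count + 1)
    return list(permutations(ports, 2))


if __name__ == "__main__":
    pairs = full_mesh_pairs(256)
    # Each of 256 ports sends to the 255 others.
    print(len(pairs))
```

With 256 ports that works out to 256 x 255 = 65,280 directed pairs, which is why a full mesh is considered the most demanding traffic pattern for a switch fabric.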
Throughput tests already are stressful by definition, but we added to the burden with extra monitoring and management functions in all tests. We set 500-line QoS and 7,000-line security access control lists on each line card and also enabled NetFlow on up to 512,000 flows, the maximum Nexus supports.
Tests of layer-2 and layer-3 IPv4 unicast traffic produced virtually identical results, with the switch achieving throughput of up to 476 million frames per second (fps) across all 256 10G Ethernet interfaces.
With multicast traffic (50 sources sending traffic to each of 200 groups, resulting in 10,000 multicast routes), throughput was slightly lower, topping out at 353 million fps. Expressed in terms of bandwidth usage, the Nexus switch moved up to 79.52Gbps across each of eight line cards in all tests (L2, L3 and multicast), for a total of around 636.16Gbps.
These numbers are far below theoretical line rate, and also nowhere near the almost 1.7Tbps capacity mentioned earlier. The bottleneck is in the current-generation line cards, which top out at just less than 60 million lookups per second. Cisco says higher-capacity cards, slated for release in mid-2009, will be able to use the full fabric capacity.
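The arithmetic behind these figures is easy to check. The sketch below uses only numbers reported in this article; the one addition is a nominal port capacity of 256 x 10Gbps, which is a simple multiplication on our part and ignores frame-size and framing-overhead effects, so treat the resulting percentage as a rough guide rather than a measured result.

```python
LINE_CARDS = 8
PORTS = 256
PORT_SPEED_GBPS = 10       # nominal rate of each 10G Ethernet port
PER_CARD_GBPS = 79.52      # measured peak per line card
UNICAST_MFPS = 476         # measured unicast throughput, millions of frames/s


def aggregate_gbps() -> float:
    """Total measured bandwidth across all line cards."""
    return PER_CARD_GBPS * LINE_CARDS


def per_card_mfps() -> float:
    """Frames per second handled by each line card, in millions."""
    return UNICAST_MFPS / LINE_CARDS


def share_of_nominal_capacity() -> float:
    """Measured bandwidth as a fraction of nominal port capacity (assumption)."""
    return aggregate_gbps() / (PORTS * PORT_SPEED_GBPS)


if __name__ == "__main__":
    print(f"aggregate: {aggregate_gbps():.2f} Gbps")
    print(f"per card:  {per_card_mfps():.1f} million fps")
    print(f"share of nominal port capacity: {share_of_nominal_capacity():.1%}")
```

The per-card frame rate of 59.5 million fps lines up with the lookup ceiling of just under 60 million per second, and the aggregate comes to roughly a quarter of the nominal port capacity.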
Given that the fabric capacity vastly exceeds that of the current line cards, the throughput results are a bit like what you'd get from putting the wheels from a Toyota Prius onto a Mack truck: It's no longer efficient, and it won't carry anywhere near as much as it could.
To get a more complete picture of what the switch will be able to do when outfitted with faster line cards, we did some calculations to determine effective fabric capacity. In resiliency tests with a single fabric card, the switch forwarded traffic at around 338Gbps. Assuming results scale linearly as fabric cards are added, that means Nexus will offer up to 1.691Tbps of capacity – once faster line cards are available to take advantage of it.
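That projection can be reproduced in a few lines of Python. The linear-scaling assumption is the one stated above; the cross-check against packet loss further assumes that the load offered in the fabric-pull test matched the 636.16Gbps aggregate from the throughput tests, which is our inference rather than a separately measured figure.

```python
SINGLE_FABRIC_GBPS = 338      # approximate rate forwarded with one fabric card
FABRIC_CARDS = 5
OFFERED_LOAD_GBPS = 636.16    # assumption: same load as the throughput tests


def projected_capacity_gbps(cards: int = FABRIC_CARDS) -> float:
    """Capacity with `cards` fabric cards, assuming linear scaling."""
    return SINGLE_FABRIC_GBPS * cards


def expected_loss(cards: int) -> float:
    """Fraction of offered load dropped when the fabric is the bottleneck."""
    capacity = projected_capacity_gbps(cards)
    return max(0.0, 1 - capacity / OFFERED_LOAD_GBPS)


if __name__ == "__main__":
    print(f"five fabrics: {projected_capacity_gbps() / 1000:.2f} Tbps")
    for cards in (1, 2):
        print(f"{cards} fabric card(s): {expected_loss(cards):.0%} loss")
```

Under those assumptions, one fabric card drops about 47% of the load and two cards drop none, which agrees with the resiliency results. Five cards come to about 1.69Tbps; the small gap to 1.691Tbps comes from rounding the single-fabric figure to 338Gbps.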
We also measured delay – the amount of time the switch held onto each frame. We took these measurements at 10% of line rate.
With the exception of jumbo frames, both average and maximum delays for all frame sizes were less than 50 microsec. That kind of delay is unlikely to affect even delay-sensitive voice, video or storage applications. Jumbo frames took longer to process, with delays between 74 microsec (for L3 unicast) and 113 microsec (for L3 multicast). Bulk data-transfer applications usually aren't very sensitive to delay, so the elevated delays with jumbo frames may also be a nonissue.
It's all too easy to dismiss the performance results from these tests as subpar, but that's oversimplifying a bit. The Nexus 7000 Series is a much faster switch than our throughput numbers suggest, but higher performance will have to wait until new line cards ship sometime next year. In the meantime, the new switch's modular design and high-availability and virtualization features make it very much worth considering for large data-center deployment.
Newman is president of Network Test, an independent test lab in Westlake Village, Calif. He can be reached at email@example.com.
Network World gratefully acknowledges the support of Spirent Communications, which made this project possible. Spirent's engineers reviewed the test methodology and configurations for its Spirent TestCenter traffic generator/analyzer. These included Travis Andrews, Mark Hall, Brooks Hickman, Joshua Jansen, Steven Leventhal and Himesh Mehta.
Newman is also a member of the Network World Lab Alliance, a cooperative of the premier reviewers in the network industry, each bringing to bear years of practical experience on every review. For more Lab Alliance information, including what it takes to become a member, go to www.networkworld.com/alliance.