Multicast group capacity: Extreme comes out on top
As data center managers consolidate and virtualize their servers, the next order of business becomes moving all that traffic. Enter top-of-rack data center switches that offer speed, scalability, redundancy, virtualization support and other features not available in garden-variety Ethernet switches.
Because many data center applications today make use of IP multicast, scalability is naturally a concern. For layer-2 switched environments, the main measure of IP multicast scalability is group capacity, or the number of Internet Group Management Protocol (IGMP) snooping entries a switch can keep track of.
Switches snoop, or listen, for IGMP group membership reports and should then forward traffic only to those ports where hosts have subscribed to specific groups.
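For readers who want to see what that looks like from the host side, here's a minimal sketch in Python of joining a multicast group. The group address and port are made-up examples; the setsockopt call is what causes the host's IP stack to send the IGMP membership report a snooping switch watches for.

    import socket
    import struct

    GROUP = "239.1.1.1"   # example: administratively scoped multicast group
    PORT = 5000           # example UDP port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # IP_ADD_MEMBERSHIP makes the kernel send an IGMP membership report --
    # the message an IGMP-snooping switch listens for on this port.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    data, sender = sock.recvfrom(1500)  # group traffic arrives only after the join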
Extreme's Summit X650 was the clear leader in multicast group capacity, successfully forwarding traffic to 6,000 groups. The Arista and HP switches were next, each forwarding traffic to 2,047 groups. Cisco's Nexus was a little behind them with 2,000 groups, while the Blade and Dell switches each supported 1,024 groups.
Multicast join/leave delay: Arista and Dell are swell
There can be real money tied to the speed at which a switch processes requests to join or leave multicast groups. Financial services companies depend on getting quotes quickly; search engine clusters need to keep synchronized for their companies to realize ad revenue; and ISPs with IPTV offerings use multicast to add and remove subscribers. For all these applications and many more, the speed of multicast group join and leave processing is a key consideration.
As with the multicast throughput and latency tests, we used 989 groups and had hosts on all receive ports join all groups. This time, however, we set aside one port as a monitor port that should neither transmit nor receive multicast traffic. This extra port was a check against flooding.
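Our delay measurements came from Spirent TestCenter, but the idea behind join delay is easy to picture in host code: timestamp the join request, then timestamp the first frame that arrives for the group. The Python sketch below is illustrative only; the group, port and timeout values are invented.

    import socket
    import struct
    import time

    GROUP, PORT = "239.1.1.1", 5000   # made-up group and port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    sock.settimeout(30.0)             # give up if the join is never honored

    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    t_join = time.monotonic()         # timestamp the join request
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    sock.recvfrom(1500)               # block until the first frame for the group
    print(f"join delay: {(time.monotonic() - t_join) * 1000:.1f} ms")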
Dell's PowerConnect 8024F had the lowest average join and leave times, followed closely by the Arista and Blade switches. However, Arista's 7124S was more consistent across the board, with the least variation between average and maximum join and leave times. This is largely a function of control-plane processing power, and reflects Arista's use of a dual-core 1.8-GHz x86 CPU, a powerful processor for a top-of-rack switch.
The HP and Cisco switches had much higher join and leave delays than other devices. In the worst case, Cisco's switch took nearly 23 seconds to process a leave message for one group and averaged about 10 seconds to process each leave message. The Nexus join times were much lower, in the low hundreds of milliseconds.
Even if one assumes that leave delays are less important than join delays, there's also another issue with the Nexus switch: It leaks multicast traffic when its control plane is busy. We noticed that whenever the switch was processing responses to IGMP queries, it flooded multicast traffic to that last port – the one with no multicast subscribers, the one that never should have received multicast traffic.
Cisco says this is working-as-designed behavior. It appears that when the switch's CPU is very busy (as it is when processing responses to IGMP queries), the switch will flood multicast traffic to all ports, even those with no subscribers attached. This may be intentional, but at the same time we saw no flooding when running the same test on two Cisco Catalyst switches.
We also wondered if the multicast leakage might be just an artifact of our stress testing, but that turned out not to be the case. We still observed flooding even when we cut down the number of multicast receive ports from 22 to just one and reduced the traffic rate from line rate to 10% of line rate.
After testing concluded, Cisco reproduced the leakage issue and says it's working on a fix.
Forward pressure: No cheaters allowed
The final set of tests looked at an area where lots of switch makers used to cheat: forward pressure, or the practice of transmitting frames spaced too closely together. This was a big problem in the early days of Ethernet switching, when makers of half-duplex switches would try to win collision battles by putting a frame on the wire sooner than the attached station could.
While we're long past the days of half-duplex switching, it's still important for switches not to run too fast or too slow. A too-fast switch can cause an attached host, switch or router to drop traffic. A switch that transmits too slowly may itself drop frames over time, given a buildup of traffic from a faster link partner.
The IEEE 802.3 Ethernet specification allows some leeway in clock rates: every Ethernet interface must tolerate variations in clock speed of up to 100 parts per million (ppm). In traffic terms, that translates to a tolerance of about 1,488 fps with 64-byte frames. Expressed another way, 10G Ethernet devices should run at 14,880,952 fps with 64-byte frames, but it's acceptable for rates to be up to 1,488 fps faster or slower.
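Readers who want to check that arithmetic can derive both figures from a minimum-size frame's on-the-wire footprint, assuming the standard 8-byte preamble and 12-byte minimum interframe gap:

    LINE_RATE = 10_000_000_000        # 10G Ethernet, in bits per second
    FRAME, PREAMBLE, GAP = 64, 8, 12  # bytes on the wire per minimum-size frame

    nominal_fps = LINE_RATE / ((FRAME + PREAMBLE + GAP) * 8)
    tolerance_fps = nominal_fps * 100e-6                # 100 parts per million

    print(f"nominal rate: {nominal_fps:,.0f} fps")      # 14,880,952 fps
    print(f"100-ppm leeway: {tolerance_fps:,.0f} fps")  # ~1,488 fps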
To find each switch's maximum transmission rate, we configured Spirent TestCenter to cheat. As a test instrument, it can deliberately transmit traffic with illegally small gaps between frames. Nominally, Ethernet is supposed to operate with a 12-byte gap between frames; for this test, we offered traffic with 11- and then 10-byte gaps to determine maximum forwarding rates.
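The same arithmetic shows how hard those shortened gaps push the offered load, and where the 100-ppm forwarding ceiling sits:

    # Offered rates as the interframe gap shrinks from the nominal 12 bytes.
    for gap in (12, 11, 10):
        fps = 10_000_000_000 / ((64 + 8 + gap) * 8)
        print(f"{gap}-byte gap: {fps:,.0f} fps")
    # 12-byte gap: 14,880,952 fps (nominal line rate)
    # 11-byte gap: 15,060,241 fps
    # 10-byte gap: 15,243,902 fps

    # A compliant switch must not forward faster than nominal + 100 ppm,
    # however hot the offered load runs.
    ceiling = 14_880_952.38 * (1 + 100e-6)
    print(f"forwarding ceiling: {ceiling:,.0f} fps")  # ~14,882,440 fps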
All switches passed this test, meaning none forwarded traffic above the 100-ppm limit. At the same time, we observed a wide range of forwarding rates within the 100-ppm tolerance. As noted in the throughput discussion, the Cisco Nexus 5010 forwarded traffic a bit slow, at around 17 ppm below line rate. The Dell switch was very close to nominal line rate, at around 2 ppm over. Next fastest was the Blade switch, at around 23 ppm above line rate. The Extreme and Arista switches ran the "hottest," at around 45 and 50 ppm over line rate, respectively.
Copyright © 2010 IDG Communications, Inc.