Review: Is white-box switching the future of networking?

Reviews
Oct 12, 2015 (17 mins)
Linux, Network Switches, Networking


What if you could manage your data center switches and routers the same way you manage your servers, and cut capital expense costs in the bargain?

That’s the pitch for white-box networking, the move toward open-source network operating systems running on commodity hardware.

To find out what it’s like to live in a white-box world, we tested Cumulus Linux, from Cumulus Networks, running on the AS5712-54X data center switch from Edge-Core Networks, a wholly owned subsidiary of Accton Technology.

The AS5712 has the same guts as a lot of other 10G/40G Ethernet top-of-rack switches, and Cumulus Linux supports most of the right data center buzzwords: VXLAN tunneling, equal-cost multipath (ECMP) routing, and multi-chassis link aggregation groups (MLAG). Later this year, Cumulus says it will support multiprotocol label switching (MPLS) and Virtual Routing and Forwarding Lite (VRF Lite).

Although we ran an extensive battery of performance tests, our main interest was to learn how white-box networking differs from the experience with data-center incumbents such as Arista, Brocade, Cisco, Juniper and HP.

While performance is a wash, we found big differences in price, configuration, management, and usability. Linux-based networking systems offer more power and control than proprietary alternatives, but the Linux learning curve can be steep. We don’t think white-box networking is ready for campus networks. But in the data center, especially given the cost savings, we think a growing number of network engineers will find white-box networking worth a look.

Here are 10 lessons we learned while evaluating white-box networking.

1. WHITE BOX WINS ON PRICE

Physically, the AS5712 looks like lots of other data center top-of-rack switches. Built around Broadcom’s Trident2 switching silicon, it supports up to 72 10G Ethernet interfaces, or 48 10G Ethernet and six 40G Ethernet uplinks. We tested it both ways, each using low-cost direct-attach copper cables (DACs) attached to our Spirent TestCenter traffic generator/analyzer.

Excluding DACs or other transceivers, Edge-Core sells the AS5712 through resellers at a suggested price of $8,898, including a one-year subscription to Cumulus Linux. Each additional year of licensing and 24/7 support has a suggested price of $999.

Those numbers compare favorably with, for example, a Cisco Nexus 3172, which starts at $14,000, plus an additional $5,000 for layer-3 routing support. The usual disclaimer applies here: Pricing is squishy, as few customers pay list price. Final cost depends heavily on quantity, features, supplier politics, and other factors. But as a starting point, commodity hardware and Linux software list prices begin at a substantially lower point.
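As a rough illustration of the list-price gap, the arithmetic can be sketched in Python. The three-year horizon is our assumption, chosen only for illustration, and the Cisco figure covers just the two list prices quoted above, with no support contract included. Street prices will differ.

```python
# List prices quoted in the review (USD). The 3-year horizon is an assumption.
WHITE_BOX_FIRST_YEAR = 8_898   # AS5712 plus one year of Cumulus Linux
WHITE_BOX_EXTRA_YEAR = 999     # each additional year of licensing and support
CISCO_BASE = 14_000            # Nexus 3172 starting price
CISCO_L3_LICENSE = 5_000       # layer-3 routing support


def white_box_cost(years: int) -> int:
    """List cost of the white-box option over a number of years (>= 1)."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return WHITE_BOX_FIRST_YEAR + (years - 1) * WHITE_BOX_EXTRA_YEAR


def cisco_list_cost() -> int:
    """Up-front list cost only; support contracts are not quoted in the review."""
    return CISCO_BASE + CISCO_L3_LICENSE


if __name__ == "__main__":
    years = 3
    wb = white_box_cost(years)
    print(f"White box over {years} years: ${wb:,}")
    print(f"Cisco list, hardware plus layer 3: ${cisco_list_cost():,}")
    print(f"Difference: ${cisco_list_cost() - wb:,}")
```

Even with two extra years of Cumulus licensing added, the white-box list total stays well under the Cisco starting point.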

2. IT’S JUST LINUX

While network boxes running on Unix-like operating systems are not new, Cumulus Linux is more tightly coupled with Linux than competing offerings.

Cumulus Linux doesn’t just use Linux as a boot loader; it is Linux. Based on the Debian distribution, it provides all networking functions through standard Linux tools. Its command-line interface (CLI) is a Bash prompt. It uses iproute2 tools for interface configuration, runs the Quagga daemons for routing, and offers automation through unmodified Linux APIs.

In contrast, many other vendors of Linux-based networking devices extensively rewrite routing and switching functions, and make these available through proprietary CLIs or APIs.

Cumulus says it’s tried to make Cumulus Linux “as open as possible.” Almost all components have source code available under the GNU General Public License (GPL). The only exception is switchd, the Cumulus-developed daemon that deals with the Broadcom Trident2 chip, and that’s because Broadcom’s software development kit (SDK) is closed-source. (Parts of Arista’s EOS also are available under open-source licenses.)

3. IT’S PORTABLE

Cumulus Linux also differs in terms of portability. For all the features of competing network operating systems (including Linux features), they run on, and only on, one vendor’s hardware.

Cumulus Linux runs on multiple vendors’ white-box switches. As of this writing, Cumulus’ 10G Ethernet hardware compatibility list includes switches from Edge-Core (tested here), Dell, HP, Penguin Computing, Quanta, and Supermicro. All these vendors offer switch hardware built around Broadcom’s Trident2 chip. This is similar to the white-box server market, where open-source Linux or BSD operating systems run on x86-based hardware from many vendors.

Portability works both ways. The Edge-Core switch runs network operating systems that support the Open Network Install Environment (ONIE). Besides Cumulus Linux, this currently includes software from Big Switch Networks, IP Infusion, the ONOS project, Pica8, and Pluribus Networks.

Cumulus Linux can be configured and managed through automation and orchestration tools such as Ansible, Chef, Puppet, and SaltStack. Cumulus has sample Ansible and Puppet scripts on its website. Centralized configuration and control is not mandatory; management of a single switch from the Bash prompt works equally well.

Perhaps the most useful way of thinking about a white-box network device is that it can be managed in exactly the same ways as a server. Network managers can make changes on one box at a time at the Bash prompt; or on a few systems using quick-and-dirty shell scripts; or on thousands of switches using orchestration software.

4. SWITCHING SYNTAX IS DIFFERENT

Setting up basic layer-2 switching isn’t hard in Cumulus Linux, but the syntax is different from conventional data center switches.

Cumulus Linux stores layer-2 information about interfaces and VLANs in the /etc/network/interfaces file. Users edit this file to make configuration changes.

In Linux terminology, all members of the same VLAN are in an “Ethernet bridge.” Here is an excerpt from the interfaces file that creates a bridge called br0 and assigns the first 48 switch ports to that bridge:

auto br0
iface br0
    bridge-ports glob swp1-48
    bridge-stp off

The glob keyword is similar to interface range definitions in other vendors’ switches in describing a group of ports. The swpX designation refers to switch ports. All front-panel Ethernet interfaces are swpX interfaces except for eth0, which is reserved for out-of-band management.
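To make the range idea concrete, here is a minimal Python sketch of how a range such as swp1-48 expands into individual port names. It models only the simple form used in the excerpt; the function name is ours, and the real interface tooling on the switch may accept other forms that this sketch does not handle.

```python
import re
from typing import List


def expand_port_glob(spec: str) -> List[str]:
    """Expand a range such as 'swp1-48' into ['swp1', ..., 'swp48'].

    A name without a numeric range (for example 'eth0') is returned as-is.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)-(\d+)", spec)
    if match is None:
        return [spec]
    prefix = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    if first > last:
        raise ValueError(f"descending range in {spec!r}")
    return [f"{prefix}{n}" for n in range(first, last + 1)]


if __name__ == "__main__":
    ports = expand_port_glob("swp1-48")
    print(len(ports), ports[0], ports[-1])
```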

In the Edge-Core switch, ports 49 to 54 are 40G Ethernet interfaces. We can add them to the same bridge, or optionally configure each 40G Ethernet port to supply four more 10G Ethernet interfaces via QSFP+ breakout cables. We tested the switch both ways, and had to edit the /etc/cumulus/ports.conf file to use the switch with breakout cables. The same file allows users to group four 10G Ethernet ports to act as a 40G Ethernet port, again via a breakout cable.

Another difference from most other switches is that layer-2 configuration changes aren’t instantaneous. Instead, users first edit the interfaces file, and then issue the ifreload -a command. This is conceptually similar to the commit command in Juniper’s Junos OS.

Note that this example disables spanning tree protocol (STP). That’s a common default in modern data center network designs, where other mechanisms handle loop prevention and redundancy. These redundancy mechanisms may include Transparent Interconnection of Lots of Links (TRILL) or proprietary variants such as Cisco FabricPath, as well as ECMP and MLAG.

Perhaps because of these other mechanisms, Cumulus Linux doesn’t fully support spanning tree. While it does do standard and rapid versions of STP, it does not support the multiple spanning tree protocol (MSTP), which maps groups of VLANs to separate spanning tree instances.

Confusingly, Linux provides spanning tree through the mstpd daemon, even though there’s no MSTP support. This isn’t a major shortcoming in data center networks, given the move away from STP, but it’s potentially an issue if Cumulus Linux were to move into campus networks.

VLAN configuration is the opposite of the Cisco model, binding interfaces to VLANs instead of the other way around. Here’s an example for a Cisco Nexus switch that creates VLANs 301 and 302 and then configures an interface as a trunk port for those VLANs:

conf t
vlan 301-302
interface e1/1
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 301-302
  no shutdown
end

The Linux model does the opposite, mapping interfaces to the bridges (VLANs). This excerpt from Cumulus Linux is functionally identical to the Cisco trunk example:

auto br0
iface br0
    bridge-ports swp1
    bridge-vlan-aware yes
    bridge-vids 301 302
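Because the bridge definition is plain text, it lends itself to being generated rather than typed. The following Python sketch renders a stanza like the one above from a list of ports and VLAN IDs. The function name and the four-space indentation are our choices for illustration, not anything Cumulus prescribes, and the sketch uses only the keywords that appear in the excerpt.

```python
from typing import Iterable


def render_vlan_aware_bridge(name: str, ports: Iterable[str], vids: Iterable[int]) -> str:
    """Build an interfaces-file stanza for a VLAN-aware bridge."""
    lines = [
        f"auto {name}",
        f"iface {name}",
        f"    bridge-ports {' '.join(ports)}",
        "    bridge-vlan-aware yes",
        f"    bridge-vids {' '.join(str(v) for v in vids)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the trunk example: one port carrying VLANs 301 and 302.
    print(render_vlan_aware_bridge("br0", ["swp1"], [301, 302]), end="")
```

A script along these lines could write the result into the interfaces file, after which the change would still need to be applied on the switch.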

Some HP ProVision/ProCurve switches also work in a way similar to Cumulus Linux, but switches from Cisco, Arista, Juniper, Brocade, and others generally follow the opposite model.

One area of common ground is that Cumulus Linux, like most other switches, configures VLAN access ports on a per-interface basis:

auto swp11
iface swp11
    bridge-access 301

5. ROUTING SYNTAX IS MOSTLY THE SAME

Cumulus Linux runs Quagga, the open-source routing stack. Quagga implements all major IP routing protocols, including IPv4 and IPv6 versions of BGP, IS-IS, OSPF, and RIP, and its CLI closely resembles that in Cisco IOS and other IOS-like operating systems.

Users run Quagga by enabling zebra and selected routing protocol daemons and then starting the quagga service. Users then can access the Quagga interface with the sudo vtysh command. (Note the use of sudo to become superuser; Cumulus Linux really is “just Linux” since superuser status is required to start and stop processes.)

As noted, Quagga’s command syntax is similar to Cisco’s. For example, these commands will configure OSPFv2 on Cumulus Linux:

sudo vtysh
 configure terminal
 router ospf
 router-id 198.18.0.1
 log-adjacency-changes detail
 interface swp1
 ip ospf area 0.0.0.0

And here are the equivalent commands for a Cisco Nexus device running NX-OS:

configure terminal
 router ospf 1
 router-id 198.18.0.1
 log-adjacency-changes detail
 interface ethernet 1/1
 no switchport
 ip address 192.198.0.1/30
 ip router ospf 1 area 0.0.0.0

Similarly, users can see routing status using Cisco-like commands such as show ip route summary and show ip ospf neighbor. Another common trait with Cisco and Cisco-like devices: Unlike Cumulus Linux’s layer-2 commands, configuration changes entered into Quagga take immediate effect.

Where Quagga differs is that users can’t configure IP addresses on interfaces, because it provides only the routing stack. Instead, users define interfaces by editing the /etc/network/interfaces file outside the Quagga shell and rerunning the ifreload -a command. Once back inside the Quagga shell, users can see interface status with the show interface command.

Another difference: Quagga doesn’t directly support command piping, for example to filter on verbose output. That capability is available, though. Users can run Quagga commands from the Bash prompt and redirect them to any tool Linux provides. For example, to see only OSPF routing parameters instead of the entire Quagga configuration, a user could run vtysh -c 'show run' | grep -4 'router ospf' from the Bash prompt.

Output piping is extremely powerful. It makes available Linux tools such as awk, sed, cut, sort, and scripting languages such as Perl, Python, and Ruby (all of which come installed with Cumulus Linux).
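The same filtering can be done from a script instead of a pipeline. The Python sketch below pulls one section out of a running configuration. The sample text is a hypothetical stand-in for show run output, and the parsing assumes a section consists of a header line followed by indented lines; real output may be laid out differently. The vtysh helper shells out exactly as the command-line example does and is not exercised here.

```python
import subprocess
from typing import List


def vtysh(command: str) -> str:
    """Run one Quagga command non-interactively via `vtysh -c`."""
    result = subprocess.run(
        ["vtysh", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout


def extract_section(config: str, header: str) -> List[str]:
    """Return the header line plus the indented lines that follow it."""
    section: List[str] = []
    inside = False
    for line in config.splitlines():
        if not inside:
            if line.strip() == header:
                inside = True
                section.append(line)
        elif line.startswith(" "):
            section.append(line)
        else:
            break  # first non-indented line ends the section
    return section


# Hypothetical sample standing in for `show run` output.
SAMPLE = """\
interface swp1
 ip ospf area 0.0.0.0
!
router ospf
 router-id 198.18.0.1
 log-adjacency-changes detail
!
"""

if __name__ == "__main__":
    print("\n".join(extract_section(SAMPLE, "router ospf")))
```

On a switch, something like extract_section(vtysh("show run"), "router ospf") would be the equivalent call, run with sufficient privileges.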

6. LINUX TAKES SOME LEARNING

As with any new product (which for many network professionals translates to “anything that’s not Cisco or something that behaves like Cisco”), there’s a learning curve involved. Cumulus helpfully provides a series of cheat sheets that map common commands in Cisco or Arista products to their Cumulus Linux counterparts.

In some cases, Cumulus Linux configuration syntax is easy. Especially for users experienced with Linux or other Unix-like operating systems, the learning curve won’t be that steep. There’s also the argument that if network engineers can learn one environment, they can learn others. For basic tasks, the knowledge involved in learning Cumulus Linux is not any more obscure than, say, learning how Cisco IOS configuration registers work.

Still, employers might balk given the time and cost of obtaining certifications in the various networking vendors’ environments. Employers that paid for those certifications, and that pay a premium for network engineers with certifications, may be reluctant to retool.

Programming knowledge can help, but it’s not a requirement. In the server admin and dev-ops worlds, engineers routinely write scripts of anywhere from a few to a few thousand lines to automate routine tasks. Unlike OpenFlow and some SDN products, where programming is a must, Cumulus Linux deliberately avoids that, offering instead a configuration and management experience that’s more akin to a conventional switch. If users want to interact with Cumulus Linux in a purely programmatic way instead of the CLI, they can – but it’s not a must.

7. LINUX STORES STUFF IN A LOT OF PLACES

Linux comes from the Unix tradition of using many small tools, each doing one job well. Although that’s changing (for example, with the controversial move by many Linux distributions, including Debian, to adopt the systemd master daemon), life with Linux today means living with configuration information spread across multiple files. Cumulus Linux takes some steps to combat this (see below), but users of current releases still will need to look in multiple places for configuration parameters.

This differs from virtually all conventional data center switches and routers, which store configuration data in a single startup file. Some approaches – such as HP Intelligent Resilient Framework (IRF) and Juniper QFabric – even span multiple physical switches or routers using one configuration file.

With Linux, configuration data lives in lots of places. Cumulus Linux uses /etc/network/interfaces to store layer-2 information such as data about interfaces and VLANs, but also puts interface setup information in /etc/cumulus/ports.conf. If routing is enabled, the system will consult the /etc/quagga/daemons file to see which routing protocols to load and the /etc/quagga/Quagga.conf file to see how routing is configured.
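One way to approximate a single-file view is to gather those files into one listing. The Python sketch below does that for the four paths named above. The function names and the root parameter are ours, added so the sketch can be tried against a copy of the files, and other releases may keep configuration elsewhere.

```python
from pathlib import Path
from typing import Dict, Iterable

# Paths named in the review; other releases may use different locations.
CONFIG_FILES = (
    "/etc/network/interfaces",
    "/etc/cumulus/ports.conf",
    "/etc/quagga/daemons",
    "/etc/quagga/Quagga.conf",
)


def collect_configs(paths: Iterable[str] = CONFIG_FILES, root: str = "/") -> Dict[str, str]:
    """Read each file that exists under root and return {path: contents}."""
    found: Dict[str, str] = {}
    for path in paths:
        candidate = Path(root) / path.lstrip("/")
        if candidate.is_file():
            found[path] = candidate.read_text()
    return found


def render_combined(configs: Dict[str, str]) -> str:
    """Join the files into one listing, each preceded by a comment header."""
    parts = []
    for path, text in configs.items():
        parts.append(f"# ---- {path} ----\n{text.rstrip()}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_combined(collect_configs()), end="")
```

Reading some of these files may require superuser rights, so on a real switch the script would likely need to run under sudo.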

And that’s just the beginning. Setting protocol aging timers – something we do to avoid contention with the test traffic we generate – can involve changes to a half-dozen or more variables stored in Linux’s /proc filesystem. Granted, that’s a use case mainly relevant in test labs, but there are similar examples involving everyday networking tasks.

For example, it’s anything but simple to extend Address Resolution Protocol/Neighbor Discovery (ARP/ND) timers beyond Linux’s default 30-second aging time. This short timeout is fine for hosts, but in networking devices it can create a lot of unnecessary traffic. For that reason, many switches and routers use much higher default ARP timeout values (Cisco’s default is 4 hours).

To complicate matters, the Linux state machine uses a variable timeout value for consistency with IPv6 behavior, even with IPv4 traffic. The timeout is tunable, as with virtually all parameters in Linux, but it involves setting at least four parameters apiece for IPv4 and IPv6 under the /proc filesystem. Even then, we also had to stop a Cumulus Linux watchdog script called arp_refresh for the longer timer values to take hold. Cumulus has an open bug report about there being too many variables involved to set ARP/ND aging.
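The variable timeout can be illustrated with a little arithmetic. According to the Linux arp(7) manual page, a neighbor entry is considered valid for a random period between half and one and a half times the base_reachable_time setting. The Python sketch below computes that window for the 30-second default and for a four-hour target. It covers only this one tunable and does not model the rest of the state machine or the other parameters we had to change.

```python
from typing import Tuple


def reachable_time_window(base_ms: int) -> Tuple[int, int]:
    """Return (min_ms, max_ms) for the randomized reachable time.

    Per arp(7), the kernel picks a value between base/2 and 3*base/2.
    """
    if base_ms <= 0:
        raise ValueError("base_ms must be positive")
    return base_ms // 2, (3 * base_ms) // 2


if __name__ == "__main__":
    cases = (
        ("Linux default (30 s)", 30_000),
        ("4-hour target", 4 * 60 * 60 * 1000),
    )
    for label, base_ms in cases:
        low, high = reachable_time_window(base_ms)
        print(f"{label}: {low / 1000:.0f} s to {high / 1000:.0f} s")
```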

Cumulus Linux also differs in terms of which commands accomplish common networking tasks. For example, Linux access control lists filter routing table entries rather than packets, the opposite of Cisco’s ACLs (but similar to the route-filter command in Juniper’s Junos OS).

Cumulus Linux uses the included Netfilter firewall for packet-filtering functions. But instead of using iptables and ip6tables to configure Netfilter, Cumulus recommends its own cl-acltool to configure ACLs.

Linux also has extensive quality-of-service and route selection features, available through the traffic control (tc) and policy routing capabilities respectively. These are conceptually similar to the service/policy mapping model and policy-based routing in Cisco devices.

8. CUMULUS LINUX COMBATS CONFIG BLOAT, UP TO A POINT

Cumulus says it’s fully aware that network engineers are used to seeing configuration and management information in one place. To that end, it’s adding some commands that nicely aggregate information from multiple places. There’s still lots to do here, but the current commands begin to address the issue of configuration bloat.

A good example is netshow, which Cumulus uses to collect interface and troubleshooting data from multiple places. To get a quick overview of which interfaces are physically up, the netshow interface command does the trick:

cumulus@cumulus$ netshow interface

   Name   Speed     Mtu Mode     Summary
-- ------ ------- ----- -------- ------------------------
UP lo     N/A     16436 Loopback IP: 127.0.0.1/8, ::1/128
UP eth0   1G       1500 Mgmt     IP: 172.31.128.11/24

The netshow command also has options to get traffic statistics, including errors, as well as neighbor information using the Link Layer Discovery Protocol (LLDP). For automation, output from many netshow commands can be formatted as JavaScript Object Notation (JSON).

These kinds of interface status displays are very similar to those in conventional data center devices, and can save network engineers the trouble of pawing through multiple configuration files, as described above.
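For scripts, the JSON form is the natural choice where a command offers it, but the text table is easy to handle too. The Python sketch below parses the netshow interface output shown earlier into one dictionary per row. The field names are ours, and the parsing assumes every column is populated, as in that sample; it is not based on any documented output format.

```python
from typing import Dict, List

FIELDS = ("state", "name", "speed", "mtu", "mode", "summary")


def parse_netshow_interface(output: str) -> List[Dict[str, str]]:
    """Turn the text table into one dict per interface row."""
    rows: List[Dict[str, str]] = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blanks, the header row, and the dashed separator.
        if not stripped or stripped.startswith(("Name", "--")):
            continue
        # Split into at most six pieces so the summary keeps its spaces.
        parts = stripped.split(None, len(FIELDS) - 1)
        if len(parts) == len(FIELDS):
            rows.append(dict(zip(FIELDS, parts)))
    return rows


SAMPLE = """\
   Name   Speed     Mtu Mode     Summary
-- ------ ------- ----- -------- ------------------------
UP lo     N/A     16436 Loopback IP: 127.0.0.1/8, ::1/128
UP eth0   1G       1500 Mgmt     IP: 172.31.128.11/24
"""

if __name__ == "__main__":
    for row in parse_netshow_interface(SAMPLE):
        print(row["name"], row["state"], row["mtu"])
```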

Another useful tool for troubleshooting is Cumulus’ cl-resource-query command. It displays system limits, such as current and maximum levels of layer-2 and layer-3 forwarding tables, all in one screen. That’s very handy when investigating resource exhaustion issues. In contrast, discovering system limits in other switches and routers may require multiple commands, if they’re even available, or consulting product data sheets.

9. PERFORMANCE IS A NONISSUE

Performance was excellent across the board. Tests with all variations of unicast, multicast, IPv4, and IPv6, switching, and routing produced uniformly strong results.

Both with 64 10G Ethernet interfaces and a combination of four 40G Ethernet and 48 10G Ethernet interfaces, the switch moved traffic at virtual line rate in every single test case. We say “virtual” line rate because we saw trivial packet loss at nominal line rate – but no loss when we offered traffic at 99.999 percent of line rate. That’s a difference of 10 parts per million, and likely attributable to clock speed differences between the switch and the Spirent test tool. We don’t think the difference is significant.
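To put 10 parts per million in perspective, the arithmetic can be sketched in Python. The 64-byte frame size is an assumption chosen for illustration, since it yields the highest frame rate. The 20 bytes of per-frame overhead are the standard Ethernet preamble and minimum interframe gap. For comparison, IEEE 802.3 permits a clock tolerance of plus or minus 100 ppm.

```python
PREAMBLE_AND_GAP_BYTES = 20  # 8-byte preamble/SFD plus 12-byte minimum interframe gap


def line_rate_fps(link_bps: int, frame_bytes: int) -> float:
    """Theoretical maximum frames per second for a given frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_GAP_BYTES) * 8
    return link_bps / bits_on_wire


def ppm_shortfall(fraction_of_line_rate: float) -> float:
    """Express a load below line rate as parts per million."""
    return (1.0 - fraction_of_line_rate) * 1_000_000


if __name__ == "__main__":
    fps = line_rate_fps(10_000_000_000, 64)  # 64-byte frames are an assumption
    ppm = ppm_shortfall(0.99999)
    print(f"10G line rate at 64 bytes: {fps:,.0f} frames/s")
    print(f"99.999 percent of line rate is {ppm:.0f} ppm below nominal")
    print(f"Per port, that is about {fps * ppm / 1_000_000:,.0f} frames/s")
```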

Latency was low and predictable in all cases. Delay results were very much in line with other 10G Ethernet top-of-rack switches we’ve tested.

We also ran functional tests of VLAN trunking and link aggregation between the Edge-Core/Cumulus Linux combo and an Arista data center switch; in both cases, the white-box system behaved exactly as expected.

10. THERE MAY BE A WHITE BOX IN YOUR FUTURE

In the end, the decision to embrace white-box switching will depend on multiple factors, including economics, familiarity with Linux, sunk investment in training and certifications, and dependence on proprietary features. Any one of these might be a good reason to stick with proprietary switches, at least for now.

But we think that first one, economics, ultimately will matter most, for one simple reason: We’ve seen this movie before.

Fifteen to 20 years ago, Linux-on-commodity-hardware was an upstart going against HP, IBM, and Sun, the entrenched vendors of proprietary servers. That didn’t end well for the incumbents: Linux and white-box servers won, and the turnkey server market imploded, unable to compete with off-the-shelf hardware and free or low-cost software.

Then as now, Linux doesn’t offer all the features the incumbents do. It doesn’t have as large an army of well-trained and well-paid wizards looking after its care and feeding. It’s not yet polished or simple enough to deploy everywhere. But none of that will matter if, as has happened before, enterprises decide that white boxes running Linux are good enough.

Cumulus Linux on white-box hardware offers a glimpse into what the future of enterprise networking might look like: Lower cost, higher programmability, and greater flexibility and control.

THANKS

Network World gratefully acknowledges the support of Spirent Communications, which supplied its Spirent TestCenter traffic generator/analyzer equipped with HyperMetrics dX2 8-port 40G Ethernet and HyperMetrics dX 32-port 10G Ethernet test modules, along with engineering support, for this project. We conducted all performance and functional tests using Spirent TestCenter.

David Newman, a Network World Test Alliance partner, is president of Network Test in Westlake Village, Calif. He can be reached at dnewman@networktest.com.