Spanning tree is still with us
So long as Layer 2 switches and VLANs are around, you'll need to know more than you probably care to about the Spanning Tree Protocol.
Is your network in a state of flux, with a mix of Layer 2 switches supporting some segments, and Layer 3 and perhaps Layer 4 devices handling others? Maybe you have virtual LANs implemented in parts of your network. If this sounds familiar, add Spanning Tree Protocol to your list of things to think about.
It's not unusual for an organization to have Layer 2 switches with 10M or 100M bit/sec links to local users and 100M or Gigabit Ethernet uplinks to a Layer 3 or 4 switch. In this scenario, the Layer 2 segments are essentially bridged networks.
If the bridged nets have redundant links to other LANs, you need to employ the Spanning Tree Protocol to ensure these links function properly, with one in backup mode to the other. VLANs compound the problem because each VLAN essentially represents a bridged network. Therefore, you need to configure a separate spanning tree instance for each VLAN.
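To make the per-VLAN point concrete, here is a minimal Python sketch of just the root-election step, assuming each VLAN runs its own independent spanning tree instance. Under 802.1D the bridge with the lowest bridge ID (priority first, then MAC address) becomes root. The switch names, priorities and MAC values are hypothetical.

```python
def per_vlan_roots(priorities: dict[int, dict[str, int]],
                   macs: dict[str, int]) -> dict[int, str]:
    """Elect a root bridge independently for each VLAN.

    priorities maps VLAN ID -> {switch name: bridge priority for that VLAN}.
    macs maps switch name -> base MAC address as an integer.
    The lowest (priority, MAC) pair wins, as with an 802.1D bridge ID.
    """
    roots = {}
    for vlan, prio in priorities.items():
        # Each VLAN's election sees only that VLAN's priorities.
        roots[vlan] = min(prio, key=lambda sw: (prio[sw], macs[sw]))
    return roots


# Hypothetical switches; 32768 is the 802.1D default bridge priority.
macs = {"sw1": 0x0000000000A1, "sw2": 0x0000000000A2, "sw3": 0x0000000000A3}
priorities = {
    10: {"sw1": 4096, "sw2": 32768, "sw3": 32768},   # sw1 tuned to be root
    20: {"sw1": 32768, "sw2": 4096, "sw3": 32768},   # sw2 tuned to be root
    30: {"sw1": 32768, "sw2": 32768, "sw3": 32768},  # all defaults
}
print(per_vlan_roots(priorities, macs))
```

With every priority left at the default (VLAN 30 here), the switch with the lowest MAC address wins root, and it would do so for every VLAN configured that way, which is one reason the defaults deserve a second look.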
As you graduate to Layer 3 switches, the spanning tree problem disappears because you're dealing with routed, not bridged, networks. Your goal should be to rid yourself of spanning tree by going to fully routed nets based on Layer 3 or 4 switches.
We brought various switches into our lab to see if there were any similarities in the way vendors implement the Spanning Tree Algorithm that could translate into configuration advice. We looked at nine switches from five vendors and found nearly as many variations as there were vendors. Bottom line: You'll have some work to do to ensure that spanning tree is enabled properly on your various Layer 2 switches.
Before we get into the details, it's worth going over the background. The Spanning Tree Protocol, officially part of the IEEE 802.1D standard for media access control bridges, is a link management protocol. Any device that performs Layer 2 switching uses spanning tree.
In a network with redundant connections between bridged LANs, one connection is always in the forwarding position, passing all traffic. The other is in a standby, or blocking, position. If the first connection goes down, spanning tree is the algorithm that learns about the disruption and ensures the backup connection kicks in.
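Which connection forwards and which one blocks can be sketched in Python. This is a simplified model, not a full 802.1D implementation: it assumes point-to-point links between bridges, ignores port IDs, timers and the BPDU exchange itself, and computes only the end state. It elects the root (lowest bridge ID), gives every other bridge one root port toward the root, and blocks any link that is not some bridge's root port. Bridge names, IDs and costs are hypothetical; 19 is the path cost 802.1D-1998 recommends for a 100M bit/sec link.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    a: str      # bridge at one end
    b: str      # bridge at the other end
    cost: int   # path cost, e.g. 19 for a 100M bit/sec link


def spanning_tree(bridges: dict[str, tuple[int, int]], links: list[Link]) -> dict[int, str]:
    """Return, per link index, 'forwarding' or 'blocked at <bridge>'.

    bridges maps a name to its bridge ID (priority, MAC); lower wins.
    """
    root = min(bridges, key=lambda n: bridges[n])

    adj: dict[str, list[tuple[str, int, int]]] = {n: [] for n in bridges}
    for i, link in enumerate(links):
        adj[link.a].append((link.b, link.cost, i))
        adj[link.b].append((link.a, link.cost, i))

    # Root path cost of every bridge (Dijkstra outward from the root).
    dist = {n: float("inf") for n in bridges}
    dist[root] = 0
    heap = [(0, root)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue
        for m, cost, _ in adj[n]:
            if d + cost < dist[m]:
                dist[m] = d + cost
                heapq.heappush(heap, (d + cost, m))

    # Each non-root bridge picks one root port: the cheapest path to the root,
    # ties broken by the neighbor's bridge ID, then by link index.
    root_ports = set()
    for n in bridges:
        if n != root:
            _, _, best = min(adj[n], key=lambda e: (dist[e[0]] + e[1], bridges[e[0]], e[2]))
            root_ports.add(best)

    states = {}
    for i, link in enumerate(links):
        if i in root_ports:
            states[i] = "forwarding"
        else:
            # The end with the better (root path cost, bridge ID) is designated;
            # the other end blocks, so the redundant link carries no traffic.
            loser = max((link.a, link.b), key=lambda n: (dist[n], bridges[n]))
            states[i] = f"blocked at {loser}"
    return states


bridges = {"A": (4096, 0xA), "B": (32768, 0xB), "C": (32768, 0xC)}
triangle = [Link("A", "B", 19), Link("A", "C", 19), Link("B", "C", 19)]
print(spanning_tree(bridges, triangle))
# The A-B link fails: rerun without it and the former backup link forwards.
print(spanning_tree(bridges, [Link("A", "C", 19), Link("B", "C", 19)]))
```

In the triangle, A is root and the B-C link ends up blocked at C. Remove the A-B link and recompute, and B-C becomes B's root port and forwards, which is the backup connection kicking in.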
Without spanning tree in place, both connections can be live at the same time, which can result in an endless loop of traffic on the LAN. That situation occurs because a bridged LAN must have only one active path from Point A to Point B. If there's more than one path, it's possible, even likely, that the same packets will be shuttled back and forth in different directions: bridges flood broadcasts and frames for unknown destinations out every port, Ethernet frames carry no hop count to retire them, and the internal bridge or switch tables keep relearning the sender's location as copies arrive from different directions.
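A toy Python model shows why such a loop never drains. It assumes each bridge floods a broadcast out every inter-bridge link except the one it arrived on and counts the copies in flight at each hop; host ports are not modeled, and the topologies are hypothetical.

```python
def flood(adjacency: dict[str, list[str]], origin: str, hops: int) -> list[int]:
    """Count broadcast copies on inter-bridge links at each hop.

    Toy model: a bridge floods every frame out all links except the one it
    arrived on, and nothing (no hop count, no TTL) ever retires a frame.
    """
    in_flight = [(origin, nbr) for nbr in adjacency[origin]]  # (from, to) pairs
    counts = []
    for _ in range(hops):
        counts.append(len(in_flight))
        next_hop = []
        for came_from, bridge in in_flight:
            for nbr in adjacency[bridge]:
                if nbr != came_from:
                    next_hop.append((bridge, nbr))
        in_flight = next_hop
    return counts


triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
mesh = {n: [m for m in "ABCD" if m != n] for n in "ABCD"}
tree = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}  # triangle with B-C blocked
for name, topo in [("triangle", triangle), ("mesh", mesh), ("tree", tree)]:
    print(name, flood(topo, "A", 4))
```

In the triangle the count stays at 2 on every hop, indefinitely; with four fully meshed bridges it doubles each hop (3, 6, 12, 24); on the loop-free tree it drops to zero as soon as the broadcast reaches the edges.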
Spanning tree is enabled by default on most vendors' switches, but you'll likely have to change some of the settings. Changing settings can be a difficult task because spanning tree is quite complex; the 802.1D standard is 378 pages long. However, adding switches to a network without reconfiguring them can lead to slow user logons, failed connections and the unauthorized movement of users between VLANs.
Based on our tests of switches from Cisco, Nbase-Xyplex, Foundry, Olicom and Anritsu, it's clear that most vendors implement spanning tree differently. For example, some vendors let you enable or disable spanning tree on an individual port basis, so Port 5 may have spanning tree enabled, while Port 6 doesn't. However, all the vendors at least seemed to have the same default spanning tree filters.
One overall problem with spanning tree is that it's a slow protocol and often can't keep up with the speed of today's networks. To build a loop-free topology, spanning tree employs bridge protocol data unit (BPDU) packets, which contain information on ports, addresses, priorities and costs. When a port comes up, the switch holds it in the listening and learning states, exchanging BPDUs and learning addresses, before letting it forward traffic; with the 802.1D default timers, that takes about 30 seconds. But some Novell and Microsoft clients connect to a switch port so quickly that they send their first requests before spanning tree has finished this process, and those requests are simply dropped.
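The timing comes down to simple arithmetic under the 802.1D default timers (max age 20 seconds, forward delay 15 seconds). A newly connected port spends one forward delay listening and another learning before it forwards. A blocked backup port waits the same 30 seconds after a failure on its own link, plus up to max age when the failure is elsewhere and must first be detected by stored information aging out. The Python sketch below encodes that; the function names are our own, and a switch whose timers have been tuned will produce different numbers.

```python
# 802.1D default timers, in seconds.
MAX_AGE = 20
FORWARD_DELAY = 15


def link_up_delay(forward_delay: int = FORWARD_DELAY) -> int:
    """Seconds a newly connected port spends listening, then learning, before forwarding."""
    return 2 * forward_delay


def failover_delay(direct_failure: bool, max_age: int = MAX_AGE,
                   forward_delay: int = FORWARD_DELAY) -> int:
    """Worst-case seconds before a blocked backup port starts forwarding."""
    # An indirect failure is noticed only when the stored root information ages out.
    wait = 0 if direct_failure else max_age
    return wait + 2 * forward_delay


def request_dropped(seconds_after_link_up: float, stp_enabled: bool = True) -> bool:
    """True if a client frame sent this soon after link-up hits a port not yet forwarding."""
    return stp_enabled and seconds_after_link_up < link_up_delay()


print(link_up_delay())                          # 30
print(failover_delay(direct_failure=False))     # 50
print(request_dropped(5.0))                     # True
print(request_dropped(5.0, stp_enabled=False))  # False
```

A client that sends its first server or DHCP request 5 seconds after link-up is talking into a port that isn't forwarding yet; with spanning tree disabled on that port, the same request goes straight through.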
Similarly, a VLAN user who is moved from one switch to another may experience delays while the new switch port learns the user's new location. In large bridged networks, it's possible for enough delay to occur that data is lost and must be retransmitted. And broadcast traffic on a bridged network always has the potential of slowing down the network when protocols such as spanning tree react too slowly.
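A switch's address table behaves roughly like the minimal Python sketch below, which assumes the 802.1D recommended ageing time of 300 seconds and ignores VLANs and the shortened ageing used after topology changes. The class and the station name are hypothetical. Until the moved user transmits from the new port, or the old entry ages out, frames addressed to that user keep going to the old port.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_AGEING_S = 300  # 802.1D recommended ageing time, in seconds


@dataclass
class MacTable:
    ageing_s: float = DEFAULT_AGEING_S
    # MAC address -> (port it was last seen on, time it was last seen)
    entries: dict[str, tuple[int, float]] = field(default_factory=dict)

    def learn(self, mac: str, port: int, now: float) -> None:
        """Record the port on which a frame from `mac` was received."""
        self.entries[mac] = (port, now)

    def lookup(self, mac: str, now: float) -> Optional[int]:
        """Port to forward to, or None to flood (unknown or aged-out address)."""
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, seen = entry
        if now - seen >= self.ageing_s:
            del self.entries[mac]
            return None
        return port


table = MacTable()
table.learn("user-pc", port=1, now=0)    # user first seen on port 1
print(table.lookup("user-pc", now=10))   # 1: user has moved, but frames still go to port 1
table.learn("user-pc", port=5, now=12)   # first frame from the new port
print(table.lookup("user-pc", now=13))   # 5
print(table.lookup("user-pc", now=400))  # None: entry aged out, so the switch floods
```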
When we looked at spanning tree in the lab, our Novell clients failed to connect through some vendors' switches. The only way we finally got them connected was by disabling spanning tree on the switch ports that connected to the Novell clients in question. We also occasionally saw NT Server reboot for no apparent reason.
On their Web sites, Microsoft and Novell have fixes for certain switches that experience these problems. The fixes usually involve disabling spanning tree on the client's switch port, as we did, or setting a registry parameter that controls when the client looks to the next server. Disabling spanning tree on individual ports shouldn't cause any problems, as long as you disable it only on client ports, not on the main uplinks. And disabling it on client ports is certainly easier than mucking with timing controls in the registry.
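The rule of thumb of disabling spanning tree only on client ports, never on uplinks, can be expressed as a simple filter over a port inventory. The `Port` fields here are hypothetical, and the extra check that a port has heard no BPDUs is our own added safeguard: a port that receives BPDUs has another bridge attached and should keep spanning tree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    number: int
    is_uplink: bool      # hypothetical inventory flag: link to another switch
    bpdus_received: int  # BPDUs heard on the port; nonzero means a bridge is attached


def ports_safe_to_disable(ports: list[Port]) -> list[int]:
    """Port numbers where turning spanning tree off is low-risk: client ports only."""
    return [p.number for p in ports if not p.is_uplink and p.bpdus_received == 0]


inventory = [
    Port(1, is_uplink=False, bpdus_received=0),     # desktop client
    Port(2, is_uplink=False, bpdus_received=4),     # someone plugged in a bridge
    Port(24, is_uplink=True, bpdus_received=120),   # uplink to the Layer 3 switch
]
print(ports_safe_to_disable(inventory))  # [1]
```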
Our Windows NT workstations running Microsoft's TCP/IP client didn't have any connection problems in 150 logon and connection attempts in DHCP mode with spanning tree enabled. Still, the NT support Web page mentions some potential DHCP problems related to spanning tree and suggests disabling spanning tree on the client's switch port to avert them.
Zeroing in on the problems
We worked with engineers from Netcom Systems, maker of the SmartBits performance analyzer, to observe a client logging on and off NT and NetWare servers - a process spanning tree is supposed to monitor.
We configured a SmartBits analyzer to send 1,000 packets/sec with a broadcast address from one SmartBits port module to a port on Switch 1. We configured the SmartCounters on the Netcom device to measure the time between when the sending port started shipping packets and when they began arriving at the receiving port. We did this for three different ports per switch and averaged the times.
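The delay calculation in this setup can be sketched in Python: for each port, subtract the time the sending port started transmitting from the time the first frame arrived at the receiving port, then average across the ports tested. The function names are hypothetical, the timestamps would come from the analyzer's counters, and nothing here reproduces the SmartBits software.

```python
from statistics import mean


def port_delay(send_start: float, first_arrival: float) -> float:
    """Seconds between the sender starting to transmit and the first frame arriving."""
    if first_arrival < send_start:
        raise ValueError("arrival precedes transmission; check the timestamps")
    return first_arrival - send_start


def average_delay(samples: list[tuple[float, float]]) -> float:
    """Average the per-port delays, e.g. over the three ports measured per switch."""
    return mean(port_delay(start, arrival) for start, arrival in samples)


def stp_penalty(enabled: list[tuple[float, float]],
                disabled: list[tuple[float, float]]) -> float:
    """Extra average delay with spanning tree enabled versus disabled."""
    return average_delay(enabled) - average_delay(disabled)
```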
With spanning tree enabled, the average time was about 4 seconds longer than with the protocol disabled. This result is about as expected in a small network; the delay could be greater in a larger one.
In fact, a 4-second delay in a large network could cause trouble for users trying to connect to server resources. For example, if a client requesting an IP address from a DHCP server is subject to the 4-second delay, it may not get an address in time and, therefore, won't be able to log on to any networked servers.
Next, we tried some real-world tests and garnered essentially the same results.
We powered up workstations on Switch 1 and began logging on to servers connected to Switch 2. We were unable to log on through some switches. Novell recommends changing the logon timing within the client, but in a large network with thousands of nodes, reconfiguring every client that way isn't practical. Disabling spanning tree on a port-by-port basis is much simpler.
Long-term, an even better solution from an administrator's point of view is to migrate to Layer 3/Layer 4 switches and avoid VLANs. There's no reason to revert to bridging now that it's entirely feasible to build a routed, wire-speed network using Layer 3/Layer 4 switches.
Short of that, you can simplify your life if you stick to one vendor's product line when using spanning tree so you won't have to learn the vagaries of multiple vendors' implementations. Or if vendors could find a way to put redundant links on instant standby, it would eliminate the need for spanning tree loop protection - then spanning tree could join Arcnet in the network history books.
Lewis is technical director for the SIGNAL Technology Solution Center. Also contributing to the review were TSC Tiger Team members James Bak, Steve Wilson and Erik Leigh.
SIGNAL, founded in 1987, is an IT services provider in Fairfax, Va.
Lewis can be reached at Tech_solutions@signalcorp.com.