Deep dive: Flat networks are the future

Turn your data center into a cloud with routing at Layer 2

A large flat layer 2 network is the key to a new unified fabric data center. The idea is that everything in the data center - servers, appliances and storage - should be part of one big flat layer 2 structure.

Server virtualization is the big reason for going flat. The big benefit of a flat network is its flexibility. A layer 2 network is plug and play, requiring no reconfiguration or change in IP addresses when resources move. This makes implementing server virtualization easier with less chance for configuration errors, and moves the data center closer to becoming a cloud.

This can work only if the new flat network is unlike the old one. What needs to change is the old Spanning Tree algorithm, which created scaling problems. Spanning Tree is inefficient because it doesn't use all the available paths between switches, and the routes it does use are not always the shortest or fastest.
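To make the inefficiency concrete, here is a minimal sketch in Python. It assumes a hypothetical ring of four switches and builds a simple breadth-first tree, which is a simplification of real Spanning Tree root election and port costs. It compares the hop count between two neighboring switches with and without the blocked link.

```python
from collections import deque

# Hypothetical topology: four switches cabled in a ring.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

def adjacency(links):
    """Build a neighbor map from a list of undirected links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_tree(links, root):
    """Return the links kept by a loop-free tree rooted at `root`."""
    adj = adjacency(links)
    seen, kept, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(adj[node]):
            if peer not in seen:
                seen.add(peer)
                kept.append((node, peer))
                queue.append(peer)
    return kept

def hops(links, src, dst):
    """Count the hops from src to dst using only the given links."""
    adj = adjacency(links)
    dist, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        for peer in adj[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist[dst]

tree = bfs_tree(LINKS, "A")
print("links in use:", len(tree), "of", len(LINKS))
print("C to D with every link:", hops(LINKS, "C", "D"), "hop(s)")
print("C to D on the tree only:", hops(tree, "C", "D"), "hop(s)")
```

In this toy ring the tree blocks the direct C-D link, so traffic between those two neighbors has to travel the long way around through the root.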

Spanning Tree's inefficiencies create another problem. The traffic between servers, appliances and storage is becoming the dominant traffic type within the data center, and it requires very low latency. Storage traffic also requires a lossless network, meaning switches can no longer discard packets and must instead avoid congestion. Spanning Tree can't meet the challenge of lossless delivery and extremely low latency.


The question has always been how to overcome Spanning Tree's limitations while preserving the flexibility of a flat network. Revisions to the Spanning Tree protocol over the years have made it better, but they never completely overcame the problems.

The change that makes the new flat network work is routing at layer 2. This gets rid of Spanning Tree and its limitations while preserving the plug and play nature of layer 2. Routing selects the shortest and fastest route. It also uses every link in the data center, immediately increasing the network's capacity and lessening congestion. The network is called flat because it still uses the location-independent media access control (MAC) address: the address space has no hierarchical structure, unlike IP's location-dependent addresses.

Why wasn't routing done this way in the past? The reason is simple: it was too expensive back in the 1980s, when bridging and routing were invented. Routing needed the help of an address that contains location information to provide a performance boost, allowing routers to be built using less processing power and memory. Now memory and processors are more powerful and cheaper, and it is possible to build large data center networks that are routed on the location-independent MAC address.

Here comes TRILL

The networking industry rarely comes up with only one solution. There are two standards-based answers, and many vendors are implementing their own solutions.

One of the answers is called TRILL. TRILL stands for Transparent Interconnection of Lots of Links. TRILL is being worked on at the IETF. The base protocol was approved last March. The final standard with all the parts needed to make it work is expected to be approved this year.

The simplest way to think of TRILL is that it encapsulates each packet along with a hop count. It then routes the encapsulated packet through the data center's TRILL switches using the Intermediate System-to-Intermediate System (IS-IS) routing protocol, and de-encapsulates it before delivering it to the destination.

Layer 2 switches that implement TRILL are called RBridges, short for Routing Bridges. Bridging was the name for Layer 2 switching before vendor marketing departments got involved. The RBridges in the data center exchange control messages about the current network characteristics, allowing them to calculate the best routes.
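The route calculation can be sketched as a standard shortest-path computation over the link information the RBridges have exchanged. The example below is a minimal Python illustration using Dijkstra's algorithm. The RBridge names, the link costs and the `LSDB` structure are assumptions made for the example, not part of the TRILL specification.

```python
import heapq

# Hypothetical link-state database: each RBridge lists its neighbors and link costs.
LSDB = {
    "RB1": {"RB2": 1, "RB3": 4},
    "RB2": {"RB1": 1, "RB3": 1, "RB4": 5},
    "RB3": {"RB1": 4, "RB2": 1, "RB4": 1},
    "RB4": {"RB2": 5, "RB3": 1},
}

def shortest_paths(lsdb, source):
    """Dijkstra's algorithm: return (cost to each node, first hop toward each node)."""
    dist = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, source)]  # (cost so far, node, first hop used to reach it)
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for a node already settled
        done.add(node)
        if node != source:
            first_hop[node] = hop
        for peer, link_cost in lsdb[node].items():
            if peer in done:
                continue
            new_cost = cost + link_cost
            if new_cost < dist.get(peer, float("inf")):
                dist[peer] = new_cost
                # Neighbors of the source are their own first hop.
                heapq.heappush(heap, (new_cost, peer, peer if node == source else hop))
    return dist, first_hop

dist, first_hop = shortest_paths(LSDB, "RB1")
print(dist["RB4"], first_hop["RB4"])
```

In this made-up topology RB1 reaches RB4 by way of RB2 and RB3 at a total cost of 3, even though a two-hop path through RB3 alone exists at a higher cost.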

RBridges also automatically figure out the best structure for sending multicast and broadcast messages between themselves. An important point is that the routes do not have to be symmetric, meaning the route from A to B does not have to follow the same path as the return from B to A. As with IP, each direction takes its own best path, but in most cases the two directions will follow the same route.

RBridges learn the MAC addresses and VLANs of the devices connected to them. When a device sends a message to a destination not directly connected to the RBridge, the RBridge broadcasts the packet to all the other RBridges to find out which one has the destination device. The RBridge that has the device responds, letting the originating RBridge know where it needs to send the messages.

RBridges can also learn MAC addresses through TRILL's End Station Address Distribution Information protocol, through manual configuration or through other layer 2 registration protocols. Most enterprises will find that the automatic learning method is adequate. All the entries in the table time out, so if a device moves, the RBridge holds the outdated information for only a short period of time. Intermediate switches, such as core and aggregation switches, don't learn end station MAC addresses, reducing their burden.
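The learning and aging behavior can be pictured as a small lookup table on each edge RBridge. The Python sketch below is illustrative only: the `EdgeTable` class, its 300-second timeout and the explicit `now` timestamps are assumptions chosen to keep the example deterministic, not values taken from TRILL.

```python
class EdgeTable:
    """Toy table mapping (MAC, VLAN) to the RBridge that owns the device."""

    def __init__(self, max_age=300.0):  # timeout value is an assumption
        self.max_age = max_age
        self.entries = {}  # (mac, vlan) -> (rbridge, time last seen)

    def learn(self, mac, vlan, rbridge, now):
        """Record or refresh where a device was last seen."""
        self.entries[(mac, vlan)] = (rbridge, now)

    def lookup(self, mac, vlan, now):
        """Return the owning RBridge, or None if unknown or timed out."""
        entry = self.entries.get((mac, vlan))
        if entry is None:
            return None  # unknown destination: the caller would have to flood
        rbridge, last_seen = entry
        if now - last_seen > self.max_age:
            del self.entries[(mac, vlan)]  # stale entry, forget it
            return None
        return rbridge

table = EdgeTable()
table.learn("00:11:22:33:44:55", 10, "RB7", now=0.0)
print(table.lookup("00:11:22:33:44:55", 10, now=10.0))   # still fresh
print(table.lookup("00:11:22:33:44:55", 10, now=400.0))  # aged out
```

Once the entry ages out, the next packet for that device triggers a fresh broadcast, which is how a moved device gets relearned at its new location.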

RBridges send data packets between themselves by encapsulating them with a small 8-byte TRILL header. The header identifies the next RBridge in the packet's path, and each RBridge along the way replaces it with the ID of the next RBridge in the path. Replacing the IDs at each step does open the possibility that loops can form, a problem IP had early in its life. The solution is the same as the one implemented in IP: a Time-to-Live (TTL) counter that each RBridge decrements. When it reaches zero, the message is discarded. Multicast data presents more of a problem, so an additional mechanism called a Reverse Path Forwarding Check was added to identify possible loops. The Reverse Path Forwarding Check is a way for the switch to verify that a packet arrived on the port it was expected from. The RBridge discards packets received on any other port.
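Both loop defenses are simple enough to sketch in a few lines of Python. The `TrillFrame` fields, the port names and the exact point at which a frame is dropped are assumptions for illustration; the real header carries more than is shown here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TrillFrame:
    ingress: str    # RBridge where the frame entered the fabric
    hop_count: int  # hops remaining before the frame is discarded

def next_hop_frame(frame):
    """Return the frame to send onward, or None if it must be discarded."""
    if frame.hop_count <= 0:
        return None
    return replace(frame, hop_count=frame.hop_count - 1)

def rpf_accept(expected_port, frame, arrival_port):
    """Reverse Path Forwarding Check: accept only on the expected port."""
    return expected_port.get(frame.ingress) == arrival_port

# A frame caught in a forwarding loop dies once its counter runs out.
current = TrillFrame(ingress="RB1", hop_count=3)
forwarded = 0
while True:
    current = next_hop_frame(current)
    if current is None:
        break
    forwarded += 1
print("times forwarded before discard:", forwarded)

# Hypothetical table: the port on which frames from each ingress RBridge should arrive.
expected_port = {"RB1": "port1", "RB2": "port2"}
multicast = TrillFrame(ingress="RB1", hop_count=5)
print(rpf_accept(expected_port, multicast, "port1"))  # arrived where expected
print(rpf_accept(expected_port, multicast, "port2"))  # wrong port, discard
```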

TRILL fully supports VLANs. RBridges identify a device by both its MAC address and its VLAN number. This allows them to treat each MAC and VLAN pair differently. A simple example of a server with two applications that use different VLAN numbers demonstrates TRILL's flexibility. The RBridge can send the flow from each VLAN down a different path through the network. Additionally, if the server is connected to two different RBridges, then one MAC and VLAN pair can be associated with one RBridge while the other MAC and VLAN pair is associated with the other RBridge.

Shortest Path Bridging

An alternative to TRILL is Shortest Path Bridging (SPB). SPB is defined in the IEEE 802.1aq standard and is based on previous work done for Provider Backbone Bridges (PBB) aimed at server and Metro Ethernet vendors. This gives SPB an advantage, since most current management and monitoring techniques do not need to be modified to track SPB. Work on the standard is expected to be completed this year.

TRILL and SPB have much in common conceptually. SPB, like TRILL, uses IS-IS to understand the data center topology and calculate the best path through the network. It also encapsulates packets, with most implementations using 802.1ah (MAC-in-MAC). MAC-in-MAC takes the existing packet and puts a new header on it carrying the MAC addresses of the first SPB switch and the destination SPB switch. This version of SPB is also referred to as SPBM. As in TRILL, all the header information from the original packet is preserved in the encapsulated message.
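The encapsulation idea can be sketched with two small Python data classes. This is a deliberately simplified picture: the class and field names are invented for the example, and the additional tags that a real 802.1ah header carries are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    vlan: int
    payload: bytes

@dataclass(frozen=True)
class BackboneFrame:
    backbone_dst: str  # MAC address of the destination SPB switch
    backbone_src: str  # MAC address of the first (ingress) SPB switch
    inner: Frame       # original frame, carried unchanged

def encapsulate(frame, ingress_switch_mac, egress_switch_mac):
    """Wrap the customer frame in an outer header addressed switch to switch."""
    return BackboneFrame(egress_switch_mac, ingress_switch_mac, frame)

def decapsulate(backbone_frame):
    """Strip the outer header at the egress switch."""
    return backbone_frame.inner

original = Frame("00:aa:00:00:00:02", "00:aa:00:00:00:01", vlan=10, payload=b"hello")
wrapped = encapsulate(original, "00:bb:00:00:00:01", "00:bb:00:00:00:09")

# Core switches forward on the outer addresses only.
print(wrapped.backbone_dst)
# The original header and payload come out the far side untouched.
print(decapsulate(wrapped) == original)
```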

SPB learns where destination MAC addresses are by forwarding the ARP broadcast request to the other SPB nodes and then watching for a response, much as TRILL does. All the SPB switches that see the broadcast learn where the originating device is, making network-wide learning very fast. Only the edge switches know the location of endpoints, an improvement over the older Spanning Tree, where every node in the tree had to learn every device.

The key difference between SPB and TRILL is that SPB uses tree structures, but in a creative way that provides the same flexibility and robustness as TRILL. The older Spanning Tree has a single spanning tree for the entire layer 2 network. A newer version, 802.1Q, did improve on this by having a spanning tree for each VLAN. The key point is that with Spanning Tree, all the switches use the same shared tree. SPB improves on this by having each switch create its own tree for each VLAN, representing the best way for it to reach the other switches. Instead of one tree shared by all the switches, each SPB switch has its own optimized tree based on IS-IS. This allows all the links in the data center to be used. SPB allows up to 16 paths or trees between two switches, with more possible through extensions to the protocol.
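A small Python sketch shows why per-switch trees put every link to work. It assumes a hypothetical leaf-spine fabric with equal-cost links and uses a plain breadth-first tree as a stand-in for the IS-IS calculation, so it illustrates the principle and is not SPB itself.

```python
from collections import deque

# Hypothetical leaf-spine fabric: every leaf is cabled to every spine.
LINKS = {frozenset(pair) for pair in [
    ("L1", "S1"), ("L1", "S2"),
    ("L2", "S1"), ("L2", "S2"),
    ("L3", "S1"), ("L3", "S2"),
]}

def shortest_path_tree(links, root):
    """Return the set of links in a breadth-first tree rooted at `root`."""
    adj = {}
    for link in links:
        a, b = tuple(link)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, tree, queue = {root}, set(), deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(adj[node]):
            if peer not in seen:
                seen.add(peer)
                tree.add(frozenset((node, peer)))
                queue.append(peer)
    return tree

switches = sorted({switch for link in LINKS for switch in link})

# One tree shared by everyone, as with classic Spanning Tree.
single_tree = shortest_path_tree(LINKS, "S1")

# One tree per switch: the union is every link some switch forwards on.
all_trees = set().union(*(shortest_path_tree(LINKS, s) for s in switches))

print(len(single_tree), "of", len(LINKS), "links used with one shared tree")
print(len(all_trees), "of", len(LINKS), "links used with one tree per switch")
```

With a single shared tree two of the six links sit idle. Once each switch roots its own tree, every link appears in at least one of them.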

Another difference between TRILL and SPB is that SPB routes are symmetric: the route from A to B and the return from B to A use the same path. Symmetric routes allow SPB to take advantage of much of the management and monitoring already in existence, such as loopback and traceroute. SPB also uses the same tree for both unicast and multicast traffic, whereas TRILL does not necessarily use the same path.
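One way to get symmetric routes is for every switch to break ties between equal-cost paths using a rule that does not depend on direction. The Python sketch below picks, among all shortest paths, the one whose sorted list of switch names is lowest. The topology and the tie-break rule are assumptions in the spirit of SPB's approach, not the algorithm as standardized.

```python
from collections import deque

# Hypothetical fabric with two equal-cost paths between the leaves.
ADJ = {
    "L1": ["S1", "S2"],
    "L2": ["S1", "S2"],
    "S1": ["L1", "L2"],
    "S2": ["L1", "L2"],
}

def all_shortest_paths(adj, src, dst):
    """Enumerate every minimum-hop path from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in adj[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)

    def walk(node):
        # Walk back from dst, following only hops that move one step closer to src.
        if node == src:
            return [[src]]
        return [path + [node]
                for peer in adj[node] if dist.get(peer) == dist[node] - 1
                for path in walk(peer)]

    return walk(dst)

def pick_path(adj, src, dst):
    """Direction-independent tie-break: lowest sorted list of switch names wins."""
    return min(all_shortest_paths(adj, src, dst), key=sorted)

forward = pick_path(ADJ, "L1", "L2")
backward = pick_path(ADJ, "L2", "L1")
print(forward)
print(backward)
print(forward == list(reversed(backward)))
```

Because the sorted list of names is the same whichever end starts the calculation, both directions settle on the path through S1.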

SPB also differs from TRILL in how it prevents loops from forming. SPB does not use a TTL approach. Instead, it performs a Reverse Path Forwarding Check, determining which port it expects each packet to arrive on and discarding packets that arrive elsewhere.

One Big Switch

An alternative is to have the data center switches act as one big switch - a virtual switch. This allows vendors to implement routing within a cluster of their own switches, much as a switch "routes" between its own ports. The best route can be used and all the links between switches are active. This solves the problem that both TRILL and SPB are addressing and provides a good stepping stone to a standards-based approach. The virtual switch appears to the other switches in the data center as one switch and does interoperate with other vendors' equipment. Brocade, Extreme, HP and Juniper either have released or plan to release this year a version of the virtual switch solution.

The biggest issue with this approach is that it is proprietary. The solution works only with one vendor's switches; no mixing and matching of vendors is allowed. Some vendors have limited the downside by basing their solution on standards. Brocade's is based on TRILL and Extreme bases its solution on Link Aggregation. The next issue is that the number of switches that can be linked together to form the virtual switch is limited, generally to between nine and 11.

Many vendors see the virtual switch as a stepping stone to a TRILL- or SPB-based solution. They can deliver it now while they work on the standards-based approach. Additionally, once they have implemented TRILL or SPB they can still keep the virtual switch, which then appears as one big RBridge or SPB switch. The benefit of this approach is that the switches within the virtual switch automatically configure themselves.


TRILL and SPB have one interesting side effect that can either help with server virtualization or be a drawback, depending on how VLANs are used. Both solutions use a VLAN number between switches that is unrelated to the VLAN number the device uses. For example, RBridges use their own VLAN number for communication between themselves, a number that has nothing to do with the VLANs the servers use. The application's VLAN is hidden by the TRILL header and is not used in forwarding between RBridges. This means the application's real VLAN number is used only on the edge, between the RBridges and the device.

The positive side is that it makes moving virtualized devices easier. Today when a virtual server moves to a new switch, that switch needs to be updated to support the VLAN. Additionally, all the intermediate switches in the paths that the device uses need to be updated to support the VLAN. Switch and virtualization vendors have schemes to add the VLAN to the switch the device connects to and to remove it when the virtual server is taken down, but no one has worked out a good way to handle this in the intermediate switches. RBridges eliminate the intermediate switch problem, since they don't use the application's VLAN and thus don't have to be configured for the device's VLAN number.

The potential problem is that if VLANs are used to keep traffic away from certain switches or routes, RBridges will not honor this. This should not be a problem for most enterprises, since VLANs are typically used not to police intermediate switches but to keep traffic from reaching unauthorized servers, which RBridges will continue to do. It is possible to overcome this problem with SPB, but it does require careful configuration.

Vendors choose sides

Blade Networking, Brocade, Cisco, Extreme Networks, Force10 Networks, Huawei and HP are planning to support TRILL. Cisco refers to its FabricPath solution as a pre-standard version of TRILL. Cisco's version has some additional features not included in the standard, some to support migration and some that were added before TRILL. It needs to be noted that when TRILL comes out, customers will have to migrate to TRILL. If they need the proprietary features in FabricPath, they will not be able to run both TRILL and FabricPath in the switch; it is an either/or situation. Brocade has an early version of TRILL as a building block in its soon-to-be-released Virtual Cluster Switching (VCS).

SPB also has its fans. Avaya is implementing SPB in its switches and expects to ship it in the first quarter of 2011. Alcatel-Lucent, Enterasys, HP and Huawei are planning on supporting SPB.

Multiple solutions will be a reality for 2011 and possibly beyond. The news is not all bad. Vendors are conducting tests to ensure that their implementations will interoperate.

Layland is head of Layland Consulting. He can be reached at


Copyright © 2011 IDG Communications, Inc.
