Cisco and other Ethernet switch vendors explain shift from three- to two-tier data center networking architectures
The emergence of 10 Gigabit Ethernet, virtualization and unified switching fabrics is ushering in a major shift in data center network design: three-tier switching architectures are being collapsed into two-tier ones.
Higher, non-blocking throughput from 10G Ethernet switches allows users to connect server racks and top-of-rack switches directly to the core network, obviating the need for an aggregation layer. Also, server virtualization is putting more application load on fewer servers due to the ability to decouple applications and operating systems from physical hardware.
More application load on less server hardware requires a higher-performance network.
Moreover, the migration to a unified fabric that converges storage protocols onto Ethernet also requires a very low latency, lossless architecture that lends itself to a two-tier approach. Storage traffic cannot tolerate the buffering and latency of extra switch hops through a three-tier architecture that includes a layer of aggregation switching, industry experts say.
All of this necessitates a new breed of high-performance, low-latency, non-blocking 10G Ethernet switches now hitting the market. And it won't be long before these 10G switches are upgraded to 40G and 100G Ethernet switches when those IEEE standards are ratified in mid-2010.
"Over the next few years, the old switching equipment needs to be replaced with faster and more flexible switches," says Robin Layland of Layland Consulting, an adviser to IT users and vendors. "This time, speed needs to be coupled with lower latency, abandoning spanning tree and support for the new storage protocols. Networking in the data center must evolve to a unified switching fabric."
A three-tier architecture of access, aggregation and core switches has been common in enterprise networks for the past decade or so. Desktops, printers, servers and LAN-attached devices are connected to access switches, which are then collected into aggregation switches to manage flows and building wiring.
Aggregation switches then connect to core routers/switches that provide routing, connectivity to wide-area network services, segmentation and congestion management. Legacy three-tier architectures naturally have a large Cisco component – specifically, the 10-year-old Catalyst 6500 switch – given the company's dominance in enterprise and data center switching.
Cisco says a three-tier approach is optimal for segmentation and scale. But the company also supports two-tier architectures should customers demand them.
"We are offering both," says Senior Product Manager Thomas Scheibe. "It boils down to what the customer tries to achieve in the network. Each tier adds another two hops, which adds latency; on the flipside it comes down to what domain size you want and how big of a switch fabric you have in your aggregation layer. If the customer wants to have 1,000 10G ports aggregated, you need a two-tier design big enough to do that. If you don't, you need another tier to do that."
Blade Network Technologies agrees. "Two-tier vs. three-tier is in large part driven by scale," says Dan Tuchler, vice president of strategy and product management at the company, a maker of blade server switches for the data center. "At a certain scale you need to start adding tiers to add aggregation."
But the latency inherent in a three-tier approach is too high for new data center and cloud computing environments that incorporate server virtualization and unified switching fabrics that converge LAN and storage traffic, experts say.
Applications such as storage connectivity, high performance computing, video, extreme Web 2.0 volumes and the like require unique network attributes, according to Nick Lippis, an adviser to network equipment buyers, suppliers and service providers. Network performance has to be non-blocking, highly reliable and faultless with low and predictable latency for broadcast, multicast and unicast traffic types.
"New applications are demanding predictable performance and latency," says Jayshree Ullal, CEO of Arista Networks, a privately held maker of low latency 10G Ethernet top-of-rack switches for the data center. "That's why the legacy three-tier model doesn't work because most of the switches are 10:1, 50:1 oversubscribed," meaning different applications are contending for limited bandwidth which can degrade response time.
This oversubscription plays a role in the latency of today's switches in a three-tier data center architecture, which is 50 to 100 microseconds for an application request across the network, Layland says. Cloud and virtualized data center computing with a unified switching fabric requires less than 10 microseconds of latency to function properly, he says.
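To make the oversubscription ratios Ullal cites concrete, here is a minimal Python sketch of how such a ratio is usually worked out: server-facing bandwidth divided by uplink bandwidth. The function name and the port counts are illustrative assumptions, not figures from any vendor quoted here.

```python
from fractions import Fraction

def oversubscription_ratio(downlink_ports, downlink_gbps, uplink_ports, uplink_gbps):
    """Return server-facing bandwidth divided by uplink bandwidth."""
    downlink = downlink_ports * downlink_gbps
    uplink = uplink_ports * uplink_gbps
    if uplink <= 0:
        raise ValueError("a switch needs some uplink bandwidth")
    return Fraction(downlink, uplink)

def as_text(ratio):
    """Format a ratio the way the article writes it, e.g. 10:1."""
    return f"{ratio.numerator}:{ratio.denominator}"

# Hypothetical legacy access switch: 40 x 1G server ports, 4 x 1G uplinks.
legacy = oversubscription_ratio(40, 1, 4, 1)

# Hypothetical non-blocking top-of-rack switch: 24 x 10G down, 24 x 10G up.
non_blocking = oversubscription_ratio(24, 10, 24, 10)

print(as_text(legacy))        # 10:1
print(as_text(non_blocking))  # 1:1
```

A ratio of 1:1 is what "non-blocking" means in this context: every server port can send at line rate at the same time without contending for uplink capacity.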
Part of that requires eliminating the aggregation tier in a data center network, Layland says. But the switches themselves must also rely less on packet buffering and oversubscription, he says.
Most current switches are store-and-forward devices that store data in large buffer queues and then forward it to the destination when it reaches the top of the queue.
"The result of all the queues is that it can take 80 microseconds or more to cross a three tier data center," he says.
New data centers require cut-through switching, which is not a new concept, to significantly reduce or even eliminate buffering within the switch, Layland says. Cut-through switches can reduce switch-to-switch latency from between 15 and 50 microseconds to between 2 and 4 microseconds, he says.
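The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough model under stated assumptions: a server-to-server path is assumed to cross five switches in a three-tier design (access, aggregation, core, aggregation, access) and three in a two-tier design, which matches Scheibe's remark that each tier adds two hops. The per-switch ranges are the ones Layland quotes; summing them per hop is a simplification, and the names below are illustrative.

```python
def path_latency_us(switch_hops, per_hop_us):
    """Return (low, high) end-to-end latency in microseconds."""
    low, high = per_hop_us
    return (switch_hops * low, switch_hops * high)

# Assumed hop counts for a server-to-server flow.
THREE_TIER_HOPS = 5   # access, aggregation, core, aggregation, access
TWO_TIER_HOPS = 3     # access, core, access

# Per-switch latency ranges quoted in the article, in microseconds.
STORE_AND_FORWARD_US = (15, 50)
CUT_THROUGH_US = (2, 4)

print(path_latency_us(THREE_TIER_HOPS, STORE_AND_FORWARD_US))  # (75, 250)
print(path_latency_us(TWO_TIER_HOPS, CUT_THROUGH_US))          # (6, 12)
```

Under these assumptions the low end of the three-tier, store-and-forward estimate lands near the 80 microseconds Layland cites, and the two-tier, cut-through estimate brackets the 10-microsecond target.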
Another factor negating the three-tier approach to data center switching is server virtualization. Adding virtualization to blade or rack-mount servers means that the servers themselves take on the role of access switching in the network.
In some cases virtual switching inside servers takes place in a hypervisor; in other cases the network fabric is stretched to the rack level using fabric extenders. The result is that the access switching layer has been subsumed into the servers themselves, Lippis notes.
"In this model there is no third tier where traffic has to flow to accommodate server-to-server flows; traffic is either switched at access or in the core at less than 10 microseconds," he says.
Because of increased I/O associated with virtual switching in the server, there is no room for a blocking switch in between the access and the core, says Asaf Somekh, vice president of marketing for Voltaire, a maker of InfiniBand and Ethernet switches for the data center. "It's problematic to have so many layers."
Another requirement of new data center switches is to eliminate the Ethernet spanning tree algorithm, Layland says. Currently all Layer 2 switches determine the best path from one end-point to another using the spanning tree algorithm.
Only one path is active; the other paths through the fabric to the destination are used only if the best path fails. The lossless, low-latency requirements of unified fabrics in virtualized data centers require switches that use multiple paths to get traffic to its destination, Layland says. These switches continually monitor potential congestion points and pick the fastest and best path at the time the packet is being sent.
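The difference between the two behaviors can be sketched as follows. This is a toy Python model, not any vendor's algorithm or a real protocol implementation: the path names, delay estimates and function names are all hypothetical, and a delay of None stands for a failed path.

```python
def spanning_tree_choice(delays_us, priority):
    """Use the first working path in a fixed order, ignoring congestion."""
    for name in priority:
        if delays_us.get(name) is not None:
            return name
    raise RuntimeError("no path available")

def multipath_choice(delays_us):
    """Use whichever working path currently has the lowest delay estimate."""
    up = {name: d for name, d in delays_us.items() if d is not None}
    if not up:
        raise RuntimeError("no path available")
    return min(up, key=up.get)

# Hypothetical current delay estimates per path, in microseconds.
delays = {"core-1": 42.0, "core-2": 6.5, "core-3": 9.0}
priority = ["core-1", "core-2", "core-3"]

print(spanning_tree_choice(delays, priority))  # core-1, despite congestion
print(multipath_choice(delays))                # core-2, the fastest right now

# Backup paths carry traffic under spanning tree only after a failure.
delays["core-1"] = None
print(spanning_tree_choice(delays, priority))  # core-2
```

In the single-path model the congested path keeps carrying all traffic until it fails outright, while the multipath model re-evaluates the choice each time it sends.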
"Spanning tree has worked well since the beginning of Layer 2 networking but the 'only one path' [approach] is not good enough in a non-queuing and non-discarding world," Layland says.
Finally, cost is a key factor in driving two-tier architectures. 10 Gigabit Ethernet ports are inexpensive: about $500 each, or twice the price of Gigabit Ethernet ports, yet with 10 times the bandwidth. Virtualization allows fewer servers to process more applications, thereby eliminating the need to acquire more servers.
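The port-price comparison is easier to see as cost per gigabit of bandwidth. The short Python sketch below uses the roughly $500 figure quoted above; the Gigabit Ethernet price is not stated directly and is derived here from the "twice" comparison, so treat it as an assumption. The function and variable names are illustrative.

```python
def cost_per_gbps(port_price_usd, port_speed_gbps):
    """Return the price paid for each gigabit per second of port bandwidth."""
    return port_price_usd / port_speed_gbps

TEN_GIG_PRICE_USD = 500               # approximate figure from the article
GIG_PRICE_USD = TEN_GIG_PRICE_USD / 2  # assumption derived from "twice"

ten_gig = cost_per_gbps(TEN_GIG_PRICE_USD, 10)
one_gig = cost_per_gbps(GIG_PRICE_USD, 1)

print(ten_gig)            # 50.0 dollars per Gbps
print(one_gig)            # 250.0 dollars per Gbps
print(one_gig / ten_gig)  # 5.0
```

On these numbers a 10G port costs about one-fifth as much per unit of bandwidth as a Gigabit port.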
And a unified fabric means a server does not need separate adapters and interfaces for LAN and storage traffic. Combining both on the same network can reduce the number and cost of interface adapters by half, Layland notes.
And by eliminating the need for an aggregation layer of switching, there are fewer switches to operate, support, maintain and manage.
"If you have switches with adequate capacity and you've got the right ratio of input ports to trunks, you don't need the aggregation layer," says Joe Skorupa, a Gartner analyst. "What you're doing is adding a lot of complexity and a lot of cost, extra heat and harder troubleshooting for marginal value at best."