How to Stretch VLANs Between Multiple Physical Data Centers – Part 1

There and Back Again With 802.1q, EoMPLS, VPLS and OTV


Now that we have sorted out how to scale our intra-data-center L2 networks, let's explore the acronym soup of Data Center Interconnect (DCI). This is quickly becoming one of the most frequent conversations I have with customers, as they are under increasing pressure to stretch VLANs between physical facilities. At face value this doesn't sound that hard, does it? We can simply implement a native 802.1q trunk between the sites and voila, we're rocking and rolling. Easier said than done, and the networking community is littered with war stories from those who have "been there, done that" and have the scars to prove it.
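
For reference, that "simple" approach really is only a handful of lines. The sketch below is illustrative only, assuming Cisco IOS-style switches; the interface name and VLAN list are hypothetical.

! DC1 edge switch (illustrative only; interface and VLAN IDs are hypothetical)
interface TenGigabitEthernet1/1
 description Dark fiber or wavelength toward DC2
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30
! Mirror-image configuration goes on the DC2 edge switch; some platforms
! default to dot1q and do not accept the encapsulation command.

Those few lines also merge the two sites into a single STP and broadcast domain, which is where the trouble described below begins.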

I've blogged about the challenges customers face with STP as a protocol within a single data center. Those same challenges are exacerbated when we consider stretching VLANs between data centers. What if I want to use all of the bandwidth on the high-speed interconnects? I can play games with STP to provide some traffic engineering in my network, but this is fraught with opportunity for misconfiguration and adds complexity. It also doesn't address considerations like what happens if I need to cross the data center interconnect just to reach my default gateway. What happens to my application when this sub-optimal switching happens for each packet in a flow? The effect compounds when I consider it happening dozens or hundreds of times for a complete transaction. Even low latency adds up quickly: a transaction that crosses a 1 ms link a hundred times has already spent 100 ms on the wire before any processing is done.
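
To make the "games with STP" concrete, the usual per-VLAN trick is to skew bridge priorities so half the VLANs forward across one interconnect and half across the other. The following is a hypothetical PVST+ sketch; the VLAN ranges and priority values are illustrative only.

! DC1 edge switch A (illustrative): prefer this switch and its uplink for VLANs 10-19
spanning-tree vlan 10-19 priority 4096
spanning-tree vlan 20-29 priority 8192
!
! DC1 edge switch B (illustrative): mirror image, prefer it for VLANs 20-29
spanning-tree vlan 20-29 priority 4096
spanning-tree vlan 10-19 priority 8192

Every one of those knobs has to stay consistent across multiple switches in two buildings, which is exactly the operational fragility I'm describing.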

A final consideration is the scenario where STP has a "bad day", say a loop or broadcast storm in one data center. That's never a good thing, but with a pure 802.1q trunk between sites that "bad day" ripples over to the other data center and wreaks havoc there, too. In this topology, once you provide the connectivity, STP does what STP does, and now you get to explain to management why their second, redundant, maybe even disaster recovery data center went down right along with the primary. Not an enviable position to be in, for sure. Now that I've given my "Scared Straight" pitch, let me say that there are plenty of customers for whom this works fine; as always, your mileage may vary depending on your change control and your discipline in configuration consistency. For the rest of us... let's read on.
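
If you do run a bare trunk between sites, one partial damage limiter is storm control on the DCI-facing port, so a broadcast storm in one site cannot saturate the interconnect. This is a mitigation, not a fix; the sketch below assumes Catalyst-style IOS syntax and the thresholds are hypothetical.

! DCI-facing interface (illustrative thresholds; tune for your environment)
interface TenGigabitEthernet1/1
 storm-control broadcast level 1.00
 storm-control multicast level 2.00
 storm-control action trap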

The next three predominant technologies considered for the DCI task are Ethernet over MPLS (EoMPLS), Virtual Private LAN Service (VPLS) and Advanced VPLS (A-VPLS). All three utilize the concept of a pseudowire (PW), a service that behaves much like a physical cable between ports or VLANs. Many customers think of PW technology like a wormhole: traffic goes in, magic happens, and the traffic shows up on the other side of the wormhole. Sounds very cool, huh? It is, but in the context of DCI there are some key considerations to keep in mind. The first is that because it is a PW, broadcast, multicast and protocols like STP, CDP and others that are seen "on the wire" are forwarded to the other side. PW technologies don't address the sub-optimal routing challenges either. For completeness, there are a few other gyrations of these PW technologies, such as wrapping them in GRE or adding IPsec encryption, which in my opinion further complicate the operational picture; I won't discuss them here, but know that they exist.
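
To give a flavor of what a point-to-point PW looks like, here is a minimal port-based EoMPLS sketch on one PE router, assuming classic IOS syntax; the peer address and VC ID are made up, and a mirror-image configuration would live on the far-end PE.

! PE router in DC1 (illustrative port-mode EoMPLS pseudowire)
interface GigabitEthernet0/1
 description Attachment circuit toward the DC1 aggregation switch
 no ip address
 xconnect 192.0.2.2 100 encapsulation mpls
! 192.0.2.2 is the far-end PE loopback and 100 is the VC ID; both are hypothetical

Anything the attachment circuit hears, including broadcasts, STP BPDUs and CDP, rides the wormhole to the other side, which is the point of the caution above.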

When we consider these three options, one of the first concerns customers raise is the MPLS requirement. Many customers are used to consuming MPLS as a carrier-provided service, not running and supporting MPLS themselves. Their experience with MPLS is a BGP or IGP handoff to a carrier, letting the carrier deal with and support the MPLS itself. If your DCI can be facilitated by carrier-provided and managed MPLS, the underlying complexity will not be exposed to you. For those who will be running MPLS internally, be prepared to support the additional protocols associated with MPLS, including but not limited to LDP and, in some cases, multiprotocol BGP and the protocols that facilitate traffic engineering. Not a trivial undertaking for some companies.
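
For a sense of what "running MPLS internally" involves at a minimum, the sketch below turns on LDP label switching for one core-facing link. It assumes IOS-style syntax with an IGP already in place; the interface, addressing and router-id are illustrative.

! Enable LDP globally and on a core-facing interface (illustrative only)
mpls label protocol ldp
mpls ldp router-id Loopback0 force
!
interface GigabitEthernet0/2
 description Core link toward the DC2 PE
 ip address 198.51.100.1 255.255.255.252
 mpls ip

Multiply that by every core link, add loopback reachability in the IGP, and layer on MP-BGP or RSVP-TE where needed, and the operational scope becomes clear.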

The next concern is sub-optimal egress routing, where traffic traverses the DCI multiple times and, as a result, poor performance, application timeouts and increased bandwidth consumption may be observed. That sounds like an easy fix with a load balancer and some creative NAT configurations, but again, it adds complexity to the environment. Egress routing may instead be optimized with a technique called First Hop Redundancy Protocol (FHRP) localization. This allows local routing in each data center, because the default gateway is active in each site, and reserves the DCI for genuine, legitimate L2 traffic.
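
The usual implementation of FHRP localization on Cisco gear is to configure the same HSRP group and virtual IP in both data centers and then filter HSRP hellos at the DCI edge so each site elects its own active gateway. The sketch below is hypothetical; the addresses, VLAN and group numbers are invented, and the filter is shown as a VLAN access map assuming Catalyst-style syntax (HSRPv1 hellos go to 224.0.0.2 on UDP 1985).

! Aggregation switch in each DC (same virtual IP in both sites; illustrative,
! and each switch still needs its own unique interface address)
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 1 ip 10.1.10.1
 standby 1 priority 110
 standby 1 preempt
!
! At the DCI edge, drop HSRP hellos so the active/standby election stays local
ip access-list extended HSRP-HELLO
 permit udp any host 224.0.0.2 eq 1985
!
vlan access-map STOP-HSRP 10
 match ip address HSRP-HELLO
 action drop
vlan access-map STOP-HSRP 20
 action forward
vlan filter STOP-HSRP vlan-list 10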

So what are the differences among the entries in this acronym soup? Stay tuned; that will be the topic of my next post.
