Cisco's New "Validated Architecture for Long Distance VMotion" is Cheap Marketing

Hey, Cisco, Don't Do Me No Favors!

I was disappointed last week to see how cheap and over-hyped Cisco's "Validated Architecture for Long Distance VMotion" was. When I saw the report on Network World, I thought this would be a good reference on the real issue with inter-data center VMotion: dealing with crossing Layer-3 boundaries. But, to my chagrin, it turned out to be a total marketing effort. It took Cisco a 17-page whitepaper, a blog entry, a video, and a presentation at VMworld to essentially tell us to do this for inter-data center VMotion:

  1. Buy a really big WAN link, preferably WAN Ethernet.
  2. Trunk all VLANs over that new WAN link.
  3. Run VMotion.
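In IOS terms, the entire "architecture" boils down to a few lines of switchport configuration (a sketch only; the interface name and VLAN IDs here are made up for illustration):

```
! DC1 switchport facing the WAN Ethernet link to DC2
! (interface and VLAN numbers are hypothetical)
interface TenGigabitEthernet1/1
 description WAN Ethernet link to DC2
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30
```

That's it. That's the whitepaper.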

WOW! Thanks! I would've never thought of that. And Cisco tested it at up to a whole 200 km. Great! I was really worried about running VMotion over a Gigabit link with 4-5 ms of delay. Whew, glad we got that covered. This architecture is flawed in many ways.
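For perspective, the round-trip propagation delay over 200 km of fiber is back-of-the-envelope math (a sketch; the refractive index of ~1.47 is a typical single-mode fiber value I'm assuming, not a figure from Cisco's paper):

```python
# Rough RTT over 200 km of fiber: light travels at roughly c / n in glass,
# where n ~= 1.47 for typical single-mode fiber (an assumed value).
C_KM_PER_MS = 299_792.458 / 1000      # speed of light, km per millisecond
FIBER_INDEX = 1.47                     # assumed refractive index of the fiber

def fiber_rtt_ms(distance_km: float) -> float:
    """One-way propagation delay doubled; ignores switching/queuing delay."""
    one_way_ms = distance_km / (C_KM_PER_MS / FIBER_INDEX)
    return 2 * one_way_ms

print(round(fiber_rtt_ms(200), 2))    # ~2 ms RTT at 200 km
```

In other words, at 200 km the physics gives you roughly 2 ms of round-trip delay before any equipment is involved, which is why "testing" at that distance proves very little.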

One of the biggest problems is that it doesn't cover the Layer-3 impacts of this VLAN WAN extension (there's one paragraph about this problem with "active-active HSRP" as the fix...huh?). Let's say you do trunk all the VLANs to your other data center. The subnet itself still has to be advertised via routing protocols to the rest of the network, and that advertisement draws traffic toward the source of the route advertisement. So what happens if the original data center is still advertising the subnet, but the destination VM has VMotion'd to the other data center? User traffic still has to go to the old data center and then ride across your new, fat WAN link to reach the VM. What benefit does that add? And what if DC1 is going down? Are you going to make OSPF/BGP route advertisements manually in DC2 to keep the subnet advertised?

You could advertise the subnet out of both data centers to begin with, since you have an L2 trunk between them, but how do you ensure traffic enters the right data center? User traffic for VM1, which is still in DC1, could now enter DC2 and then have to flow across the fat WAN link. That's suboptimal routing, and it will affect user performance. The only way to fix it is to leak /32 routes into your global routing table, but that gets messy...FAST.

The worst part of this marketing campaign is that some senior IT manager could get hold of it, not realize how silly the design is, and start asking when it can be implemented. This design doesn't solve the real, long-term problems with inter-data center VMotion, yet senior managers may want to invest in it now, wasting money on a solution to a short-term tactical problem with no long-term strategic benefit. Cisco, because of its market size and clout, has a responsibility not to put out "reference architectures" that are nothing more than obvious designs most network engineers would shy away from.
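The suboptimal-routing problem is just longest-prefix matching at work. A minimal sketch (the subnet, VM address, and data center labels are all hypothetical) shows why leaking a /32 host route is the only way to steer traffic for a moved VM, and why that scales badly:

```python
import ipaddress

# Toy routing table: prefix -> exit data center. The /24 is advertised by DC1,
# so remote routers send all traffic for the subnet there.
routes = {
    ipaddress.ip_network("10.1.1.0/24"): "DC1",
}

def lookup(dest: str) -> str:
    """Longest-prefix match, as any router performs it."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

# VM1 (10.1.1.10) VMotions to DC2, but the /24 still points at DC1:
print(lookup("10.1.1.10"))    # "DC1" -- traffic hairpins over the WAN link

# The only fix: leak a /32 host route for every VM that moves.
routes[ipaddress.ip_network("10.1.1.10/32")] = "DC2"
print(lookup("10.1.1.10"))    # "DC2" -- correct, but now one /32 per moved VM
print(lookup("10.1.1.20"))    # "DC1" -- VMs that stayed are unaffected
```

One host route per migrated VM in the global table: that's the "messy...FAST" part.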

What I really want to see from Cisco is a solution to the biggest problem with inter-data center VMotion: crossing Layer-3 boundaries. When a VM is VMotion'd to another data center, it's almost a certainty that the other data center uses different IP subnets. Thus, to work properly, the IP address of the VM needs to change, and that change must be coordinated with every other part of the infrastructure: DNS, load balancing, authentication, and management platforms. That's tough, but it is the real problem limiting VMotion. Cisco's whitepaper does mention this problem, but punts it down the road:

Deploying VMware VMotion across data centers that are dispersed over very long distances (500 miles or more) potentially involves moving the virtual machine to an entirely new subnet, but the goal continues to be to help ensure that the IP address of the virtual machine as well as the existing client connections are not disrupted. This type of VMware VMotion migration is not possible with existing technologies. Special hardware and software features will be required to route the TCP connections to the virtual machine in its new location without terminating the sessions. This approach will require the redesign of the IP network between the data centers involving the Internet. Technologies are being developed by Cisco, VMware, and standards organizations to address this network scenario in the future.

Cisco should've waited to publish a "reference architecture" until this problem is solved. A combination of tunneling, ACE load balancers, DNS updates, and NAT will probably be needed. Or maybe something cool with a little internal MPLS/VPLS. F5 is tackling this issue with similar ideas. It doesn't appear perfect, but it's far beyond this Cisco "reference architecture".
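To see why DNS updates alone can't carry this, consider a toy sketch (the hostname, addresses, and TTL are hypothetical): new clients pick up the VM's new address once the record is updated and the TTL expires, but cached answers and established TCP sessions still point at the old IP, which is exactly the gap Cisco's quote admits.

```python
import time

# Toy DNS zone: name -> (ip, ttl_seconds). A real migration would pre-lower
# the TTL before the move; these values are made up for illustration.
zone = {"app.example.com": ("10.1.1.10", 30)}

class Resolver:
    """Caches answers for their TTL, like a recursive resolver."""
    def __init__(self):
        self.cache = {}
    def resolve(self, name):
        ip, expires = self.cache.get(name, (None, 0.0))
        if time.time() >= expires:                     # cache miss or expired
            ip, ttl = zone[name]
            self.cache[name] = (ip, time.time() + ttl)
        return self.cache[name][0]

client = Resolver()
session_ip = client.resolve("app.example.com")   # TCP session pinned to this IP

# VM migrates to DC2's subnet; the operator updates the A record.
zone["app.example.com"] = ("10.2.2.10", 30)

print(client.resolve("app.example.com"))  # still 10.1.1.10 until the TTL expires
print(session_ip)                         # and existing sessions break regardless
```

DNS gets new connections to the right place eventually; it does nothing for the in-flight sessions, which is why the whitepaper hand-waves about future "special hardware and software features."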

This is a poor solution from Cisco.

More From the Field blog entries:

Arista's New vEOS Providing Competition for the Cisco Nexus 1000V

It's One of Those Opinionated Days Again

A Private Extranet for Cloud Computing

It's Really Only Partly Cloudy Out There

Networking in the (Thunder) Clouds

Networking in the (Storm) Clouds



Copyright © 2009 IDG Communications, Inc.
