I was disappointed last week to see how cheap and over-hyped Cisco's "Validated Architecture for Long Distance VMotion" was. When I saw the report on NW I thought this would be a good reference on the real issue with inter-data center VMotion: dealing with crossing Layer-3 boundaries.
But, to my chagrin, it turned out to be a total marketing effort. It took Cisco a 17-page whitepaper, blog entry, video, and presentation at VMworld to essentially tell us to do this for inter-data center VMotion:
WOW! Thanks! I would've never thought of that. And Cisco tested it at up to a whole 200 km. Great! I was really worried about running VMotion over a Gigabit link with 4-5 ms of delay. Shew, glad we got that covered. This architecture is flawed in many ways.

One of the biggest problems is that it doesn't cover the Layer-3 impacts of this VLAN WAN extension (there's one paragraph about this problem with "active-active HSRP" as the fix...huh?). Let's say you do trunk all the VLANs to your other data center. The subnet itself still has to be advertised via routing protocols to the rest of the network, and that advertisement pulls traffic toward the source of the route advertisement. So what happens if the original data center is still advertising the subnet, but the destination VM has VMotion'd to the other data center? Well, then the user traffic still has to go to the old data center and ride across your new, fat WAN link to the VM. What benefit does this add? What if DC1 is going down? Are you going to make OSPF/BGP route advertisements manually in DC2 to keep the subnet advertised?
You could advertise the subnet out of both data centers to begin with since you have a L2-trunk between them, but how do you ensure traffic will enter the right data center? User traffic for VM1, which is still in DC1, could now enter DC2 and then have to flow across the fat WAN link. This is suboptimal routing and will affect user performance. The only way to fix this is to leak /32 routes into your global routing table, but that gets messy...FAST.
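To make the ingress problem concrete, here is a rough Python sketch of how a remote router would pick a data center when both sites advertise the same subnet. Everything in it is made up for illustration: the 10.1.1.0/24 subnet, the VM addresses, the metrics, and the helper names are my assumptions, and real OSPF/BGP path selection has far more knobs than "longest prefix, then lowest metric."

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    origin_dc: str   # data center advertising this prefix
    metric: int      # cost as seen from the user's router (hypothetical)


def ingress_dc(table: list, dst: str) -> Optional[str]:
    """Pick the data center traffic enters: longest prefix wins, then lowest metric."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in table if addr in r.prefix]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (-r.prefix.prefixlen, r.metric))
    return best.origin_dc


def crosses_dci(table: list, dst: str, vm_location: dict) -> bool:
    """True when traffic enters one data center but the VM lives in the other."""
    dc = ingress_dc(table, dst)
    return dc is not None and dc != vm_location[dst]


# Hypothetical placement: VM1 is still in DC1, VM2 has moved to DC2.
vm_location = {"10.1.1.10": "DC1", "10.1.1.20": "DC2"}

subnet = ipaddress.ip_network("10.1.1.0/24")

# Both data centers advertise the same /24; this user happens to be closer to DC2.
both_advertise = [
    Route(subnet, "DC1", 20),
    Route(subnet, "DC2", 10),
]

# The "fix": leak a /32 for VM1 out of DC1 so longest-prefix match steers it home.
with_host_route = both_advertise + [
    Route(ipaddress.ip_network("10.1.1.10/32"), "DC1", 20),
]

for name, table in (("both advertise /24", both_advertise),
                    ("plus /32 for VM1", with_host_route)):
    for vm in vm_location:
        print(name, vm, "enters", ingress_dc(table, vm),
              "crosses inter-DC link:", crosses_dci(table, vm, vm_location))
```

In this toy model the /24-only table sends VM1's traffic into DC2 and across the inter-DC link, and the /32 corrects it. The catch is the one noted above: you need one host route per VM that isn't where the /24 says it is, and that table grows quickly.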
And the worst part of this marketing campaign is some senior IT manager could get a hold of it, not realize how silly this design is, and start asking when it can be implemented. This design doesn't solve real, long-term problems with inter-data center VMotion, but senior managers may want to invest in it now, wasting money on a solution that solves a short-term tactical problem without long-term strategic benefits. Cisco, because of their market size and clout, has a responsibility not to put out "reference architectures" that are nothing more than the obvious designs network engineers would probably shy away from.

What I really want to see from Cisco is a solution to the biggest problem with inter-data center VMotion: dealing with crossing Layer-3 boundaries. When a VM is VMotion'd to another data center, it's almost a certainty this other data center is going to have different IP subnets. Thus, to work properly, the IP address of the VM needs to change. This IP address change must be coordinated with all other parts of the infrastructure environment such as DNS, load balancing, authentication, and management platforms. That's tough, but it is the real problem limiting VMotion. Cisco's whitepaper does mention this problem, but punts it down the road:
Deploying VMware VMotion across data centers that are dispersed over very long distances (500 miles or more) potentially involves moving the virtual machine to an entirely new subnet, but the goal continues to be to help ensure that the IP address of the virtual machine as well as the existing client connections are not disrupted. This type of VMware VMotion migration is not possible with existing technologies. Special hardware and software features will be required to route the TCP connections to the virtual machine in its new location without terminating the sessions. This approach will require the redesign of the IP network between the data centers involving the Internet. Technologies are being developed by Cisco, VMware, and standards organizations to address this network scenario in the future.
Cisco should've waited to deploy a "reference architecture" until this problem is solved. A combination of tunneling, ACE load balancers, DNS updates, and NAT'ing will probably be needed. Or maybe something cool with a little internal MPLS/VPLS. F5 is tackling this issue with similar ideas. It doesn't appear perfect, but it's far beyond this Cisco "reference architecture":
This is a poor solution from Cisco.
Michael Morris is a communications engineering manager at a $3-billion high-tech company. His background is in enterprise WANs working with telcos and developing large-scale routing designs. He has worked on networks at government and corporate organizations, including networks at two Fortune 10 companies. In his current role, he leads a team of 10 engineers responsible for large-scale IT networking projects and architectural standards for data networks, storage area networks, IP telephony, contact centers, and security. Michael is CCIE #11733 and recently became one of the first three Cisco Certified Design Experts (CCDE) ever (#20080002). He has 11 years experience in networking and communications, including four years as a paratrooper in the U.S. Army. He has a bachelor's degree in MIS from the University at Buffalo and is working on his MBA from NC State University. In 2008, he was awarded the Network Professional Association (NPA) Professional Excellence and Innovation Award for his work on network architecture, templates and enterprise MPLS design.
Michael Morris's From the Field blog is also featured on the Cisco Learning Network. See it there, along with the blogs of other Cisco Experts.