According to some vendors, analysts, pundits and sponsored spokespeople, OpenFlow and SDNs in general are the second coming. According to the same crowd, their attributes include:
- Lowering network costs by orders of magnitude
- Breaking the hegemony in networking by monopolistic vendors
- Enabling true virtualization with full workload mobility across geographic boundaries
- Simplifying topologies and modernizing segmentation
- Providing extensible, machine-readable APIs
- Letting you write your own protocols
- Flow-based routing with much better link utilization by dynamically routing 'elephant flows'
The list goes on ... and it slices bread better than a Ginsu promoted on a late-night infomercial as well.
I learned a lesson a long time ago, back in the days of 'Content Networking,' about believing your own hype, trusting your own PR, and not taking a reasonably pragmatic view of new technologies. I thought DNS tricks would become a staple of network architectures when coupled with DNS Anycasting and HTTP-based flow inspection [I even got some pretty worthless patents from it that no one should EVER implement!]. I remember when 'Application Oriented Networking' was going to revolutionize the way we built networks and networks would have true application layer intelligence and message routing integrated - [yeah that happened]. I remember 'Identity-based Networking,' when every device would swiftly adopt 802.1X and policy would be bound to your 802.1X authentication and the network would dynamically adapt to your user profile - [that was a winner, we band-aid'd that up with NAC and created a market for yet another in-band appliance]. 'Workgroup Computing' gave us the gem of campus-wide Spanning Tree because, of course, workgroups needed to be on the same addressing structure, right? [Brilliance ... caused the worst network designs I've ever seen - glad Geoff Haviland's paper started righting those wrongs back in '98.]
So every few years the networking industry invents some new term whose primary purpose is to galvanize the thinking of IT purchasers, give them a new rallying cry to generate budget, and hopefully drive some refresh of the installed base so we vendor folks can make our quarterly bookings targets. Sometimes these 'Networking Memes' have value, sometimes not: the current two in flight are 'Fabric' and 'Software Defined Networking/OpenFlow.' (SDN/OpenFlow is a Trees/Apples statement - all apples grow on trees, not all trees grow apples. OpenFlow is a form of SDN, but OpenFlow is not the only form of SDN - it is just the most spoken about, asked about, and the current belle of the ball with a pretty full dance card, so it's the one I'll use for the rest of this post.) SDN, in a broad sense, is any mechanism that uses software-programmable APIs to control functions that today exist primarily tucked away inside network software systems, such as topology construction, address learning and distribution, access control and security, etc.
I'm not a nay-sayer about new technologies. Those who know me well know I can get wrapped up in the next big thing faster than Dug the Dog from Pixar’s ‘Up’ can point out a 'Squirrel!!' But every once in a while it does help to pull yourself back from the precipice, look around, and ask what problem can the technology really solve as implemented today? What does the current roadmap look like and what new problems will it let us solve? Are they worth solving? Is the current trajectory we are on the one that will have the best impact for customers tactically and strategically?
Sometimes things pass the test, and should continue to be invested in. Sometimes they don't. I don't think I've passed personal judgment on SDN/OpenFlow at this point, but am taking pause for a second to look at it and wanted to share my own observations.
Use case is one important factor to take into account. The best current use case I can find for OpenFlow is people who are applying for GENI research funding to do OpenFlow research, who need it as a formal requirement for funding their projects using OpenFlow. [Yes, this sounds like speed-dating at a family reunion.]
The most common non-academic/recreational use case I see is to identify some enormous feed of snooped data that is deemed interesting across some reasonably large number of switches, then copy or redirect that flow through a probe for further analysis. This cuts down on the requirement for expensive probes/sniffers at the edge, and lets you manage the cost of this type of deployment in a rather nice way. It also sets up the OpenFlow controller as a form of Policy Distribution Point and makes every enabled/controlled switch a Policy Enforcement Point, inasmuch as identifying an interesting flow and redirecting or copying it is a form of Policy Enforcement. [This is a heckuva lot better than trying to man-handle this with SPAN, ACLs, VLAN Flooding, GRE Encaps and PBR and every other way we traditional network CLI jocks would profess to be able to support this.] If I could look at some traffic, click on a flow that looks nifty, and then, through some bit of software sorcery, have it appear on my sniffer across three to five network hops without flooding stuff everywhere, well, I for one would be tickled pink.
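To make the distributed-sniffer idea concrete, here is a minimal Python sketch of the decision a controlled switch would make: forward every packet normally, and also copy it to a probe port when it matches an "interesting" flow. Everything in it is hypothetical and for illustration only (the `FlowMatch` class, its four fields, `actions_for`, and the port numbers); it is not the OpenFlow wire format or any controller's API.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class FlowMatch:
    """Hypothetical match on a few header fields; None means wildcard."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    ip_proto: Optional[int] = None
    dst_port: Optional[int] = None

    def matches(self, packet: dict) -> bool:
        # A field matches if it is wildcarded or equals the packet's value.
        return all(
            want is None or packet.get(name) == want
            for name, want in asdict(self).items()
        )


def actions_for(packet, interesting, normal_port, probe_port):
    """Return the output ports for a packet.

    The packet is always sent out its normal port; if it matches any
    'interesting' flow it is additionally copied to the probe port.
    """
    ports = [normal_port]
    if any(match.matches(packet) for match in interesting):
        ports.append(probe_port)
    return ports


# Illustrative policy: copy TCP traffic to 10.0.0.5:1433 to a probe on port 48.
interesting_flows = [FlowMatch(dst_ip="10.0.0.5", ip_proto=6, dst_port=1433)]
sql_packet = {"src_ip": "10.1.1.1", "dst_ip": "10.0.0.5", "ip_proto": 6, "dst_port": 1433}
web_packet = {"src_ip": "10.1.1.1", "dst_ip": "10.0.0.9", "ip_proto": 6, "dst_port": 80}

mirrored = actions_for(sql_packet, interesting_flows, normal_port=1, probe_port=48)
untouched = actions_for(web_packet, interesting_flows, normal_port=1, probe_port=48)
```

The point of the sketch is that the policy lives in one list (`interesting_flows`) that a controller could push to many switches, instead of being hand-built per box out of SPAN sessions and ACLs.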
I have heard of other use cases, but mostly these other use cases require some form of vendor extensions to the current OpenFlow 1.0/1.1 specification (and may or may not be addressed in the pending 1.2 specification). These consist of:
- Building a large and flat network and letting the OpenFlow controller set up the topology and control all MAC and ARP learning throughout the network.
- More efficient load balancing and sharing across link aggregation groups by identifying larger 'elephant flows' and then moving those to paths that do not compete with other latency-sensitive traffic.
- Enabling a centralized control plane for all MAC and IP learning, and enabling distributed segmentation controls without affecting the underlying topology through interesting applications of MAC learning, ARP table responses, and GRE tunneling.
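The elephant-flow use case in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rebalance_elephants`, the byte-count threshold, and the path names are all hypothetical, and the placement rule is a simple greedy "move it to the least-loaded path", standing in for whatever a real controller would do to keep big flows away from latency-sensitive traffic.

```python
def rebalance_elephants(flow_bytes, flow_path, paths, threshold_bytes):
    """Greedily move large flows onto the least-loaded path.

    flow_bytes:      dict of flow id -> bytes seen in the last interval
    flow_path:       dict of flow id -> path the flow currently uses
    paths:           list of all candidate path names
    threshold_bytes: flows at or above this size count as elephants
    Returns a new dict of flow id -> path; the inputs are not modified.
    """
    # Current load per path, including paths that carry nothing yet.
    load = {path: 0 for path in paths}
    for flow, path in flow_path.items():
        load[path] += flow_bytes.get(flow, 0)

    new_path = dict(flow_path)
    elephants = sorted(
        (f for f in flow_path if flow_bytes.get(f, 0) >= threshold_bytes),
        key=lambda f: flow_bytes[f],
        reverse=True,  # place the biggest flows first
    )
    for flow in elephants:
        size = flow_bytes[flow]
        load[new_path[flow]] -= size          # lift the flow off its path
        best = min(paths, key=lambda p: load[p])
        load[best] += size                    # and drop it on the emptiest one
        new_path[flow] = best
    return new_path


# Illustrative numbers: three flows all hashed onto link "a" of a two-link LAG.
example = rebalance_elephants(
    flow_bytes={"f1": 1000, "f2": 900, "f3": 10},
    flow_path={"f1": "a", "f2": "a", "f3": "a"},
    paths=["a", "b"],
    threshold_bytes=500,
)
```

Note that even this toy version needs per-flow byte counters and a flow-table entry for each flow it moves, which is where the scaling questions come in.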
Some challenges I see to the proposed use cases above (but not to the distributed sniffer one which seems pretty sound to me) are:
1) One vendor recently announced OpenFlow support for their switches, but without any difference in cost. Yes, we need a simple reality check here: OpenFlow switches do not cost less than non-OpenFlow switches. The hardware cost of goods is the same. You still need a control plane CPU, some DRAM and NVRAM, a PCB to stuff them onto, along with your choice of switching silicon, PHYs, power supplies, etc. So an OpenFlow network will not cost you less than a traditional network from a CAPEX perspective.
2) Another vendor recently announced that its OpenFlow switches were tested to somewhere around 80 to 100k flows. This may sound like a big number to you, and it’s actually not bad for many implementations in a very structured environment. But let’s look at this reality: a modular switch today can forward over FIVE BILLION Packets per Second, [yes Billion with a big capital B]. A single 10GbE port hosting a virtualized server running VDI can take 1000 to 2000 flows, a single web server can be over 10k active, a single server load balancer is a million or so flows, and the same with 10GbE firewalls. Flash-based SSDs are quickly overcoming the IOPS choke point that limited most data center network performance to a few hundred/thousand flows per host, and are quickly pushing us past NxGbE on the hosts. Couple that with the Intel Romley chipset, and, well, you have the recipe for meltdown and immediate exhaustion of flow tables. Vendor-specific behaviors when the flow cache is full may vary, but the most common are to flood packets in a biblical fashion, or to forward in the control plane [yes, the 1-2Gb at best control plane on the 1.2 Terabit/second ToR switch, not pretty].
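A quick back-of-the-envelope check of those numbers, as a Python sketch. The per-endpoint flow counts and the 100k table size come from the paragraph above (taking the upper end of each range); the rack inventory in the last few lines is a made-up example, and the names are mine.

```python
FLOW_TABLE_CAPACITY = 100_000  # upper end of the 80-100k figure quoted above

# Active flows per attached endpoint, from the figures in the text.
FLOWS_PER_ENDPOINT = {
    "vdi_host": 2_000,            # upper end of 1000 to 2000
    "web_server": 10_000,
    "load_balancer": 1_000_000,
}


def endpoints_supported(capacity, flows_per_endpoint):
    """How many endpoints of one kind fit before the flow table is full."""
    return capacity // flows_per_endpoint


def table_demand(inventory):
    """Total flow entries needed for a dict of endpoint kind -> count."""
    return sum(FLOWS_PER_ENDPOINT[kind] * count for kind, count in inventory.items())


vdi_hosts_that_fit = endpoints_supported(FLOW_TABLE_CAPACITY, FLOWS_PER_ENDPOINT["vdi_host"])
web_servers_that_fit = endpoints_supported(FLOW_TABLE_CAPACITY, FLOWS_PER_ENDPOINT["web_server"])
load_balancers_that_fit = endpoints_supported(FLOW_TABLE_CAPACITY, FLOWS_PER_ENDPOINT["load_balancer"])

# Hypothetical rack: 40 VDI hosts and 8 web servers behind one switch.
hypothetical_rack = {"vdi_host": 40, "web_server": 8}
rack_demand = table_demand(hypothetical_rack)
rack_fits = rack_demand <= FLOW_TABLE_CAPACITY
```

By this arithmetic a 100k-entry table covers 50 busy VDI hosts or 10 web servers, and not even one load balancer; the hypothetical rack above asks for 160,000 entries.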
3) Another vendor told me its flow setup rate across a two-to-three-tier network was now down to 250msec to program the forwarding tables across the network, which doesn't sound too bad. But in that quarter of a second when you receive the first packet (let's say it’s a UDP flow at 5Gbps on a 10Gb link) you will be receiving at 625MB/sec/port and you will have to buffer 156.25 MB of data. Buffers tend to cost a lot, which is why we don't see large buffers on cost-optimized 10Gb ToR switches from most vendors (2MB to 9MB is kind of the norm right now, one or two go a bit deeper but are also priced accordingly). In fact, most modular switches designed for the backbone/spine don't have the buffer capacity to support holding this much data either.
A ToR switch with 64 ports of 10GbE has 9MB of buffer. If 10% of the ports (call it six) each have a flow starting at 50% of wire speed, they will generate about 937MB of data in the first 250ms. The switch will drop 99.04% of that data before the flow is set up. Once the flows are set up they will forward properly, assuming the flow table on the switches can handle the aggregate number of flows visible in the infrastructure. See point 2) above and then ask yourself: “what happened to my edge switches when SQL Slammer hit?” Yes, I am again dating myself a bit, but SQL Slammer filled the flow caches up on most vendors’ switches, causing them to then punt all traffic to the control plane, which then spiked and caused the switch to crash. This cascaded throughout the network. At the time, the Cisco Catalyst 6500 Supervisor was one of the few devices to have an m-trie topology-derived cache that was able to forward without consuming a flow table. It stayed up because it was NOT doing flow tracking. Switches with 512k flow tables and 1M flow tables melted down. (Note: most vendors' switches today do use TCAMs and topology-derived caching systems where the TCAM is pre-populated with the forwarding information based on the routing table, connected routes, and the learned IP hosts table - they use this for a reason, which is simply Deterministic Performance Under Load.)
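For anyone who wants to rerun the buffer math from point 3 with their own numbers, here is a small Python sketch. It assumes decimal megabytes, that nothing drains from the buffer until the flow is programmed, and that "10% of 64 ports" is rounded down to six ports, which is what the 937MB figure implies. The function names are mine.

```python
MB = 1e6  # decimal megabytes, matching the figures in the text


def bytes_during_setup(rate_gbps, setup_seconds, ports=1):
    """Bytes arriving while the flow is still being programmed."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * setup_seconds * ports


def drop_fraction(arriving_bytes, buffer_bytes):
    """Share of arriving data lost if only buffer_bytes can be held."""
    if arriving_bytes <= buffer_bytes:
        return 0.0
    return (arriving_bytes - buffer_bytes) / arriving_bytes


# One 5Gbps flow with a 250ms setup time.
one_port_mb = bytes_during_setup(5, 0.25) / MB

# Six ports doing the same thing into a 9MB shared buffer.
six_ports_mb = bytes_during_setup(5, 0.25, ports=6) / MB
dropped = drop_fraction(six_ports_mb * MB, 9 * MB)

print(f"{one_port_mb:.2f} MB per port, {six_ports_mb:.1f} MB total, {dropped:.2%} dropped")
```

Plugging in a faster setup time shows how far it has to fall before a 9MB buffer stops being the bottleneck: at 5Gbps on six ports, anything slower than roughly 2.4ms already overflows it.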
4) The concept of flattening the network out so that you can put any workload on any server anywhere is visually appealing, especially to server and virtualization administrators. Today, they are bound into force-fitting workloads into very static and inflexible IP addressing architectures as dictated to them by the networking team. And yes, I am being a bit pithy here, but both sides have a point - we networking folks have not exactly been the best at automating our kit, making programmatic interfaces that enable machine-to-machine interfacing, and stepping away from our beloved 80x24 character terminal with VT100 Emulation. We like our status as the medicine man working on that weird black-box called the network that does some crazy voodoo stuff but keeps everything talking. At the same time, we’ve also spent the last 15 years trying to get the budget to re-architect our network so that it’s finally stable and does not crash constantly. And as soon as we have a second to breathe, some upstart virtualization guy comes in and says we need to go back to the drawing board because he wants to move VMs all over creation.
This problem is getting attacked from several angles - rather than passing judgment on which is the best, because that would be as arbitrary as trying to pick next year's Super Bowl champion, right now let’s just list some of the contenders:
- OTV: Overlay Transport Virtualization
- LISP: Locator/ID Separation Protocol
- VXLAN: Virtual eXtensible LAN
- NVGRE: Network Virtualization using GRE
- OpenFlow with Vendor Supported Attributes and Tunneling (usually GRE or a GRE derivative)
- H-VPLS, etc.
Most of these are not designed in any way to work together, often provide 'yet another tagging format' for you to sift through, and may introduce another single point of failure in your infrastructure. Time will tell. I bet in the end those that are the easiest to set up and require the least churn to your infrastructure (both on the VM/Server side and on the network side) will win, but that’s just because history kind of says that.
5) The final challenge is the hype-factory itself. Take control of your network: unparalleled flexibility, provision paths for elephant flows, nail down routes and bridging paths to avoid instability of the standard L2/3 control plane, free yourself from vendor stranglehold, exploit the cost benefits of commodity switches, manage your whole network from a single point, all with a 10-tuple matching engine that supports maybe 4000 flows and a flow setup rate of maybe a few thousand per second in your 10-terabit data center environment with millions of flows, which already doesn't add up. But wait till I tell you that these flows are programmed via a remote controller, which does not yet actually exist, and which connects to the switch either in-band in the network being controlled, which would obviously never work, or through a whole separate management network, which becomes a potential source of outages on a scale you have never seen before, since one misbehaving switch in your control network can take out dozens or hundreds of production switches. ’Nuff said...