Five enterprise network operations managers told me they were very concerned about recent cloud outages. Why? Because every one of the outages was caused by network problems. Four of the five managers admitted that in their own containerized data centers, their problems came more often from networks than from servers. Why is this happening? The answer, according to enterprises: more isn’t better.

Complexity is the enemy of efficient operations and management. The sheer volume of things going on can swamp management centers and even management tools. If you add in multiple vendors and multiple technologies that create differences in operations practices, you get something very messy. But it’s more than just size or technology scope that’s making network operations complicated; it’s the way networks work.

IP networks were designed to be adaptive. Every router is an island that shouts out its identity and status over whatever trunks are available, and every router listens to other shouts. From all this shouting, the routers collect the state and topology of the network, and from that information they build routing tables. Periodic shouts keep the tables up to date...sort of.

If Router A shouts out on a trunk to Router B, Router B has to relay that state/topology advertisement to its own adjacent routers. Any change in conditions anywhere, including a seemingly minor change in configuration, has to be propagated through relayed shouts. That takes time, a period called “convergence” that ends only when all our routers are singing (or shouting) the same tune. During that convergence, packets can be delayed or even lost, but that’s not the real problem. It’s hard to engineer how you want the network to operate, under both normal and failure-mode conditions, when everything depends on shout relays. MPLS can help here (traffic engineering is why it was invented), but the real answer may be software-defined networks, or SDN.
Unfortunately, we’ve messed SDN up.

The SDN upside: Control

If you ask a hundred network operations people what SDN is about, they’ll tell you it’s to “separate the control plane and the data plane”. If there’s ever been a more useless definition of a network concept, it doesn’t come to mind. What the heck does that mean, and why would you care? What SDN is really about is substituting planned route development for adaptive development. The control-plane stuff comes in only because all that router shouting takes place not with data packets but with control packets.

In SDN networks, a controller maintains the topology of the network. That doesn’t take much effort because, after all, routers and trunks don’t float around much. The controller keeps up to date on the state of the network elements, and it has policies that decide how routers and trunks should be stitched together to create routes. It sends the routing tables to the “routers” (in the Open Networking Foundation model, using the OpenFlow protocol). It’s like an old-fashioned Mother May I? game; everyone does what they’re told.

SDN networks have absolute control over routes and route changes. They let operations teams engineer alternative routing topologies based on failure analysis, or they can calculate them from policies, but either way, every device gets the same tune shouted from that central controller. No convergence period with a bunch of inconsistent routes, no confusion. Because you can examine these alternative failure-mode topologies carefully before you commit them, there are no surprises.

If we asked our hundred netops types to line up on the left if they loved SDN and on the right if they didn’t, most wouldn’t know where to go; that’s been true for the decade-and-a-half that SDN has been around. But some people did know where they lined up on the issue.
Almost as soon as SDN was suggested, Google started looking at adopting it in its backbone network, and it did just that. How, given that Google has to interwork with the internet for everything it does? It surrounded its SDN core with a series of “BGP processes” that made its SDN network look like an IP network. SDN in a BGP paper bag, or in modern terms, a black box or SDN-based “IP-intent model”.

SDN in general, and the Google example in particular, illustrate two ways of reducing network complexity. First, rely on deterministic behavior to control routes. In an adaptive network, you don’t really know what all that shouting is going to converge on until it’s done. In SDN, one caller is calling the square dance: the SDN controller. Second, a hierarchical structure can reduce complexity in itself. If the internet, which contains hundreds of thousands of routers, were a flat network linked throughout by our router shouts, nothing would ever get through. Instead, it’s broken up into segments (autonomous systems, or ASes); traffic is routed first between segments and then within them, with the latter process being based on the shouting routers.

Since most enterprises use IP VPNs for the WAN, their network-building is likely focused on the data center and the connection between the data center and the VPN.
This is a great spot for SDN, because data-center configuration is critical and because there’s little chance that a network failure would completely cut off the SDN controller from some devices, which would make controlling the network difficult. A company with multiple data centers can create an SDN segment for each and link them via SDN or via traditional routing.

Adopting SDN might not be expensive

Your company probably doesn’t use SDN at all, but before you join other netops people bending an elbow and singing of lost opportunities, take heart. You can do all this today, with products already available and in many cases installed.

Most router and switch vendors support SDN on their devices; look for OpenFlow as a supported protocol and review exactly what features are offered. SDN controllers are also readily available from the major router vendors and a dozen other sources. Before you make a commitment, be sure that you understand how to use a given controller’s features to create the policies that define both the “normal” state of routes and any failure modes you need. Also be sure that your controller supports full journaling of all activity, both network status changes and changes made by administrators to the controller’s operating parameters.

Why all the concern about policies and journals? Because of SDN’s dark side. You can make the network do exactly what you ask, and that means that you can make it do truly bad things. SDN doesn’t eliminate network errors; it just lets you insert more explicit planning and control into the configuration process. If your SDN controller doesn’t protect you from yourself, and doesn’t let you validate routes, establish policies, and consult journals when something goes wrong, it can take you to a very dark place.

SDN is a way for you to truly control your network, not just watch it while it tries to control itself.
If that sounds good, it’s time to revisit the concept. Just be aware that human error is still the biggest source of network problems, and make sure your humans and your SDN are coexisting and cooperating.
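
For readers who want to see the controller idea in miniature, here is a rough Python sketch. It is not any vendor’s controller and it does not speak OpenFlow; it only illustrates the behavior described earlier: one central place holds the topology, computes every router’s table from a policy, precomputes and validates failure-mode plans before they are needed, and journals what it does. The four-router ring, the link costs, the lowest-cost policy, and names such as `Controller`, `precompute`, and `commit` are all assumptions made up for illustration.

```python
import heapq

# Hypothetical four-router ring: (router, router) -> link cost (made-up values).
LINKS = {("A", "B"): 1, ("B", "C"): 1, ("A", "D"): 2, ("D", "C"): 2}


def adjacency(links, failed=None):
    """Build a neighbor map, leaving out the one failed link if given."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        if (u, v) == failed:
            continue
        adj[u][v] = cost
        adj[v][u] = cost
    return adj


def forwarding_table(adj, source):
    """Dijkstra from `source`; returns destination -> next hop (lowest cost)."""
    table, done = {}, set()
    heap = [(0, source, "")]  # (cost so far, node, first hop out of source)
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node != source:
            table[node] = hop
        for nbr, link_cost in adj[node].items():
            if nbr not in done:
                first = nbr if node == source else hop
                heapq.heappush(heap, (cost + link_cost, nbr, first))
    return table


def compute_tables(links, failed=None):
    """One table per router, all derived from the same central view."""
    adj = adjacency(links, failed)
    return {router: forwarding_table(adj, router) for router in adj}


def validate(tables):
    """A plan is acceptable only if every router can reach every other."""
    routers = set(tables)
    return all(set(t) == routers - {r} for r, t in tables.items())


class Controller:
    """Toy central controller: plans are validated first, then committed."""

    def __init__(self, links):
        self.links = links
        self.plans = {}    # plan name -> {router: forwarding table}
        self.active = None
        self.journal = []  # record of every decision, for later review

    def precompute(self):
        """Plan the normal state and every single-link failure in advance."""
        for failed in [None, *self.links]:
            name = "normal" if failed is None else f"fail {failed[0]}-{failed[1]}"
            tables = compute_tables(self.links, failed)
            if validate(tables):
                self.plans[name] = tables
                self.journal.append(f"validated {name}")
            else:
                self.journal.append(f"rejected {name}: partitions the network")

    def commit(self, name):
        """Activate a plan; raises KeyError if it was never validated."""
        self.active = self.plans[name]
        self.journal.append(f"committed {name}")


controller = Controller(LINKS)
controller.precompute()
controller.commit("normal")
controller.commit("fail A-B")  # every router switches to the same plan at once
print(controller.active["A"])
print(controller.journal[-2:])
```

The point of the design is that `commit` can only activate a plan that already passed `validate`, and every step lands in the journal. A real controller would also have to handle lost contact with devices, multiple simultaneous failures, and far richer policies than lowest cost.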