A primer on closed-loop automation

The shortest path to achieving intent-based networking.

Recently, I was reading a blog post by Ivan Pepelnjak on intent-based networking. He notes that intent is defined as "a usually clearly formulated or planned intention," and that "intention" is in turn defined as "what one intends to do or bring about." I started to ponder his argument that the definition is confusing because there are so many variations.

To guide my understanding, I decided to delve deeper into the building blocks of intent-based networking, which led me to a variety of closed-loop automation solutions. After extensive research, my view is that closed-loop automation is a prerequisite for intent-based networking. Given current requirements, it is also a solution that businesses can actually deploy.

Having examined different vendors, I recommend taking a bird's-eye view to make sure the solution overcomes today's business and technical challenges. The result should be a future-proof solution.

Core business & technical challenges

Today's changing environment presents a number of business challenges. IT organizations want to stay agile, ensuring consistency of the infrastructure while retaining the ability to support brownfield and multi-vendor deployments.

IT organizations rarely depend on one vendor. Moreover, because of the large investment made in existing infrastructure, few have the option of a greenfield deployment. Most networks consist of several domains, such as branch, campus and data center, so data ends up stored in three separate places. Deploying siloed automation solutions within each of these domains surfaces many technical problems.

The explosion of technologies such as the Internet of Things (IoT), streaming media and 5G is turning networking into a big data problem. There are a great many small devices of many different types, which creates complexity at scale. These business problems are compounded by technical complexity: the business requirements define how urgently a solution is needed, but technical challenges will arise during implementation.

The variety and volume of data vary across devices. For example, a router may export NetFlow and sFlow records and expose Simple Network Management Protocol (SNMP) data. Configuration data can be collected via NETCONF or the command-line interface (CLI). Each of these sources has to be collected at a different frequency.

Newer devices use model-driven streaming telemetry. Instead of being polled, the devices push operational data using either open standards or proprietary mechanisms. Administrators need extensibility and the ability to customize the solution themselves, without being locked to a specific vendor.

Administrators want to follow the client's own roadmap and add workflows as required. Otherwise, they are essentially trading device vendor lock-in for automation vendor lock-in.

A perfect world: components of closed-loop automation

All the above-mentioned factors call for a closed-loop automation solution that can insulate you from these technical challenges. The ideal solution is based on an architecture designed for scale and resilience. In contrast, an outdated monolithic closed-loop solution is hard to scale, which creates challenges of its own.

Each component in the architecture should be able to scale independently. For example, the service orchestration component should scale horizontally and independently of the service assurance component. Components should be able to handle traffic spikes without overprovisioning, which is costly and can take time to plan.

A future-proof system is responsive because it continuously monitors itself and can heal and upgrade individual components independently. Ideally, if one component fails, the rest of the system does not need to change.

A closed-loop solution is a combination of telemetry, analytics and orchestration. Automation helps to an extent and gets the job done quicker. It eliminates the "network cowboy" and streamlines configurations. The orchestration element ensures entire workflows are adopted.

However, what is missing is closing the loop. Automation and orchestration form only part of the picture. With these two alone, we need to constantly change gears, stopping and starting our efforts. By closing the loop, you ensure the system works as intended, with full visibility, so that deviations from the baseline can be reported, optimized and remediated. These concepts are commonly referred to as intent-based networking (IBN).

Closed-loop automation & intent-based networking

If intent-based networking represents self-driving autonomous cars, then closed-loop automation represents driver-assistance systems such as collision avoidance and lane-change features. Closed-loop automation is the prerequisite for deploying intent-based network services. Telemetry data, analytics and orchestration, combined with feedback loops, enable intent-based network services.

Products such as Anuta Networks ATOM combine telemetry, analytics and orchestration elements to deliver closed-loop automation. Such automation addresses the emerging use case of network remediation triggered by excessive Border Gateway Protocol (BGP) neighbor flapping. It can also be used for bandwidth monitoring, traffic engineering, threat detection and predictive analysis, to name a few.

With closed-loop automation, you are constantly monitoring the network and taking corrective actions to match the intent, that is, the business goals. The solution validates the changes made to the network and keeps both configuration and operational data in line with the baseline.
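Keeping data in line with a baseline comes down to a comparison step. The following is a minimal sketch of that step, assuming the baseline and the current state are both available as nested dictionaries. The function name `find_drift`, the key names and the addresses are hypothetical, chosen only for illustration.

```python
from typing import Any


def find_drift(baseline: dict[str, Any], current: dict[str, Any], path: str = "") -> list[str]:
    """Return readable differences between a baseline and the current state."""
    drift: list[str] = []
    # Walk the union of keys so missing and unexpected entries are both reported.
    for key in sorted(set(baseline) | set(current)):
        where = f"{path}/{key}" if path else key
        if key not in current:
            drift.append(f"{where}: missing (baseline {baseline[key]!r})")
        elif key not in baseline:
            drift.append(f"{where}: unexpected value {current[key]!r}")
        elif isinstance(baseline[key], dict) and isinstance(current[key], dict):
            # Recurse into nested sections of the configuration.
            drift.extend(find_drift(baseline[key], current[key], where))
        elif baseline[key] != current[key]:
            drift.append(f"{where}: expected {baseline[key]!r}, found {current[key]!r}")
    return drift


# Hypothetical data: one neighbor's remote AS no longer matches the baseline.
baseline = {"bgp": {"local_as": 65001, "neighbors": {"192.0.2.1": {"remote_as": 65002}}}}
current = {"bgp": {"local_as": 65001, "neighbors": {"192.0.2.1": {"remote_as": 65003}}}}

drift_report = find_drift(baseline, current)
for line in drift_report:
    print(line)
```

An empty report means the network matches the intent; any entry in it is a candidate for a corrective action.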

To make this concrete, let's explore an example. The intent is that, in a stable network environment, BGP neighbor flapping should not occur frequently. If it does, there is an issue with, for example, a circuit, hardware or device configuration. The resulting technical goal is to keep the number of BGP neighbor flaps per week low.

Example: BGP neighbor flapping

To establish a baseline, a collector receives the telemetry streams, decodes them and stores them in a database, preferably a time-series database.
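As a rough illustration of the persistence step, the sketch below stores already-decoded samples and derives a flap count from them. It uses Python's built-in `sqlite3` module purely as a stand-in for a time-series database. The table layout, state names and sample values are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for a real time-series store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bgp_samples (ts INTEGER, device TEXT, neighbor TEXT, state TEXT)"
)

# Hypothetical decoded telemetry: one sample every 30 seconds.
samples = [
    (0, "edge-router-1", "192.0.2.1", "established"),
    (30, "edge-router-1", "192.0.2.1", "idle"),
    (60, "edge-router-1", "192.0.2.1", "established"),
    (90, "edge-router-1", "192.0.2.1", "established"),
    (120, "edge-router-1", "192.0.2.1", "active"),
]
conn.executemany("INSERT INTO bgp_samples VALUES (?, ?, ?, ?)", samples)


def count_flaps(db: sqlite3.Connection, device: str, neighbor: str, since_ts: int) -> int:
    """Count transitions out of the established state since a given timestamp."""
    rows = db.execute(
        "SELECT state FROM bgp_samples "
        "WHERE device = ? AND neighbor = ? AND ts >= ? ORDER BY ts",
        (device, neighbor, since_ts),
    ).fetchall()
    states = [row[0] for row in rows]
    # A flap is counted each time a session leaves the established state.
    return sum(
        1
        for previous, following in zip(states, states[1:])
        if previous == "established" and following != "established"
    )


baseline_flaps = count_flaps(conn, "edge-router-1", "192.0.2.1", since_ts=0)
print(baseline_flaps)
```

Running the same query over a week of samples would give the flaps-per-week figure that serves as the baseline.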

Infrastructure devices send model-driven telemetry; administrators define which BGP parameters to stream and how often. For example, configure a sensor for BGP neighbors with a 30-second interval, and each device then streams its BGP neighbor information every 30 seconds.
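A subscription like this is ultimately just a small piece of structured data. The sketch below shows one way to represent it, assuming a simple record per device. The class `TelemetrySubscription`, the sensor path `bgp/neighbors`, the device names and the collector address are all hypothetical and do not correspond to any vendor's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TelemetrySubscription:
    device: str            # device that pushes the data
    sensor_path: str       # what to stream (hypothetical path)
    interval_seconds: int  # how often a sample is pushed
    collector: str         # where samples are sent (hypothetical address)

    def samples_per_day(self) -> int:
        """Number of samples one device sends per day at this interval."""
        return 86_400 // self.interval_seconds


devices = ["edge-router-1", "edge-router-2"]
subscriptions = [
    TelemetrySubscription(
        device=name,
        sensor_path="bgp/neighbors",
        interval_seconds=30,
        collector="collector.example.net:57000",
    )
    for name in devices
]

# Serialize the subscriptions so they could be handed to an automation system.
payload = json.dumps([asdict(s) for s in subscriptions], indent=2)
print(payload)
```

At a 30-second interval each device produces 2,880 samples per day for this one sensor, which is a useful number to keep in mind when sizing the collector and the database.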

The next stage is the correlation of data. A correlation engine compares current network behavior against the baseline. If the data deviates from the baseline, predefined remediation tasks are carried out. Conditions are set depending on severity. For example, if a neighbor flaps twice in one hour, the generated alarm creates a trouble ticket for the operations team to investigate. A higher-priority condition may state that if the neighbor flaps five times in ten minutes, the system generates an alarm and applies a predefined remediation policy such as neighbor shutdown. Predefined conditions and actions are set and automated.

All of the data is collected and discovered from the network and then analyzed. Corrective actions are then carried out based on the insights. Essentially, the network problem has turned into an infrastructure-as-code problem.
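The two conditions described above can be sketched as rules evaluated over a sliding window of flap timestamps. This is a minimal illustration, not how any particular product implements correlation. The class and action names are hypothetical, timestamps are plain seconds, and flaps are assumed to arrive in time order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    flap_threshold: int   # flaps within the window that trigger the rule
    window_seconds: int
    action: str           # hypothetical action label


# Most severe rule first, so it wins when both conditions are met.
RULES = [
    Rule("critical", 5, 600, "raise_alarm_and_shutdown_neighbor"),
    Rule("warning", 2, 3600, "open_trouble_ticket"),
]


class FlapCorrelator:
    def __init__(self, rules: list[Rule]) -> None:
        self.rules = rules
        self.horizon = max(rule.window_seconds for rule in rules)
        self.flaps: dict[str, deque] = {}  # neighbor -> recent flap timestamps

    def record_flap(self, neighbor: str, timestamp: int) -> Optional[str]:
        """Record one flap and return the action to take, if any."""
        history = self.flaps.setdefault(neighbor, deque())
        history.append(timestamp)
        # Drop flaps older than the longest window any rule needs.
        while history and history[0] <= timestamp - self.horizon:
            history.popleft()
        for rule in self.rules:
            recent = sum(1 for t in history if t > timestamp - rule.window_seconds)
            if recent >= rule.flap_threshold:
                return rule.action
        return None


correlator = FlapCorrelator(RULES)
first = correlator.record_flap("192.0.2.1", 0)      # single flap, no action
second = correlator.record_flap("192.0.2.1", 1800)  # second flap within the hour
print(first, second)
```

In a real deployment the returned action would be handed to the orchestration component, which is what closes the loop.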

An ideal closed-loop automation architecture

The ideal solution must be able to communicate with virtual or physical multi-vendor infrastructure, in both brownfield and greenfield deployments, using a variety of protocols and data models such as YANG, NETCONF, REST, gRPC and SNMP.

These are just ways of collecting configuration and operational data. The automation software must support both legacy and new infrastructure. From the existing network infrastructure, you could have structured configuration, such as that retrieved via NETCONF, or unstructured configuration, such as CLI output. From the configuration, a data model is constructed that can then be applied to services in JSON, XML or YAML format.

Once the model is in JSON format, administrators can easily manipulate the JSON objects in software, for example to add one more virtual IP address (VIP) to a load balancer. The administrator can also automate the commands sent to the multi-vendor devices in the path, because each manipulation of the JSON objects is translated into individual device commands.

This removes the complexity of manually configuring the underlying devices. Previously, to add VLAN 200 to all the switches, the administrator had to write the syntax for each and every vendor. Now you just need to update the key-value pair in the JSON object. It's as easy as that. The underlying system translates the change using an abstraction layer, which avoids the hard work of creating and pushing syntax. Today, we deal with the entire infrastructure as JSON objects and work at a much higher level than before.
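To illustrate the VLAN 200 case, here is a small sketch of an abstraction layer that turns one JSON model into per-vendor commands. Everything in it is an assumption made for the example: the model layout, the `vendor_a` and `vendor_b` labels, the switch names, and the command strings, which are only loosely modeled on common CLI styles and are not verified against any product.

```python
import json

# Hypothetical service model as it might be stored by the automation system.
service_model = json.loads('{"vlans": [{"id": 100, "name": "users"}]}')

# The administrator's change: one more entry in the JSON object.
service_model["vlans"].append({"id": 200, "name": "voice"})


def render_vendor_a(vlan: dict) -> list[str]:
    """Illustrative syntax for a hypothetical vendor A."""
    return [f"vlan {vlan['id']}", f" name {vlan['name']}"]


def render_vendor_b(vlan: dict) -> list[str]:
    """Illustrative syntax for a hypothetical vendor B."""
    return [f"set vlans {vlan['name']} vlan-id {vlan['id']}"]


RENDERERS = {"vendor_a": render_vendor_a, "vendor_b": render_vendor_b}

# Hypothetical inventory mapping each switch to its vendor.
inventory = {"switch-1": "vendor_a", "switch-2": "vendor_b"}


def render_all(model: dict, devices: dict) -> dict:
    """Translate the vendor-neutral model into commands for every device."""
    commands = {}
    for device, vendor in devices.items():
        render = RENDERERS[vendor]
        commands[device] = [line for vlan in model["vlans"] for line in render(vlan)]
    return commands


commands = render_all(service_model, inventory)
for device, lines in commands.items():
    print(device, lines)
```

Supporting another vendor means adding one renderer function; the JSON model and the administrator's workflow stay the same.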

The solution also calls for the ability to connect to a partner ecosystem, such as SD-WAN controllers, Path Computation Element Protocol (PCEP) or NetFlow devices. More often than not, customers have already invested in these and have best practices in place. Therefore, the automation system must interact, collect data and enforce policy through the partner ecosystem. It must blend into that ecosystem without much customization.

When you move to the cloud, the network functions also migrate to the cloud. How do you support the same level of policy in the cloud? To manage various network elements and support multi-cloud environments, the solution should have the capability to monitor the cloud state, provision the links, and provide reporting.

If the automation solution is delivered as Docker containers, cloud deployment is straightforward, because containers can be deployed in any cloud. If it takes advantage of Kubernetes, it avoids many technology risks, as Kubernetes is already a proven orchestration system for Docker containers.

Case study: DDoS mitigation service

What type of product offerings might benefit from this type of closed-loop solution? For example, say you are offering a DDoS mitigation service in the cloud.

When I think of a DDoS scrubbing center in the cloud, I can only assume it is packed with a variety of vendor equipment in strategically placed data centers. On each device, variables that cannot be planned in advance need to be instantiated automatically.

When a customer is under DDoS attack, BGP Flowspec reroutes traffic from the customer site to the service provider's cloud. In the cloud, traffic is scrubbed and sent back to the customer as clean traffic. If you do this manually, it is a slow, error-prone and inconsistent process. Moreover, it can be difficult to manage and scale. How do you carry this out in a dynamic fashion? For the end-to-end provisioning of the DDoS mitigation solution, a single automation system can ensure extensible multi-vendor automation. It can also ensure multi-tenancy and act as the single source of truth, which puts you in the best position.

With a self-service portal, provisioning of the end-to-end networking should be standardized and deployed in minutes, not days. Once the traffic is redirected, the devices in the path are instantiated, along with all the virtual networks across the various vendors' products.

Device telemetry is used to ensure that all network resources are properly provisioned and that the customer is receiving enough bandwidth to successfully mitigate the DDoS attack. This usage data also helps generate reports and maintain billing transparency. Together, these tasks are what close the loop.

Summary

Closed-loop automation consists of a number of phases: data collection, persistence, correlation and remediation. To qualify as closed-loop automation, a solution must go through these phases. The end result can be described as anything from intent-based to auto-healing networks.

By closing the loop, you increase speed and agility, simplify troubleshooting and eliminate common manual errors. This reduces downtime to a minimum and lets you focus on optimization instead of simple fault identification.

This article is published as part of the IDG Contributor Network.