
Network management must evolve in order to scale container deployments

Dec 21, 2018

Containers are coming, and network professionals need to switch to telemetry-based management tools.

Applications used to be vertically integrated, monolithic software. That has changed: modern applications are composed of separate micro-services that can be quickly brought together and delivered as a single experience. Containers allow these app components to be spun up significantly faster and to run for shorter periods of time, providing the ultimate in application agility.

The use of containers continues to grow. A recent survey from ZK Research found that 64 percent of companies already use containers, with 24 percent planning to adopt them by the end of 2020. (Note: I am an employee of ZK Research.) This trend will cause problems for network professionals if the approach to management does not change.

In a containerized environment, the network plays a crucial role in ensuring the various micro-services can connect to one another. When connectivity problems happen, data or specific services can’t be reached, which causes applications to perform poorly or become unavailable. As the use of containers continues to grow, network management needs to be modernized so that companies can achieve the goals they are pursuing with containers.

Legacy network management methods no longer sufficient

The current model for managing networks, which has been in place for decades, uses sampled data from protocols such as Simple Network Management Protocol (SNMP) and NetFlow. That means data is collected periodically, such as every 30 seconds or every minute, instead of being updated in real time.

Sampled data is fine for looking at long-term trends, such as aggregate data usage for capacity planning, but it’s useless as a troubleshooting tool in highly containerized environments because events that happen between sampling periods are often missed. That isn’t an issue with physical servers or virtual machines (VMs), which tend to have lifespans longer than the sampling interval, but containers can be spun up and torn down again in just a few seconds. In addition, VMs move and can be traced, whereas containers are destroyed rather than moved and can therefore be invisible to traditional management systems.
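
To make the sampling problem concrete, the following Python sketch models a poller that reads state only at fixed intervals. The `Container` type, the `seen_by_poller` function, and the workload timings are hypothetical illustrations, not part of any real management tool; only the 30-second interval comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical record of one workload's lifetime, in seconds."""
    name: str
    start: float  # time the workload came up
    stop: float   # time it went away; it no longer exists at or after this


def seen_by_poller(containers, interval, window):
    """Return the names of workloads that exist at any polling instant.

    The poller only looks at t = 0, interval, 2 * interval, ... up to window,
    so anything that starts and stops between two polls is never observed.
    """
    polls = [i * interval for i in range(int(window // interval) + 1)]
    return {c.name for c in containers
            if any(c.start <= t < c.stop for t in polls)}


# Hypothetical workloads: one long-lived, VM-like and two short-lived containers.
workloads = [
    Container("vm-like", start=0.0, stop=3600.0),
    Container("job-a", start=5.0, stop=9.0),    # 4 s, entirely between two polls
    Container("job-b", start=58.0, stop=62.0),  # 4 s, straddles the 60 s poll
]

print(sorted(seen_by_poller(workloads, interval=30.0, window=120.0)))
# ['job-b', 'vm-like']
print(sorted(seen_by_poller(workloads, interval=1.0, window=120.0)))
# ['job-a', 'job-b', 'vm-like']
```

With a 30-second interval, `job-a` never appears in any poll, and `job-b` is seen only because it happens to straddle the 60-second poll. Shrinking the interval to one second catches all three, which is the intuition behind moving toward continuous data.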

Container sprawl an emerging problem

Highly containerized environments are also subject to something called “container sprawl.” Unlike VMs, which can take minutes to boot, containers can be spun up almost instantly and then run for a very short period of time. This makes container sprawl more likely: containers can be created by almost anyone at any time without the involvement of a centralized administrator.

In addition, IT organizations typically run about eight to 10 VMs per physical server but about 25 containers per server, so it’s easy to see how quickly container sprawl can occur.
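
To put those densities in perspective, here is the arithmetic for a hypothetical estate of 100 physical servers. The server count is an assumption for illustration; the per-server figures are the ones cited above.

```latex
\begin{aligned}
\text{VMs} &= 100 \times (8\text{ to }10) = 800\text{ to }1{,}000\\
\text{containers} &= 100 \times 25 = 2{,}500\\
\frac{\text{containers}}{\text{VMs}} &= \frac{2{,}500}{1{,}000}\text{ to }\frac{2{,}500}{800} = 2.5\text{ to }\approx 3.1
\end{aligned}
```

Because containers may also live for only seconds, the number of distinct containers created over a day can be far higher than the 2,500 running at any one moment.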

A new approach to managing the network is required: one that can provide end-to-end, real-time intelligence from the host to the switch. Only then will businesses be able to scale their container environments without the risk associated with container sprawl. Rather than being device-centric, network management tools need to adapt and provide visibility into every trace and hop in the container’s journey. Traditional management tools have a good understanding of the state of a switch or a router, but to align with the way containers operate, management tools need to see every port, VM, host, and switch.
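
The difference between a device-centric view and an end-to-end view can be sketched as a path of hops. In this minimal Python sketch, the `Hop` type, the hop kinds, the `blind_spots` helper, and all device names (`leaf-1`, `spine-1`, and so on) are hypothetical; the path assumes containers running inside VMs on hosts connected through a leaf-spine fabric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    kind: str  # hypothetical categories: "container", "vm", "host", "port", "switch"
    name: str


def blind_spots(path: list[Hop], monitored_kinds: set[str]) -> list[Hop]:
    """Return the hops that a tool monitoring only `monitored_kinds` cannot see."""
    return [hop for hop in path if hop.kind not in monitored_kinds]


# Hypothetical path from container web-1 to container db-1 across the fabric.
web_to_db = [
    Hop("container", "web-1"),
    Hop("vm", "vm-1"),
    Hop("host", "host-a"),
    Hop("port", "leaf-1/eth1"),
    Hop("switch", "leaf-1"),
    Hop("switch", "spine-1"),
    Hop("switch", "leaf-2"),
    Hop("port", "leaf-2/eth7"),
    Hop("host", "host-b"),
    Hop("vm", "vm-2"),
    Hop("container", "db-1"),
]

# A device-centric tool that tracks only switches and their ports:
print([hop.name for hop in blind_spots(web_to_db, {"switch", "port"})])
# ['web-1', 'vm-1', 'host-a', 'host-b', 'vm-2', 'db-1']

# An end-to-end tool that tracks every hop kind on the path:
print(blind_spots(web_to_db, {"container", "vm", "host", "port", "switch"}))
# []
```

The switch-only view misses every hop where the containers themselves live, which is exactly where short-lived problems tend to originate.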

Real-time telemetry mandatory in containerized data centers

To accomplish this, network management tools must evolve to provide granular visibility across network fabrics, as well as insight from the network to the container and everything in between. Instead of relying on sampled data, which can miss short-lived events, container management requires real-time telemetry that provides end-to-end visibility and can trace traffic from container to container.

The term “telemetry” is very broad, and some vendors use it to describe legacy protocols such as SNMP. Most of these older methods use a pull model, in which data must be requested. True telemetry uses a push model, similar to flow data, in which devices stream data as it is generated. Telemetry also allows devices to be monitored with little or no impact on system performance. This is a huge improvement over older protocols, which often degraded the performance of the network device; as a result, polling was often turned off or the sampling intervals were lengthened to the point where the data was no longer useful.
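
The pull-versus-push distinction can be illustrated with a toy model. The `Device` class, its methods, and the `simulate` helper below are hypothetical and do not correspond to SNMP, NetFlow, or any vendor's streaming-telemetry API; the sketch only shows why a poller can miss a brief interface flap that a push subscriber receives.

```python
class Device:
    """Hypothetical network device exposing one interface's state both ways."""

    def __init__(self):
        self.state = "up"
        self._subscribers = []

    def get_state(self):
        # Pull model: a poller has to ask, and only sees the value right now.
        return self.state

    def subscribe(self, callback):
        # Push model: register a callback that receives every change.
        self._subscribers.append(callback)

    def set_state(self, new_state, t):
        self.state = new_state
        for callback in self._subscribers:
            callback((t, new_state))


def simulate(changes, poll_times):
    """Replay state changes and polls in time order; return (polled, pushed)."""
    device = Device()
    pushed = []
    device.subscribe(pushed.append)
    polled = []
    # Merge both event types into one timeline. Sorting on time only keeps the
    # original list order (polls first) for ties.
    timeline = sorted(
        [(t, "poll", None) for t in poll_times]
        + [(t, "change", state) for t, state in changes],
        key=lambda event: event[0],
    )
    for t, kind, state in timeline:
        if kind == "poll":
            polled.append((t, device.get_state()))
        else:
            device.set_state(state, t)
    return polled, pushed


# A hypothetical two-second interface flap between two 30-second polls.
polled, pushed = simulate(changes=[(10, "down"), (12, "up")], poll_times=[0, 30])
print(polled)  # [(0, 'up'), (30, 'up')]
print(pushed)  # [(10, 'down'), (12, 'up')]
```

The poller reads `up` at both t=0 and t=30, so the outage at t=10 is invisible to it, while the pushed stream records both transitions as they happen.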

Because telemetry shows the state of every container and every interface, it provides the insight needed to understand the relationships among containers. Management tools that use telemetry give network and application engineers the visibility they need to design, update, manage, and troubleshoot container networks. It’s important that the management tool be “closed loop” in nature, feeding what it observes back into its analysis so that its accuracy continually improves.
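
As a rough illustration of how per-flow telemetry reveals relationships between containers, the following sketch aggregates container-to-container flow records into a dependency map. The `FlowRecord` fields, the helper names, and the container names are assumptions for illustration; real telemetry records carry many more attributes.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical pushed telemetry record for one container-to-container flow."""
    src: str
    dst: str
    byte_count: int


def dependency_map(records):
    """Aggregate flow records into {src: {dst: total_bytes}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for record in records:
        graph[record.src][record.dst] += record.byte_count
    # Convert to plain dicts so later lookups of unknown keys don't add entries.
    return {src: dict(dsts) for src, dsts in graph.items()}


def callers_of(graph, target):
    """Containers that send traffic to `target`, i.e. that depend on it."""
    return sorted(src for src, dsts in graph.items() if target in dsts)


records = [
    FlowRecord("web-1", "api-1", 1200),
    FlowRecord("api-1", "db-1", 800),
    FlowRecord("web-1", "api-1", 300),
    FlowRecord("web-2", "api-1", 500),
]

graph = dependency_map(records)
print(graph)
# {'web-1': {'api-1': 1500}, 'api-1': {'db-1': 800}, 'web-2': {'api-1': 500}}
print(callers_of(graph, "api-1"))  # ['web-1', 'web-2']
```

Asking which containers call `api-1` immediately scopes the impact of a problem on that service, the kind of relationship a per-device view cannot show.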

Telemetry shines a light on current blind spots

One benefit of telemetry is the ability to trace containers across the network and quickly identify blind spots that legacy tools relying on sampled data can’t see. This is particularly useful for chronic problems that are short-lived but recur frequently. Without real-time visibility, the problem has often disappeared by the time network operations begins troubleshooting. Real-time tracing can be used to look for anomalies, enabling network professionals to fix problems before they impact the business.
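
One simple way to look for anomalies in a real-time stream is to compare each new value against a moving baseline. The sketch below uses an exponentially weighted moving average (EWMA), a generic technique chosen for illustration rather than a method the article or any specific product prescribes; the `detect_spikes` name, its parameters, and the latency values are hypothetical.

```python
def detect_spikes(samples, alpha=0.2, factor=3.0, warmup=5):
    """Return timestamps whose value exceeds `factor` times an EWMA baseline.

    samples: iterable of (timestamp, value) pairs, for example latency in ms.
    alpha:   weight given to each new normal sample when updating the baseline.
    warmup:  number of leading samples used only to establish the baseline.
    """
    baseline = None
    flagged = []
    for i, (t, value) in enumerate(samples):
        if baseline is not None and i >= warmup and value > factor * baseline:
            flagged.append(t)
            continue  # keep spikes from dragging the baseline upward
        if baseline is None:
            baseline = value
        else:
            baseline = alpha * value + (1 - alpha) * baseline
    return flagged


# Hypothetical per-second latency stream: 2 ms normally, 15 ms spikes at t=20 and t=45.
stream = [(t, 15.0 if t in (20, 45) else 2.0) for t in range(60)]

print(detect_spikes(stream))  # [20, 45]

# The same data seen by a poller that samples every 30 seconds.
polled = stream[::30]
print(polled)  # [(0, 2.0), (30, 2.0)]
```

Both one-second spikes are flagged from the per-second stream, while the 30-second samples show nothing but the 2 ms baseline, mirroring a recurring problem that has vanished before anyone starts troubleshooting.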

Containers are coming, and coming fast, and legacy network management tools are no longer sufficient because they provide limited data, which leads to incomplete analysis and many blind spots. The inability to correlate container problems with network issues will lead to application outages or poor performance, which can directly affect user productivity and customer experience and, ultimately, both the top and bottom lines.

Network professionals should transition away from older network management systems to those that use telemetry, as this will enable engineers to see every container, the underlying infrastructure, and the relationship among the various elements.  


Zeus Kerravala is the founder and principal analyst with ZK Research. He provides a mix of tactical advice, to help his clients in the current business climate, and long-term strategic advice. Kerravala provides research and advice to end-user IT and network managers; vendors of IT hardware, software, and services; and the financial community looking to invest in the companies he covers.
