Data center automation - a ‘no-brainer’

* An analogy illustrates automation’s worth to the data center

Information-technology complexity is often cited as a major concern for IT executives. In response to the challenge of managing increasingly complex data centers, many vendors are talking about “self-healing,” “autonomous” and “adaptive” data centers. Regardless of which marketing term is used, the concept is to design and build data centers that require less management, maintenance and troubleshooting. While a truly autonomous data center is at least a few years away, there are some important design principles that can help us move in the right direction.

A top-down approach to the self-healing data center involves mapping the relationship between business processes and the systems and network elements that run them. In this scenario, the high-level workflows and applications are monitored for performance, and any problems are isolated and repaired from the top down.
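As a rough illustration of the top-down approach, the Python sketch below models a dependency map from a business service down to applications and servers. It then walks that map from the top to isolate the failing element. The `Element` class, the `isolate_faults` function and all names in the example are hypothetical, and the health flags are simply assumed to come from some monitoring source.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A node in a hypothetical dependency map: a service, application or server."""
    name: str
    healthy: bool = True
    depends_on: list["Element"] = field(default_factory=list)


def isolate_faults(element: Element) -> list[str]:
    """Walk the map top-down and return the deepest unhealthy elements.

    The walk only descends into elements that are themselves unhealthy.
    If an unhealthy element has no unhealthy dependencies, it is reported
    as the likely root cause.
    """
    if element.healthy:
        return []
    culprits: list[str] = []
    for dep in element.depends_on:
        culprits.extend(isolate_faults(dep))
    return culprits or [element.name]


# Example: a degraded business service traced down to one server.
db = Element("db-server-2", healthy=False)
app = Element("order-app", healthy=False, depends_on=[Element("web-1"), db])
service = Element("order-processing", healthy=False, depends_on=[app])
print(isolate_faults(service))  # ['db-server-2']
```

Note that the walk never visits a failed component whose parent still reports healthy (for instance, one masked by redundancy). That is the kind of low-level event the top-down view is not meant to track.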

Another approach is to design each element in the data center as a self-healing element with built-in redundancy. In this model, a local failure will trigger a local response; for example, a failed server is automatically dropped from a cluster, while the application experiences no interruption.
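The bottom-up behavior described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The `Cluster` class, its round-robin routing and the `is_alive` probe are hypothetical stand-ins for whatever health checking and load balancing a real cluster uses. The point is that the failure is handled inside the cluster and requests keep flowing to the surviving members.

```python
class Cluster:
    """Hypothetical self-healing cluster: failed members are dropped locally,
    and requests keep flowing to the survivors without escalating upward."""

    def __init__(self, servers):
        self.members = list(servers)
        self._rr = 0  # round-robin cursor

    def health_check(self, is_alive):
        """Drop any member for which the supplied probe returns False."""
        self.members = [s for s in self.members if is_alive(s)]

    def route(self, request):
        """Send a request to the next live member in round-robin order."""
        if not self.members:
            # Only a total loss of capacity is worth escalating to a higher layer.
            raise RuntimeError("no healthy members left; escalate to management layer")
        server = self.members[self._rr % len(self.members)]
        self._rr += 1
        return f"{request} -> {server}"


# Example: srv-b fails its health check and is quietly removed.
cluster = Cluster(["srv-a", "srv-b", "srv-c"])
down = {"srv-b"}
cluster.health_check(lambda s: s not in down)
print(cluster.members)               # ['srv-a', 'srv-c']
print(cluster.route("GET /orders"))  # GET /orders -> srv-a
print(cluster.route("GET /orders"))  # GET /orders -> srv-c
```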

So which is the best approach? Should self-healing data centers be designed with top-down control or bottom-up with redundant and self-healing components? As with many such debates, the answer is probably both.

It helps to look at how nature solves similar problems. Our brains are highly adaptive and self-healing and can rearrange their wiring in response to trauma. But many of the survival processes going on in our brains are completely unconscious. I do not need to remember to digest, nor can I forget to breathe. While I am aware of these activities, I do not control them directly. At a cellular level, the brain loses neurons all the time without any obvious loss of function.

In a self-healing data center, we must take a similar approach. Redundancy and self-healing features should be incorporated into each layer and element in the data center. A top-down management system should not and cannot concern itself with every glitch and failure occurring in the data center; those failures should be resolved autonomously at the local level. At the same time, however, the management system does need to monitor the high-level business services and applications and ensure they are running smoothly.

As with our brains, it is not necessary or even feasible to control every tiny detail from the top. Our conscious thoughts “orchestrate” actions such as walking and chewing gum, but can only do so because they are not overwhelmed by implementation details. When we chew, we do not consciously adjust the level of saliva or the digestive process. We just provide the high-level direction and our brainstem fills in the details.

The same division of labor applies to the data center. Top-down management should concern itself with broad performance requirements, not detailed server configurations. Once the high-level requirements are specified, a virtual and self-healing compute/storage cluster should execute our application and ensure it is always "healthy." Any deviations from the performance requirements should be resolved locally.
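To make this division of labor concrete, here is a minimal Python sketch of a local controller that receives a high-level requirement from management and resolves deviations itself. It assumes, purely for illustration, that the requirement can be expressed as a minimum number of healthy instances. `ServiceRequirement`, `LocalController` and their fields are hypothetical names, and a real system would likely express requirements in terms of latency or throughput instead.

```python
from dataclasses import dataclass


@dataclass
class ServiceRequirement:
    """High-level intent set by top-down management (hypothetical fields)."""
    name: str
    min_healthy_instances: int


class LocalController:
    """Hypothetical local controller: keeps a service at its required size
    by replacing failed instances, without involving the management layer."""

    def __init__(self, requirement: ServiceRequirement):
        self.req = requirement
        n = requirement.min_healthy_instances
        self.instances = {f"{requirement.name}-{i}": True for i in range(n)}
        self._next_id = n

    def mark_failed(self, instance: str) -> None:
        self.instances[instance] = False

    def reconcile(self) -> None:
        """Drop failed instances and start replacements until the requirement is met."""
        self.instances = {k: v for k, v in self.instances.items() if v}
        while len(self.instances) < self.req.min_healthy_instances:
            self.instances[f"{self.req.name}-{self._next_id}"] = True
            self._next_id += 1

    def status(self) -> str:
        """The only thing reported upward: is the service meeting its requirement?"""
        healthy = sum(self.instances.values())
        return "OK" if healthy >= self.req.min_healthy_instances else "DEGRADED"


ctl = LocalController(ServiceRequirement("billing", min_healthy_instances=3))
ctl.mark_failed("billing-1")
print(ctl.status())          # DEGRADED
ctl.reconcile()
print(ctl.status())          # OK
print(sorted(ctl.instances)) # ['billing-0', 'billing-2', 'billing-3']
```

The management layer in this sketch sees only `status()`, a single service-level answer. Which instance failed and how it was replaced stays inside the controller.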

Many vendors have attempted to build complex top-down management systems that try to coordinate thousands of resources in the data center. But just as you wouldn't be able to manage even the simplest task if you had to concentrate on keeping your heart beating, your data center management system cannot manage the enormous complexity of the data center without delegating some of the low-level activities. Data center managers attempting to design self-healing data centers should build redundancy and autonomy into every layer of the data center, leaving their management systems free to concentrate on orchestrating business processes.


Copyright © 2005 IDG Communications, Inc.
