Isolating the exact cause of a problem is often the biggest challenge in a virtual environment, and it is not enough to simply monitor the hypervisor. You have to take a more holistic view.
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
According to a recent survey, the fastest growing segment of applications being virtualized is the core business segment, which includes everything from SAP tools to databases and CRM. These are the applications that drive a company’s business, which makes performance management of virtualized infrastructures more important than ever.
Up until now the focus of VMware performance management has been on collecting metrics regarding the hypervisor and virtual machines. As virtualization technology has matured, the hypervisor has become a commodity and the focus is now moving to higher layers of the stack. Also, administrators are finding that it is most often not the hypervisor that fails. Problems can originate anywhere in the infrastructure – from the desktop to the network, the storage, applications, databases, etc.
Isolating the exact cause of a problem and fixing it quickly to reduce business impact is often the biggest challenge. Therefore, it is not sufficient to monitor the hypervisor alone. You have to take a more holistic view of performance and look end-to-end, across every layer and every tier of infrastructure.
For example, performance problems may originate in the server operating system (say, because the right hotfixes are missing) or stem from the applications themselves. An organization that does not monitor these layers is exposed to performance problems it cannot explain. The hypervisor can tell you which virtual machine (VM) is taking up resources, but it cannot tell you why those resources are being used.
It is also necessary to monitor configuration. From a performance monitoring perspective, it is important to know what configuration changes have happened as well as understand how they impact performance. Having the ability to go deep and provide detailed diagnosis is often hugely helpful in enabling a quick fix. As with performance monitoring, configuration monitoring should extend beyond the hypervisor, to the VMs and the applications running on them.
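One way to picture the link between configuration changes and performance is to list the changes recorded shortly before an incident, across every layer. The following is a minimal sketch only: the change records, layer names, timestamps and the function name changes_before are all hypothetical, and a real monitoring system would pull this data from its own change log.

```python
from datetime import datetime, timedelta

def changes_before(incident_time, changes, window=timedelta(hours=1)):
    """Return config changes recorded within `window` before the incident,
    most recent first. Each change is a (timestamp, layer, description) tuple."""
    start = incident_time - window
    recent = [c for c in changes if start <= c[0] <= incident_time]
    return sorted(recent, key=lambda c: c[0], reverse=True)

# Hypothetical change log covering the hypervisor, VM and application layers.
CHANGES = [
    (datetime(2024, 1, 10, 8, 0), "hypervisor", "memory reservation changed"),
    (datetime(2024, 1, 10, 13, 40), "vm", "vCPU count reduced"),
    (datetime(2024, 1, 10, 13, 55), "application", "connection pool resized"),
    (datetime(2024, 1, 10, 14, 30), "vm", "disk added"),
]

# Hypothetical time at which a slowdown was first observed.
INCIDENT = datetime(2024, 1, 10, 14, 0)

if __name__ == "__main__":
    for when, layer, description in changes_before(INCIDENT, CHANGES):
        print(f"{when:%H:%M} [{layer}] {description}")
```

A change that falls inside the window is only a candidate cause, not a proven one, and the one-hour window is an arbitrary choice for illustration.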
Dependencies are also significant, as today’s IT infrastructure is highly interdependent. In fact, problem diagnosis is difficult precisely because the effects of a problem in one tier can show up in a different tier. To really determine the cause of a problem, you need to understand and use dependency information.
Virtualization adds to the mix of dependencies. For example, you may have VMs running an Oracle database server, a Citrix server and a media server, all sharing the resources of one physical server. If there are many video requests to the media server, this may generate a heavy I/O load (a high IOPS rate) on the physical server’s datastore and the logical unit number (LUN) behind it. Over a period of time, the physical server may choke and in turn slow down the Oracle database server. Because the VMs share the resources of the physical server, one VM taking an excessive share affects the performance of the others. To properly diagnose problems, VMware performance management systems must take these dependencies into account.
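The shared-host scenario above can be sketched as a simple check: group VMs by physical host and flag any VM that accounts for most of that host's I/O. This is an illustrative sketch under stated assumptions, not how any particular product works. The VM names, host names, IOPS figures, the 60 percent threshold and the function name find_noisy_neighbors are all hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (vm_name, physical_host, iops). Values are illustrative.
SAMPLES = [
    ("oracle-vm", "host-a", 400),
    ("citrix-vm", "host-a", 300),
    ("media-vm", "host-a", 2300),
    ("web-vm", "host-b", 500),
]

def find_noisy_neighbors(samples, share_threshold=0.6):
    """Return {host: (noisy_vm, [other_vms])} for hosts where a single VM
    accounts for more than `share_threshold` of the host's total IOPS."""
    by_host = defaultdict(dict)
    for vm, host, iops in samples:
        by_host[host][vm] = iops

    findings = {}
    for host, vms in by_host.items():
        total = sum(vms.values())
        if total == 0 or len(vms) < 2:
            continue  # no load, or no neighbor to contend with
        top_vm = max(vms, key=vms.get)
        if vms[top_vm] / total > share_threshold:
            others = sorted(vm for vm in vms if vm != top_vm)
            findings[host] = (top_vm, others)
    return findings

if __name__ == "__main__":
    for host, (noisy, others) in find_noisy_neighbors(SAMPLES).items():
        print(f"{host}: {noisy} dominates I/O; possibly affected: {', '.join(others)}")
```

The point of the sketch is that the slow Oracle VM and the busy media VM are only connected through the host they share, so a diagnosis needs the VM-to-host mapping as well as the per-VM metrics.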
Further complicating the situation is the fact that dependencies are dynamic in a virtual infrastructure. With vMotion, a VM and the applications running on it can move dynamically from one physical machine to another. So in one instance applications one and two are contending for resources, while in another, applications two, three and four are sharing resources. Virtualization performance management systems have to be able to handle such dynamic dependencies.
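To illustrate why static dependency maps fall short, the sketch below recomputes which VMs share a host each time a migration occurs. It is a simplified model: the placement dictionary, the host and application names, and the functions co_residents and apply_migration are hypothetical, and no real vMotion or VMware API is being called.

```python
from collections import defaultdict

def co_residents(placement):
    """Map each VM to the set of other VMs currently on the same host.
    `placement` is a dict of vm_name -> host_name."""
    by_host = defaultdict(set)
    for vm, host in placement.items():
        by_host[host].add(vm)
    return {vm: by_host[host] - {vm} for vm, host in placement.items()}

def apply_migration(placement, vm, new_host):
    """Return a new placement map after a (hypothetical) live-migration event."""
    updated = dict(placement)
    updated[vm] = new_host
    return updated

# Hypothetical starting placement: app1 and app2 contend on host-a.
before = {"app1": "host-a", "app2": "host-a", "app3": "host-b", "app4": "host-b"}

# After app2 moves, it shares host-b with app3 and app4 instead.
after = apply_migration(before, "app2", "host-b")

if __name__ == "__main__":
    print("before:", sorted(co_residents(before)["app2"]))
    print("after: ", sorted(co_residents(after)["app2"]))
```

Recomputing the mapping on every migration event, rather than on a fixed schedule, is one possible design choice; it keeps the contention picture current at the cost of handling more events.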
In conclusion, problem solving with a fragmented set of monitoring tools is slow, manual and at times frustrating. It requires expertise because the dots have to be connected by hand. You receive a large volume of metrics every day, and time has to be spent determining what those metrics mean.
In a dynamic virtual infrastructure where dependencies change in real time, a slow and manual process cannot handle the variety of considerations needed to understand the root cause of a problem. To do justice to performance management, IT needs to move away from merely collecting metrics for manual interpretation. Virtualization dynamics and dependencies can cause significant performance and user experience issues that diminish the benefits of virtualization and risk disrupting critical business processes.
Ramanathan is CEO and founder of eG Innovations, a provider of automated performance monitoring and management solutions for virtual, cloud and physical IT infrastructures.