This year's VMworld kicks off this week in San Francisco and I’m expecting the typical huge audience. One of the reasons VMworld has become the premier show that it has is that virtualization itself is changing. Virtualization used to be a tactical technology used to consolidate servers. Have 10 servers? Consolidate that down to one or two. The architecture fundamentally stays the same, just with fewer physical boxes.
Virtualization today, though, is a much more strategic technology. It powers the cloud, desktops are being virtualized, storage is being virtualized, and data center operations are being automated. However, being more aggressive with virtualization does bring some new risks to enterprise IT. To understand what some of these are, ZK Research and Xangati, a cloud and management solutions provider, recently ran a survey looking at where companies were with cloud deployments, where they were going and what the challenges were for future deployments.
One of the more notable sets of data points revolved around performance storms and the respondents' ability to deal with them. We asked the question “What do you perceive to be the key characteristics of the performance storms in your virtualization/cloud infrastructure?” Almost 30% of the respondents stated they were the hardest problems to track down and resolve. Another 20% claimed performance storms were transient in nature, and 18.8% claimed they fly under the radar of current monitoring solutions. So a large number of companies have unpredictable performance storms that are transient and can’t be seen by current management tools. Hmm, seems like a problem.
A follow-up question was asked regarding how long it takes to identify performance storms (not solve, just identify), and fully 79% of respondents stated it takes more than an hour, with over 34% stating it can take more than four hours. Understanding that an hour of downtime can cost companies millions of dollars, this is a problem that needs a better solution.
The other significant impact of poor performance was related to rolling back virtualization. A true/false question was asked: “Our virtualization team has been forced to roll back the virtualization of at least one mission-critical application due to performance storms.” A shade over 25% of the respondents replied this was true. So one out of four companies had to go backwards with virtualization because of performance issues.
Is the problem getting better or worse? The survey asked respondents to react to the statement “With an increase in your percentage of servers virtualized, there has been an increase in the time it takes to identify and resolve performance issues.” 62.2% of the respondents agreed, meaning more virtualization leads to more problems, and more virtualization is exactly where we are headed.
Lastly, we asked whether respondents expected VMware vSphere/vCenter to provide more automated remediation of performance issues, and 74% either agreed or strongly agreed. So as great a vendor as VMware has been, the company clearly isn’t giving its customers the tools to isolate and solve performance issues.
The data is clear: performance storms are happening, they’re hard to find, and they’re costing companies money because they take so long to isolate, while respondents aren’t getting help from VMware. Unfortunately, tools like HP OpenView and CA Unicenter won’t solve performance issues either. These legacy tools are long past their prime and were meant more for the days of static IT instead of real-time, virtual IT.
Solving this problem requires looking to the world of smaller management companies that are built for virtual environments and performance. Xangati, with whom I ran this survey, is an excellent example of a vendor that provides cross-IT performance management software that can focus in on problem isolation. Splunk, Gigamon, Cascade (Riverbed) and NetScout are also examples of management tools built with an understanding of how to isolate and mitigate performance storms, albeit from different points of view.