So you want a private/hybrid cloud strategy for your company. Great idea. Now, how do you ensure that your cloud strategy will be successful? Well, this requires understanding how to measure success. Let me suggest the following criteria:
- Your constituents (generally the business and application owners) can decide voluntarily whether to adopt your cloud.
- You need to be able to prove to them that it is to their advantage to run their applications in your cloud.
Now, to do the above, you need to be able to provide cloud operational metrics that mirror what the business uses. Let’s review what the business uses as metrics:
- Revenue and growth in revenue
- Costs and projected growth in costs
- Profits (revenues minus costs)
- Market share and growth in market share
- Growth in new target markets
- Return on invested capital
The business has an excellent set of metrics by which to judge itself. These are, of course, all financial in nature. But the key attribute of these metrics, beyond being financial, is that they measure performance. So we need to borrow this focus on performance from the business and apply it to your cloud.
How to measure the performance of your private/hybrid cloud
The first step here is to put yourself into the shoes of your customer. What does the customer of your cloud service want? Step back and remember why you are building a private/hybrid cloud in the first place. You are most likely doing this because you know that you and your application owners are not going to get the reliability and performance you need from a shared tenant public cloud. So, your constituents are likely going to want the following from you:
- Guarantees as to availability and metrics to measure and prove availability
- Guarantees as to performance and metrics to prove performance
- Guarantees as to throughput and metrics to prove throughput
Of the above three, availability is the easiest to understand and measure. It is generally a simple matter of setting up synthetic transactions against the web server for the application and measuring the percentage of the time that the transactions complete successfully. This is also an extremely useful way to know if things are working before the users of the application show up for a day of work.
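As a minimal sketch of the availability calculation described above, assume each synthetic transaction is recorded as a simple success-or-failure result. The `ProbeResult` record and the `availability_percent` function are hypothetical names used only for illustration; how the probes are actually run against the web server is left out.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of one synthetic transaction (hypothetical record shape)."""
    timestamp: float   # seconds since the start of the measurement window
    succeeded: bool    # True if the transaction completed successfully


def availability_percent(results):
    """Percentage of synthetic transactions that completed successfully."""
    if not results:
        raise ValueError("no probe results to measure")
    successes = sum(1 for r in results if r.succeeded)
    return 100.0 * successes / len(results)


# Illustrative data: one probe per minute for a day (1,440 probes), 3 failures.
failed_minutes = {100, 101, 900}
probes = [ProbeResult(timestamp=60.0 * i, succeeded=(i not in failed_minutes))
          for i in range(1440)]

print(f"{availability_percent(probes):.2f}%")  # prints 99.79%
```

Running the probes before the working day starts, as suggested above, only changes when the results are collected; the calculation stays the same.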
It is when we get to performance and throughput that things get tricky. Generations of systems administrators have taken the approach that performance and throughput can be measured by looking at resource utilization, specifically CPU utilization, memory utilization, network utilization and storage utilization (IOPS). The problem is that in a dynamic and virtualized system, resource utilization is no longer an accurate proxy for performance.
In these environments, it becomes necessary to use new definitions:
- Performance: For a private/hybrid cloud, the definition of performance should be transaction response time for the transactions and applications of interest. For all of the layers of the infrastructure that support the applications and transactions, the definition of performance should be the latency (the cousin of response time) of each layer of the infrastructure. In particular, this means latency of the network and storage components that support each transaction and application.
- Throughput: For a private/hybrid cloud, the definition of throughput should be the amount of work done per unit of time. For transactions, this can be calls per second. For the network, this can be packets or bytes per second. For the storage layer, it is usually I/O operations per second (IOPS).
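To make the two definitions concrete, here is a rough sketch that derives both from the same set of completed calls. The sample numbers, the two-second window, and the choice of a nearest-rank percentile are assumptions made for illustration, not something prescribed by the definitions above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def throughput_per_second(count, window_seconds):
    """Work done per unit of time: completed operations per second."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return count / window_seconds


# Hypothetical sample: response times (ms) of calls completed in a 2-second window.
response_times_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 13]

p50 = percentile(response_times_ms, 50)   # typical response time
p95 = percentile(response_times_ms, 95)   # slow-tail response time
calls_per_second = throughput_per_second(len(response_times_ms), 2.0)

print(p50, p95, calls_per_second)  # prints 13 240 5.0
```

The same two functions apply at the lower layers: feed them network or storage latencies for performance, and packet, byte, or I/O counts for throughput.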
An instrumentation architecture for your private/hybrid cloud
To be able to collect the required performance and throughput metrics at each layer of your stack, you need an instrumentation architecture. You need to enumerate each layer of the stack and the components of each layer, then determine how you are going to get the required performance (response time and latency) and throughput metrics from each component at each layer. This is shown in the image below.
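Independently of any diagram, the enumeration can also be written down as data. The sketch below is one possible shape, assuming a flat list of metric sources; the layer names, component names, and the `uncovered_layers` helper are all hypothetical placeholders rather than a reference to any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSource:
    """One component and the metrics you plan to collect from it."""
    layer: str
    component: str
    performance_metric: str   # response time or latency
    throughput_metric: str    # work done per unit of time


# Hypothetical plan; real layers and components depend on your stack.
INSTRUMENTATION_PLAN = [
    MetricSource("transaction", "order-checkout", "response time (ms)", "calls/s"),
    MetricSource("network", "core-switch-1", "latency (ms)", "bytes/s"),
    MetricSource("storage", "array-a", "I/O latency (ms)", "IOPS"),
]


def uncovered_layers(plan, required_layers):
    """Return the required layers that have no metric source yet."""
    covered = {source.layer for source in plan}
    return [layer for layer in required_layers if layer not in covered]


required = ["transaction", "compute", "network", "storage"]
print(uncovered_layers(INSTRUMENTATION_PLAN, required))  # prints ['compute']
```

A check like this is a cheap way to spot layers for which no source of performance or throughput metrics has been identified yet.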
Once you have identified your crucial sources of metrics, the really hard work starts. It is not enough to collect all of these metrics, put them into a big data back end, and leave it to the users of the metrics to work out in their queries which components of your infrastructure supported which transaction at each point in time. To have an effective cloud instrumentation strategy, you need not only the performance and throughput metrics at each layer of the stack, but also knowledge of which virtual and physical elements of your infrastructure support each transaction over time. This is depicted below.
The relationship map for your private/hybrid cloud
An effective instrumentation strategy for your private/hybrid cloud requires collecting performance and throughput metrics at every layer of the stack—from the transaction to the spinning disks—and relating those metrics into topologies that capture the supporting virtual and physical infrastructure for each transaction and each application at each point in time.
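One simple way to picture such a time-aware relationship map is as a set of bindings, each saying that a component supported a transaction during a given interval. This is only a sketch under that assumption; the `Binding` record, the component names, and the time units are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """A component supported a transaction during [start, end)."""
    transaction: str
    component: str
    start: float
    end: float   # exclusive


def components_supporting(bindings, transaction, at_time):
    """Sorted names of components bound to the transaction at at_time."""
    return sorted(b.component for b in bindings
                  if b.transaction == transaction and b.start <= at_time < b.end)


# Hypothetical topology: the VM moves from host-a to host-b at t=60.
BINDINGS = [
    Binding("checkout", "vm-17", 0, 100),
    Binding("checkout", "host-a", 0, 60),
    Binding("checkout", "host-b", 60, 100),
    Binding("checkout", "datastore-1", 0, 100),
]

print(components_supporting(BINDINGS, "checkout", 30))
# prints ['datastore-1', 'host-a', 'vm-17']
print(components_supporting(BINDINGS, "checkout", 75))
# prints ['datastore-1', 'host-b', 'vm-17']
```

With bindings like these, a slow transaction at a given moment can be matched against the latency and throughput metrics of exactly the components that supported it at that moment.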
This article is published as part of the IDG Contributor Network.