My website is up, but is my content reachable?

The metrics you need to help you find the answer.

I recently read an InfoWorld article by David Linthicum titled "Good Cloud Ops Need Good Cloud Metrics." It caught my attention for two reasons.

First, Linthicum is correct in his assessment that metrics are as important in cloud operations as they are in any other aspect of your operations. To paraphrase Lord Kelvin, "to measure is to know." And knowing lets you do something about it.

But second, it caught my attention because while the metrics he describes are important, they are only part of what you need to measure to fully understand your Internet performance. Linthicum focuses on measuring the performance of the systems and applications within the cloud provider. But as I've discussed earlier in this blog, system and application performance is only part of the story. Network performance across the Internet between the cloud and end users is a huge factor, too, because those end users are your customers.

Put another way, tracking metrics within the cloud gives you valuable insight that helps keep your applications up and running. But the most important point is not whether your website or online content is up. What matters most is whether it is reachable: can a customer connect via their local Internet service provider? Optimizing the customer experience can be challenging, since you can't control whatever outages or downtime the customer's ISP, or other ISPs in the path, are experiencing.

Measuring performance across the network, however, is the first step to help mitigate issues for your customers. With that in mind, let's take a look at Linthicum's three points about cloud metrics:

1. "You can trend data and spot issues with recent operations."

You need to measure network performance continuously to understand the baseline and detect deviations, and there are different mechanisms for doing so. One is Real User Monitoring, or RUM, where you instrument your web pages with code that runs in the user's own browser and takes performance measurements (there's an entire JavaScript API for this purpose). You can also infer user performance from "synthetic measurements," which are tests run on a schedule from known vantage points rather than from real users' browsers. For example, if you know performance from a given network provider in a certain city to a particular cloud provider's data center is degraded, and you host content in that cloud provider's location, you can reasonably assume performance for users of that provider in that city will suffer. You can set up and run your own RUM or synthetic measurements, or use a vendor who specializes in such measurement. And don't limit yourself to cloud providers: you should be measuring performance to anywhere you have content, such as CDNs and hosted data centers.
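To make "understand the baseline and detect deviations" concrete, here is a minimal sketch in Python. It assumes you already collect latency samples in milliseconds for one network path, whether from RUM or synthetic tests. The function name, window size, threshold, and the 1 ms floor are illustrative choices, not values from Linthicum's article or from any particular monitoring product.

```python
from statistics import median


def find_deviations(latencies_ms, window=20, threshold=3.0):
    """Return the indexes of samples that stray far from the rolling baseline.

    The baseline is the median of the previous `window` samples. The spread is
    the median absolute deviation (MAD) of those samples, which is less
    sensitive to a single outlier than a standard deviation would be.
    """
    flagged = []
    for i in range(window, len(latencies_ms)):
        history = latencies_ms[i - window:i]
        baseline = median(history)
        mad = median(abs(x - baseline) for x in history)
        # Assumed floor of 1 ms so a perfectly flat history does not flag
        # ordinary jitter as a deviation.
        spread = max(mad, 1.0)
        if abs(latencies_ms[i] - baseline) > threshold * spread:
            flagged.append(i)
    return flagged


# Hypothetical samples: steady latency around 50 ms, then one spike.
samples = [50, 52, 49, 51] * 5 + [50, 180, 51]
spikes = find_deviations(samples)
```

In practice you would probably run something like this per path (user network, city, and hosting location) instead of over one global series, since a problem on one ISP can disappear in an overall average.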

2. "You can use the data to provide predictive analytics."

Once you have enough network performance data for the places where your content and applications live, you can explore it to find patterns and make predictions. Does performance change on a regular basis? You might be surprised at how often this is true. Many systems used by humans follow a diurnal pattern (showing the same changes every day). For example, the end of the business day on the U.S. East Coast is evening in Europe and early morning in Asia, when people are waking up, and this busy time can affect traffic and change performance.
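One simple way to look for a diurnal pattern is to group measurements by hour of day and compare the typical latency in each hour. The sketch below assumes each sample is a hypothetical `(epoch_seconds, latency_ms)` pair and buckets by UTC hour. It is an illustration of the idea, not a forecasting model.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def hourly_profile(samples):
    """Map each UTC hour (0-23) that has data to its median latency in ms."""
    by_hour = defaultdict(list)
    for epoch_seconds, latency_ms in samples:
        hour = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).hour
        by_hour[hour].append(latency_ms)
    return {hour: median(values) for hour, values in sorted(by_hour.items())}


def slowest_hour(profile):
    """Return the hour whose median latency is highest."""
    return max(profile, key=profile.get)


# Hypothetical data: two days of samples taken at 03:00 and 21:00 UTC.
DAY = 86400
samples = [
    (3 * 3600, 40), (21 * 3600, 90),
    (DAY + 3 * 3600, 44), (DAY + 21 * 3600, 110),
]
profile = hourly_profile(samples)
```

If the same hours stand out day after day, that is a pattern you can plan around. A fuller analysis would likely also separate weekdays from weekends and use many more days of data.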

3. "You can make your systems in the clouds self-healing."

So once you have the network performance data and are finding patterns in the data, what can you do? There are multiple ways to take action. Since we're talking about network performance, an obvious action is shifting traffic from a poorly performing path to a better one when you detect a degradation or outage. Having worked for a long time with and around DNS, I need to point out that using DNS answers "steered" by performance data is a simple and effective way to route your users to the best-performing path.
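The decision behind performance-steered DNS can be sketched in a few lines. This is only the selection logic, under stated assumptions: the `measurements` structure, the network labels, and the function name are hypothetical, and the addresses come from ranges reserved for documentation. A real deployment would connect this to an actual DNS service and would need to account for caching and record TTLs.

```python
def steer(user_network, measurements, fallback):
    """Pick the address to hand out for a user on `user_network`.

    `measurements` maps a network label to {address: latency_ms}, where a
    latency of None means the address was unreachable from that network.
    If nothing is known or nothing is reachable, return `fallback`.
    """
    candidates = measurements.get(user_network, {})
    reachable = {addr: ms for addr, ms in candidates.items() if ms is not None}
    if not reachable:
        return fallback
    # Lowest recent latency wins.
    return min(reachable, key=reachable.get)


# Hypothetical recent measurements for two user networks and two hosting sites.
measurements = {
    "isp-a": {"192.0.2.10": 35.0, "198.51.100.20": 80.0},
    "isp-b": {"192.0.2.10": None, "198.51.100.20": 95.0},
}
answer_a = steer("isp-a", measurements, fallback="192.0.2.10")
answer_b = steer("isp-b", measurements, fallback="192.0.2.10")
```

Here users on the hypothetical `isp-b` are sent to the second site because the first is unreachable from their network, even though both sites may look healthy from inside the cloud.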

So start measuring—not just within the cloud, but the entire network path starting with the users. Once you have the data, you know. And once you know, you can take action.

This article is published as part of the IDG Contributor Network.
