When I was young, I made three plastic models. One was of a car—a '57 Chevy. Another was of a plane—a Spitfire. And a third was of the Darth Vader TIE Fighter. I was so proud of them. Each one was just like the real thing. The wheels turned on the car, and the plane’s propeller moved when you blew on it. And of course, the TIE Fighter had Darth Vader inside.
When I went to work on the internet, I had to measure things. As I discussed in my last post, “Measure cloud performance like a customer,” when you measure on the internet you need to measure in ways that are representative of your customers’ experiences. This affects how you measure in two ways. The first is the perspective you take when measuring, which I talked about last time. The second is the techniques you use to perform those measurements. Those techniques are, in effect, how you make a model of what you want to know, and those childhood plastic models turn out to offer some solid guidance after all.
One technique that measurement needs is a model that closely resembles the actual network conditions you want to check. So, for example, if you want to verify that your cloud provider’s authoritative DNS is working, you must measure as though you are an ISP’s resolver. If you are trying to make sure your cloud deployment is serving your web content correctly, you need to measure not only that the web server is up but also that the expected content shows up, renders correctly, and so on.
This is like the model of the car I built. I knew what it was supposed to be like—I’d seen such a car in real life. Everything about the car needed to be in the model in order to make an effective scaled-down picture of the thing I was trying to reproduce. I knew what the real thing was like, and to make the best model I could, I had to construct something very similar.
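As a rough illustration of the “car model” approach for web content, here is a minimal Python sketch. It assumes that a plain HTTP fetch stands in for the customer’s experience and that a known marker string in the page body is a fair proxy for “the expected content shows up”; it does not check rendering. The names `check_page`, `fetch`, and `CheckResult` are hypothetical and do not come from any particular monitoring tool.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CheckResult:
    ok: bool
    reason: str


def fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return (status code, decoded body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def check_page(
    url: str,
    expected_marker: str,
    fetcher: Callable[[str], Tuple[int, str]] = fetch,
) -> CheckResult:
    """Check that the page is served AND contains the content we expect."""
    try:
        status, body = fetcher(url)
    except OSError as exc:
        # urllib's URLError/HTTPError and socket timeouts are all OSError subclasses.
        return CheckResult(False, f"request failed: {exc}")
    if status != 200:
        return CheckResult(False, f"unexpected status {status}")
    if expected_marker not in body:
        # The server is up, but the customer would not see the right page.
        return CheckResult(False, "expected content missing")
    return CheckResult(True, "ok")
```

Calling `check_page(url, marker)` with your own URL and a string that only appears on a correctly served page returns a result whose `reason` distinguishes “server down” from “server up but wrong content.” The `fetcher` argument is there so the check can be exercised without touching the network.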
Check the particular features
Another technique that’s needed for measurement is checking particular features that you know are needed, even if nobody ever experiences those features directly. This sort of modeling is what we do when we run synthetic tests. So, for instance, you might check whether all the DNS servers are working as expected, even though nothing on the internet ever uses all the nameservers at once. You might test every web server in a server pool, even though part of the point of pooling is exactly to make outages invisible to users. This kind of modeling is like the model of the TIE Fighter: a model of something that existed only in a fictional universe, and so a view that nobody ever has in real life.
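A synthetic test of this kind might look something like the following sketch, which probes every member of a pool individually rather than going through the resolver or load balancer a user would. It assumes a bare TCP connection is an acceptable stand-in for a health check, which is much weaker than sending a real DNS query or HTTP request. The names `tcp_probe`, `probe_pool`, and `healthy_fraction` are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

Member = Tuple[str, int]  # (host, port)


def tcp_probe(member: Member, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the member succeeds."""
    host, port = member
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_pool(
    members: Iterable[Member],
    probe: Callable[[Member], bool] = tcp_probe,
) -> Dict[Member, bool]:
    """Probe every member individually, even ones users may never hit."""
    return {member: probe(member) for member in members}


def healthy_fraction(results: Dict[Member, bool]) -> float:
    """Fraction of pool members that passed their probe (0.0 for an empty pool)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)
```

The per-member results are what you would record over time; the single `healthy_fraction` number is a convenient summary, but it hides which member failed, so it is worth keeping both.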
To measure effectively in the cloud, you need both of these techniques. And as I’ve already said repeatedly in this sequence, measuring correctly requires a lot of knowledge about what your application is like. When you deploy in the cloud, you need to measure more, not less, because your control over the environment is gone. The techniques are complementary:
- Use the “car model” style of measurement to ensure your users’ experiences are good. These kinds of models are good candidates for triggering urgent alerts, the sort that cause operations staff to get paged. When one of these models shows a fault, it is a good bet that someone on the internet is having a bad experience. These models are very bad for troubleshooting, however, because all they tell you is that there is a mismatch between what you expect and what actually happens.
- Use the “TIE Fighter model” style of measurement to understand what is going on. Individually, these models are usually very bad candidates for triggering alerts to your operations staff. Because they usually model parts of your system that nobody will see directly, what they provide you with is a total picture of the parts of your system. This kind of modeling is, therefore, really good for allowing you to see changes over time and for producing trend lines that allow you to catch problems before your customers might see them.
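One way the two styles could be combined is sketched below. The policy is an assumption for illustration, not a recommendation from any particular tool: a failed user-facing check pages someone immediately, while component-level measurements are only examined as a trend. The trend is a plain least-squares slope over evenly spaced samples, and the function names, the `"page"`/`"investigate"`/`"ok"` labels, and the `drift_limit` threshold are all hypothetical.

```python
from typing import Sequence


def slope(samples: Sequence[float]) -> float:
    """Least-squares slope of samples taken at evenly spaced intervals."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def decide(
    user_check_ok: bool,
    component_history: Sequence[float],
    drift_limit: float = -0.05,
) -> str:
    """Map the two kinds of measurement onto an operational response."""
    if not user_check_ok:
        # "Car model" fault: someone is probably having a bad experience now.
        return "page"
    if slope(component_history) <= drift_limit:
        # "TIE Fighter model" trend: nothing is visibly broken yet, but the
        # component-level picture is getting worse between samples.
        return "investigate"
    return "ok"
```

Here `component_history` could be, for example, the fraction of pool members passing their individual probes at each sampling interval. A steady decline in that number would produce `"investigate"` well before the user-facing check starts failing.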
But what of the airplane-style models? They are the ones to be worried about. When I built my model Spitfire, I had never seen such a plane. I had to trust the people who made the model kit to have built the model correctly. Unfortunately, too many operational models are like this; they provide a picture of something, but it’s impossible for you to know whether it is accurate. And this issue is often worse when you use cloud measurement systems because you do not know exactly what they are measuring.
When you move into the cloud, you need to measure everything. Otherwise, you really are in a cloud: everything is grey, and you can’t see where you are going. But with good modeling, you can have a clear, high-contrast view of what is going on even though you don’t have control of the underlying systems. The result: a high-quality service your customers can rely on. Then you will really know that you are building better networks.