Management Vendors Fail to Measure “Real” User Experience

Houston, we have a problem. IT managers want to measure their users' real experience, but the vendor-supplied tools at their disposal do not spit out the measurements they need. Here's why we make this heretical statement: in a survey of 364 IT managers about their application performance management (APM) practices, conducted for a recent Network World article, we asked them to rate the importance of measuring 12 application performance attributes--and to tell us whether they actually measure those attributes. For the attributes that reflect the end-user experience, we found huge gaps between what is important and what is measured.

As the following figure shows, some 50 percent or more of respondents rate end-user page response time, server query response time and TCP transaction response time (all key indicators of the real end-user experience) as important, yet do not measure them. Compare that with gaps of under 30 percent for important attributes that can simply be counted, such as network availability, server availability and bandwidth utilization, and the source of the unmet need becomes clear: half of the enterprises that know they should measure response times don't do it.

We posit that the primary reason most IT managers do not measure response times is not that they don't want to, but that they cannot, because their vendors' tools do not capture these metrics. This conclusion is bolstered by the fact that 42 percent of respondents cite a lack of proper tools as an impediment to implementing APM best practices, second only to a lack of sufficient manpower.

Why are vendors not supplying this vital information? We believe it comes down to this: it is much easier for a performance management tool vendor to count things than to time things.

Counting is easy because a device that processes the traffic (e.g., a server or switch), or one that watches the traffic (e.g., a probe), can easily implement a simple counter. The counter is stateless with respect to what the user is doing and can count around the clock. The management tool can sample (i.e., take a reading from) the counter whenever necessary, and there is a fixed number of measurements each day--24, for example, if a reading is taken every hour. It is certainly possible to count at a finer granularity (e.g., by application or by user), but it is still counting without knowing what the user is doing, so the numbers cannot be mapped to the real user experience.
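To see the difference concretely, here is a minimal sketch in Python of what such a counter amounts to. The names InterfaceCounter and sample_daily are hypothetical, not taken from any vendor's product; the point is that the counter holds no per-user state and yields a fixed, predictable number of readings per day.

```python
# Hypothetical sketch of a device-side counter: a stateless tally that
# increments on every packet and is sampled on a fixed schedule,
# regardless of what any individual user is doing.
import time

class InterfaceCounter:
    """Knows how much traffic passed -- not whose it was or why."""
    def __init__(self):
        self.packets = 0
        self.bytes = 0

    def on_packet(self, size: int) -> None:
        # Fires for every packet the device forwards or observes.
        self.packets += 1
        self.bytes += size

def sample_daily(counter: InterfaceCounter, interval_s: int = 3600):
    """Fixed number of readings per day: 24 if sampled every hour."""
    readings = []
    for _ in range(86400 // interval_s):
        time.sleep(interval_s)          # illustrative polling interval
        readings.append((counter.packets, counter.bytes))
    return readings
```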

Unlike counting, timing is hard because measuring response times takes synchronization. You need the virtual equivalent of a stopwatch to determine start and stop times, and you need automated report-generating capabilities. The measurement tool must start and stop timers based on user-specific and/or application-specific events, and those events occur, and must be detected, outside the tool's control. In addition, the timer must maintain state with respect to the event it is timing, and the number of measurement samples varies over the course of a day and from day to day--making it imperative that the measurements be aggregated quickly. This is challenging, and so far few vendors have met the challenge.
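By contrast, here is a minimal sketch, again in Python and again with hypothetical names (ResponseTimer, on_request, on_response), of what a response-time monitor has to do: hold state for every in-flight transaction, react to start and stop events it does not control, and aggregate a sample count that varies from hour to hour.

```python
# Hypothetical sketch of a response-time monitor: it must keep per-transaction
# state between externally generated start and stop events, then roll up a
# variable number of samples into summary metrics.
import time
from statistics import mean

class ResponseTimer:
    def __init__(self):
        self.in_flight = {}   # transaction id -> start timestamp (state!)
        self.samples = []     # sample count varies by hour and by day

    def on_request(self, txn_id: str) -> None:
        """Start event, detected outside the tool's control (e.g., an HTTP request)."""
        self.in_flight[txn_id] = time.monotonic()

    def on_response(self, txn_id: str) -> None:
        """Stop event; the elapsed time becomes one measurement sample."""
        start = self.in_flight.pop(txn_id, None)
        if start is not None:
            self.samples.append(time.monotonic() - start)

    def aggregate(self):
        """Roll up whatever samples arrived -- could be 10, could be 10,000."""
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        return {
            "count": len(ordered),
            "mean_s": mean(ordered),
            "p95_s": ordered[int(0.95 * (len(ordered) - 1))],
        }
```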

We expect many vendors will claim our conclusions are mistaken--but a clear-eyed examination of their marketing materials will likely reveal that while they may claim to provide metrics that reflect the real user experience, in fact they provide only counts of things like packets sent to a user. Metrics such as these indicate only that the service is up; they do not tell you a whit about the user's actual experience. For that, you need timers.


