There are a lot of terms floating around to describe how to set up metrics for evaluating service performance over the network. No doubt the most established is Quality of Service, or QoS, which has taken on a fairly technical, bandwidth-centric definition; it remains valuable as a metric, but it is far from summing up what really counts in the eyes of the end user. Other terms, like RUM or “real user monitoring,” are also technical, but at least focus on a series of monitoring technologies truly targeted at the “real user” or “end user.” And then there’s QoE, or Quality of Experience, which is my personal favorite because it is centered not in technology, but in the flesh-and-blood experience of the user consuming your services.
This focus is much like the original intent of Mean Opinion Score (MOS) as it applied to telecommunications services. Like it or not, how your customers “feel” about your services is, in the end, how they’re going to vote with their pocketbooks or their budget approvals.
This isn’t to say that technical metrics don’t count. You absolutely need them. Building towards QoE with a good combination of technical metrics and a healthy dose of customer dialog is a fine art. In this column we’re going to look briefly at some of the metrics and technologies that most often apply. But before we do, I suppose I should answer the overwhelmingly obvious question from many network managers: “Why me? Why should I care about QoE? Isn’t that the job of the applications manager or the Help Desk?”
The answer is that you’re partly right: QoE isn’t your job alone. But like it or not, the network is the delivery system for almost all application services, including VoIP and unified communications and, most predominantly, Web-based applications. Many of these depend heavily on network-centric monitoring tools to ensure their performance.
EMA data shows that Web-based applications for internal use, followed by client-server applications, then Web-based applications for external use and Web services, dominate what’s actually being deployed over the network, all well ahead of VoIP. EMA research also reinforces the importance of the network in the delivery of application and business services: a healthy 72% of respondents from a wide range of enterprises and service providers had more than 20 remote branch offices, and in a parallel EMA survey, 34.1% had more than 100 remote locations. Networked applications are enabling new business models across verticals, true today with Web 2.0, and even truer tomorrow with the advent of globally dispersed service-oriented architectures.
So, QoE is important even if it isn’t your job alone. Moreover, if you’re interested in optimizing the network, what better value to use than the value that truly counts from the customer/consumer perspective? And that’s QoE. QoE is also the natural place to trigger problem isolation and root-cause diagnostics because it’s about meaningful service parameters versus component-centric metrics that are primarily useful for diagnostics.
Some QoE Metrics
The first thing to keep in mind is that QoE metrics are not in themselves designed to be diagnostic metrics. For instance, while configuration information can be hugely valuable in isolating a root cause and remedying a problem, it does little to inform on QoE. The same can be said of flow-based traffic volumes, packet analysis, and network forensics.
EMA research indicates that QoE is most often focused on metrics such as availability, Mean Time to Repair (MTTR), and Mean Time Between Failures (MTBF). Yet any number of studies have shown that end users care more about degraded response time than about intermittent availability issues. This has more to do with human psychology than network engineering. End users typically believe that a complete failure in availability will soon be remedied, whereas they remain skeptical that effective action will be taken if their response time is degraded. Moreover, degraded response time tends to persist far longer than most availability issues, so in this case their perception actually is reality.
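As a quick illustration of how those reliability metrics interrelate, steady-state availability is conventionally derived from MTBF and MTTR. The sketch below uses invented figures purely to show the arithmetic:

```python
# Illustrative sketch only: the conventional steady-state relationship
# between MTBF, MTTR, and availability. The figures are hypothetical.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A service failing every 500 hours and taking 2 hours to repair:
print(f"{availability(500.0, 2.0):.4%}")  # roughly 99.60% available
```

Note that a service can score well on this formula and still leave users miserable, which is exactly the response-time point that follows.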
Yet response time can be troublesome in other ways. Average response time over a day, a week, or a month may not be very meaningful in itself. Inconsistent response times, even with faster overall averages, can be far more disruptive to the rhythms of working and communicating than somewhat slower but more consistent service delivery. And the spikes that truly alienate users can occur within a single minute or even within seconds; capturing metrics at that granularity not only reveals what users actually experience, but can also provide insight into where the problems lie.
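A small, hypothetical sketch of that point: the invented samples below average out to a figure that looks tolerable, while percentile and worst-case views expose the spikes users actually feel.

```python
# Hypothetical sketch: why an average response time can mask the spikes
# that alienate users. The sample values are invented for illustration.
import math
import statistics

# 60 response-time samples in milliseconds: mostly steady, with a
# minority of multi-second spikes.
samples = [120] * 54 + [4000] * 6

mean_ms = statistics.mean(samples)
# Simple nearest-rank 95th percentile.
p95_ms = sorted(samples)[math.ceil(0.95 * len(samples)) - 1]
worst_ms = max(samples)

# The mean comes out around half a second, yet one sample in ten was a
# 4-second spike that the average alone would never reveal.
print(f"mean={mean_ms:.0f}ms p95={p95_ms}ms worst={worst_ms}ms")
```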
And there are other metrics that come into play with varying degrees of relevance. These include flexibility and choice of service, something in which network planning plays a role. Data security is another core value that people may not associate with QoE, but for certain applications, and certain information, it can be a prime customer concern. Cost effectiveness, along with visibility into usage and cost justification, is increasingly of interest to business clients. Mobility is another QoE attribute, more important for some applications than others, and the list goes on.
Technologies
Probably the biggest debate regarding QoE in terms of response time is between the value/role of synthetic versus observed transactions. The truth is that both are valuable. Synthetic tests are proactive, can give you more consistent data suitable for SLA requirements, and can let you know if availability is lost, which observed transactions typically cannot.
Many synthetic tests also offer diagnostic value, especially when the scripts are optimized to look at certain types of transactional behaviors that occur on an ad hoc basis in the real world. On the other hand, synthetic tests run at specified intervals and therefore may fail to capture real problems that arise and disappear in the gaps between tests. Moreover, many observed capabilities have become increasingly rich in function and are beginning to offer much of the granularity of insight once available only through synthetic tests. So, the truth is that both synthetic and observed should be in place if you really care about QoE.
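The synthetic side of that pairing can be sketched roughly as follows. Every name here is hypothetical, and the probe is a stand-in for a real scripted transaction run against the actual service on a schedule:

```python
# Minimal sketch of a synthetic-test loop; all names are hypothetical,
# and `probe` stands in for a real scripted transaction.
import time
from typing import Callable, List, Optional

def run_synthetic(probe: Callable[[], None], runs: int,
                  interval_s: float = 0.0) -> List[Optional[float]]:
    """Run `probe` `runs` times at a fixed interval; return each run's
    latency in seconds, or None when the probe failed (availability lost)."""
    results: List[Optional[float]] = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            probe()
            results.append(time.monotonic() - start)
        except Exception:
            results.append(None)  # the service was unreachable this run
        time.sleep(interval_s)
    return results

# Stand-in probe that fails on its third invocation:
calls = {"count": 0}
def fake_probe() -> None:
    calls["count"] += 1
    if calls["count"] == 3:
        raise ConnectionError("service down")

timings = run_synthetic(fake_probe, runs=4)
avail = sum(t is not None for t in timings) / len(timings)
print(f"availability over window: {avail:.0%}")  # prints 75%
```

The loop illustrates both halves of the trade-off above: it can detect lost availability proactively, but anything that happens between two of its runs is simply invisible to it.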
Placement is also important. Data-centric transactional monitoring can provide back-office detail that is quite useful in diagnostics, but it can also provide rich insights into certain issues surrounding QoE, in some cases playing back actual transactions in a cinematic manner. But capturing data at the end station, through synthetic and/or observed transaction capabilities, is really at the heart of QoE. Many of the more network-centric solutions for QoE benchmarking sit at the edge of the data center and calculate end-user experience, in some cases in conjunction with insights into the back-office transaction as well.
While most of these network-centric insights are not the most “heavy hitting” in the true QoE sense, having them can let you move far more quickly in diagnosing the cause of a problem, as well as anticipate performance degradations in remote locations.