Mind your Ps and Qs

Feb 20, 2006 (4 mins)
Network Security, Networking, Security

Voice-quality testing is a traditionally obscure corner of telephony that has become more interesting with the rise of VoIP and mobile communications. The standards for VoIP testing come from the International Telecommunication Union (ITU), whose standards body was formerly known as the International Telegraph and Telephone Consultative Committee (CCITT).

The original ITU standard for voice-quality testing, Recommendation P.800, is decidedly non-technological. The test essentially requires a panel of judges to listen to voice calls and score them against a particular set of criteria. The scores are averaged into a single number, called the Mean Opinion Score (MOS), typically on a scale of 1 (worst) to 5 (best).
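The aggregation step can be sketched in a few lines of Python. This is only an illustration of averaging panel ratings, not the full P.800 procedure. The function name and the sample ratings are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a panel's 1-to-5 ratings into a single MOS value."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from a six-person listening panel for one call.
panel_ratings = [4, 5, 3, 4, 4, 3]
print(f"MOS: {mean_opinion_score(panel_ratings):.2f}")
```

Every data point needs a room full of listeners, which is where the expense comes from.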

Running MOS tests is expensive and time-consuming. Testing the four scenarios across our field of 10 SSL VPN contenders would have been, to put it politely, “not in the budget.”


The Perceptual Evaluation of Speech Quality Listening Quality scale ranges from 1.0 to 4.5, with 4.5 meaning that no distortion was measured.
Score  Quality of the Speech


The Perceptual Analysis Measurement System Listening Effort scale shows the effort required to understand the meaning of sentences.
1  Complete relaxation possible; no effort required
2  Attention necessary; no appreciable effort required
3  Moderate effort required
4  Considerable effort required
5  No meaning understood with any feasible effort

Fortunately, the ITU understands the need for a more efficient and repeatable way of testing voice quality and has created alternative tests that can be automated.

Perceptual Evaluation of Speech Quality (PESQ) is a complex but objective test that is meant to be a close analog to MOS. A well-defined series of phases, including level and time alignment, input filtering, perceptual modeling, equalization and disturbance processing, produces a score called the PESQ-LQO (Listening Quality, Objective) that maps directly onto the MOS scale.

The ITU tried several times to get the PESQ score to match what a MOS test turns up. In our testing, we report the PESQ-LQO score from ITU Recommendation P.862.1 because it maps most closely to the MOS, which is still frequently used. Several other PESQ scores, defined in Recommendation P.862, are also seen.
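For readers who want to see what that mapping looks like, here is a minimal sketch of the conversion from a raw P.862 score to PESQ-LQO. It assumes the logistic function and coefficients commonly quoted for P.862.1, so check them against the Recommendation itself before relying on them. The function name is our own.

```python
import math


def pesq_raw_to_lqo(raw_score):
    """Map a raw P.862 PESQ score onto the MOS-like LQO scale.

    Assumption: logistic mapping and coefficients as commonly quoted
    for ITU-T P.862.1; verify against the Recommendation before use.
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


# Raw PESQ scores run from -0.5 to 4.5; show how the endpoints and a
# few values in between land on the LQO scale.
for raw in (-0.5, 1.0, 2.5, 3.5, 4.5):
    print(f"raw {raw:4.1f} -> LQO {pesq_raw_to_lqo(raw):.2f}")
```

Because the curve is S-shaped rather than a straight line, equal steps in the raw score do not produce equal steps in the LQO score.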

All three scores measure the same thing, although their scales and linearity vary. This means that results cannot be compared across tests that used different versions of the score.

Understanding exactly what a MOS represents is another matter. Although the obvious “higher is better” applies, it is hard to pinpoint where a voice call goes from acceptable to unacceptable. A normal analog or digital telephone call will generally have a MOS of 4.2 to 4.4. A typical cell phone call will range from 3.0 to 3.7, while a poor cell phone call would score less than 2.

Another voice-quality scoring system described by the ITU is the Perceptual Analysis Measurement System (PAMS), in P.800. PAMS is a “listening opinion” test, which is different from PESQ, a “conversation opinion” test. PAMS attempts to measure both listening quality (using the same scale as PESQ) and listening effort. PAMS specifically looks for errors introduced into the voice channel and predicts how they will affect listening quality and listening effort.

A third metric is the ITU’s Perceptual Speech Quality Measure (PSQM), from P.861. PSQM is recommended for assessing speech codecs rather than the behavior of an entire voice connection, but it is often reported anyway. Because native PSQM scores range from 0 (excellent) to 6.5 (poor), we have rescaled them to match the range of PESQ.
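The exact rescaling used in the test is not spelled out here. One plausible approach, shown purely as an assumption, is a straight-line inversion that sends a PSQM of 0 to the top of the PESQ range (4.5) and a PSQM of 6.5 to the bottom (1.0). The function and constant names are hypothetical.

```python
# Scale endpoints taken from the ranges quoted in the article.
PSQM_BEST, PSQM_WORST = 0.0, 6.5
PESQ_WORST, PESQ_BEST = 1.0, 4.5


def rescale_psqm(psqm):
    """Linearly invert a native PSQM score onto the 1.0-4.5 PESQ range.

    Assumption: a simple linear mapping; the test's actual rescaling
    may have used a different curve.
    """
    # Clamp out-of-range inputs so the result stays on the PESQ scale.
    clamped = min(max(psqm, PSQM_BEST), PSQM_WORST)
    fraction_bad = (clamped - PSQM_BEST) / (PSQM_WORST - PSQM_BEST)
    return PESQ_BEST - fraction_bad * (PESQ_BEST - PESQ_WORST)


for native in (0.0, 1.5, 3.25, 6.5):
    print(f"PSQM {native:4.2f} -> rescaled {rescale_psqm(native):.2f}")
```

A linear map keeps the ordering of results intact but does not make rescaled PSQM numbers interchangeable with PESQ-LQO scores.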

In our testing, we generated all of these scores, and they can be found in a spreadsheet. All of the analysis in the accompanying story is based on the PESQ-LQO scores.
