Chapter 7: Improving and Maintaining Voice Quality

Cisco Press
By Kevin Wallace, Network World, 03/17/2008

After reading this chapter, you should be able to perform the following tasks:

  • Identify problems presented by IP networks that affect voice and describe the QoS mechanisms used to address those problems.

  • Set up basic QoS configurations, on both Cisco IOS-based Catalyst switches and routers, using Cisco's AutoQoS feature.

  • Implement call control on the network using CAC tools and mechanisms.

When human speech is converted to analog electrical signals and then digitized and compressed, some of the qualitative components are lost. This chapter explores the components of voice quality that you must maintain, the methods that you can use to measure voice quality, and quality of service (QoS) tools that you can implement in a network to improve voice quality.

 

Optimizing Voice Quality

Because of the inherent characteristics of a converged voice and data IP network, administrators face certain challenges in delivering voice traffic correctly. This section describes these challenges and offers solutions for avoiding and overcoming them when designing a VoIP network for optimal voice quality.

 

Factors that Affect Voice Quality

Because of the nature of IP networking, voice packets sent via IP are subject to certain transmission problems. Conditions present in the network might introduce problems such as echo, jitter, or delay. These problems must be addressed with QoS mechanisms.

The clarity, or cleanliness and crispness, of the audio signal is of utmost importance. The listener must be able to recognize the speaker's identity and sense the mood of the speaker. These factors can affect clarity:

  • Fidelity—The degree to which a system, or a portion of a system, accurately reproduces, at its output, the essential characteristics of the signal impressed upon its input or the result of a prescribed operation on the signal impressed upon its input (definition from the Alliance for Telecommunications Industry Solutions [ATIS]). The bandwidth of the transmission medium almost always limits the total bandwidth of the spoken voice. Human speech typically requires a bandwidth from 100 to 10,000 Hz, although 90 percent of speech intelligence is contained between 100 and 3000 Hz.

  • Echo—A result of electrical impedance mismatches in the transmission path. Echo is always present, even in traditional telephony networks, but at a level that cannot be detected by the human ear. The two components that affect echo are amplitude (that is, loudness of the echo) and delay (that is, the time between the spoken voice and the echoed sound). You can control echo using echo suppressors or echo cancellers.

  • Jitter—Variation in the arrival of coded speech packets at the far end of a VoIP network. The varying arrival time of the packets can cause gaps in the re-creation and playback of the voice signal. These gaps are undesirable and annoy the listener. This variable delay is induced in the network by variation in the routes of individual packets, contention, or congestion. You can often compensate for it by using dejitter buffers.

  • Packet drops—The discarding of voice packets. Typically, when a VoIP packet is dropped from a network, 20 ms of audio is lost.

  • Delay—The time between the spoken voice and the arrival of the electronically delivered voice at the far end. Delay results from multiple factors, including distance (that is, propagation delay), coding, compression, serialization, and buffering.

  • Sidetone—The purposeful design of the telephone that allows the speaker to hear the spoken audio in the earpiece. Without sidetone, the speaker is left with the impression that the telephone instrument is not working.

  • Background noise—The low-volume audio that is heard from the far-end connection. Certain bandwidth-saving technologies, such as voice activity detection (VAD), can eliminate background noise altogether. When this technology is implemented, the speaker audio path is open to the listener, while the listener audio path is closed to the speaker. The effect of VAD is often that speakers think that the connection is broken, because they hear nothing from the other end.

Although each of the preceding factors affects audio clarity, the factors that present the greatest challenges to VoIP networks are jitter, delay, and packet drops. A lack of network bandwidth is usually the underlying cause of these issues, which are addressed in the following sections.

 

Jitter

Jitter is defined as a variation in the delay of received packets, as illustrated in Figure 7-1. On the sending side, packets are sent in a continuous stream with the packets spaced evenly. Because of network congestion, improper queuing, or configuration errors, this steady stream can become uneven: the delay experienced by each packet varies instead of remaining constant.

When a router receives a VoIP audio stream, it must compensate for the jitter that is encountered. The mechanism that handles this function is the playout delay buffer, or dejitter buffer. The playout delay buffer must buffer these packets and then play them out in a steady stream to the digital signal processors (DSPs) to be converted back to an analog audio stream. The time that packets spend in the playout delay buffer, however, adds to the overall absolute delay.

When a conversation is subjected to jitter, the results can be clearly heard. If the talker says, "Watson, come here. I want you," the listener might hear "Wat....s...on.......come here, I......wa......nt........y......ou." The variable arrival of the packets at the receiving end causes the speech to be delayed and garbled.

Figure 7-1
Jitter in IP Networks
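
The dejitter behavior described in this section can be sketched in a few lines of Python. This is only an illustration, not how a Cisco router implements its playout delay buffer: the function names, the sample timestamps, and the fixed buffer depth are all assumptions made for the example.

```python
def delay_variation_ms(send_ms, recv_ms):
    """Change in one-way transit time between consecutive packets (a simple jitter measure)."""
    transit = [r - s for s, r in zip(send_ms, recv_ms)]
    return [abs(later - earlier) for earlier, later in zip(transit, transit[1:])]


def playout_schedule(send_ms, recv_ms, buffer_ms):
    """Play each packet at a fixed offset from its send time.

    The offset is the first packet's transit time plus buffer_ms (a hypothetical
    fixed-depth buffer). A packet that arrives after its playout time is late
    and would be heard as a gap.
    """
    offset = (recv_ms[0] - send_ms[0]) + buffer_ms
    return [(s + offset, r <= s + offset) for s, r in zip(send_ms, recv_ms)]


# Assumed sample data: packets sent every 20 ms, received with uneven delay.
send = [0, 20, 40, 60, 80]
recv = [50, 75, 88, 125, 131]

print(delay_variation_ms(send, recv))
for depth in (10, 20):
    schedule = playout_schedule(send, recv, depth)
    late = [i for i, (_, on_time) in enumerate(schedule) if not on_time]
    print(f"buffer {depth} ms: late packets {late}")
```

With these sample numbers, a 10 ms buffer leaves the fourth packet late, while a 20 ms buffer absorbs all of the variation at the cost of 10 ms more delay on every packet.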

 

Delay

Overall, or absolute, delay can degrade a VoIP conversation. You might have experienced delay in a telephone conversation with someone on a different continent. Such delays can cause the parties to talk over each other, so that entire words in the conversation are cut off, which can be very frustrating.

When you design a network that transports voice over packet, frame, or cell infrastructures, it is important to understand and account for the predictable delay components in the network. You must also correctly account for all potential delays to ensure that overall network performance is acceptable. Overall voice quality is a function of many factors, including the compression algorithm, errors and frame loss, echo cancellation, and delay.

Figure 7-2 shows various sources and types of delay. Notice that there are two distinct types of delay:

  • Fixed delay components are predictable and add directly to overall delay on the connection. Fixed delay components include the following:

    • Coding—The time it takes to translate the audio signal into a digital signal

    • Packetization—The time it takes to put digital voice information into packets and remove the information from packets

    • Serialization—The insertion of bits onto a link

    • Propagation—The time it takes a packet to traverse a link

  • Variable delays arise from queuing delays in the egress trunk buffers that are located on the serial port connected to the WAN. These buffers create variable delays (that is, jitter) across the network.

Figure 7-2
Sources of Delay
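
Of the fixed components listed above, serialization delay is the easiest to estimate by hand: it is the frame size in bits divided by the link rate. The sketch below assumes that formula. The function name and the two frame sizes are illustrative choices, not values taken from the chapter.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time in milliseconds to clock a frame of frame_bytes onto a link running at link_bps."""
    if frame_bytes < 0 or link_bps <= 0:
        raise ValueError("frame size must be non-negative and link rate positive")
    return frame_bytes * 8 * 1000 / link_bps


# Assumed frame sizes: a small 40-byte voice frame and a 1500-byte data frame.
for size in (40, 1500):
    print(size, "bytes at 64 kbps:", serialization_delay_ms(size, 64_000), "ms")
```

On a 64 kbps link the 40-byte frame takes 5 ms, while the 1500-byte frame takes 187.5 ms, so a voice packet that has to wait behind a large data frame on a slow link can pick up significant delay.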

 

Acceptable Delay

The ITU specifies network delay for voice applications in Recommendation G.114. This recommendation defines three bands of one-way delay, as shown in Table 7-1.

Table 7-1  Components and Services

Range in Milliseconds    Description

0 to 150                 Acceptable for most user applications.

150 to 400               Acceptable, provided that administrators are aware of the transmission time and its impact on the transmission quality of user applications.

Above 400                Unacceptable for general network planning purposes; however, it is recognized that in some exceptional cases, this limit will be exceeded.


Note - This recommendation is for connections where echo is adequately controlled, implying that echo cancellers are used. Echo cancellers are required when one-way delay exceeds 25 ms (G.131).
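
As a rough illustration, the three bands in Table 7-1 can be expressed as a small lookup. This is a sketch only: the function name and the shortened band descriptions are assumptions, and because the table's ranges share the value 150, the choice to place exactly 150 ms in the first band is also an assumption.

```python
def g114_band(one_way_delay_ms):
    """Map a one-way delay to the G.114 band described in Table 7-1.

    Assumption: boundary values go to the lower band, so exactly 150 ms falls
    in the first band and exactly 400 ms in the second.
    """
    if one_way_delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if one_way_delay_ms <= 150:
        return "acceptable for most user applications"
    if one_way_delay_ms <= 400:
        return "acceptable if the impact on transmission quality is understood"
    return "unacceptable for general network planning"


# Arbitrary sample delays, one per band.
for delay in (80, 171, 450):
    print(delay, "ms:", g114_band(delay))
```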


This G.114 recommendation is oriented toward national telecommunications administrations and therefore is more stringent than recommendations that would normally be applied in private voice networks. When the location and business needs of end users are well known to a network designer, more delay might prove acceptable. For private networks, a 200 ms delay is a reasonable goal and a 250 ms delay is a limit. This goal is what Cisco proposes as reasonable, as long as excessive jitter does not impact voice quality. However, all networks must be engineered so that the maximum expected voice connection delay is known and minimized.

The G.114 recommendation is for one-way delay only and does not account for round-trip delay. Network design engineers must consider both variable and fixed delays in their design. Variable delays include queuing and network delays, while fixed delays include coding, packetization, serialization, and dejitter buffer delays. Table 7-2 provides an example of a delay budget calculation.

Table 7-2  Sample Delay Budget

Delay Type                                Fixed (ms)    Variable (ms)

Coder delay                               18            N/A

Packetization delay                       30            N/A

Queuing and buffering                     N/A           8

Serialization (64 kbps)                   5             N/A

Network delay (through public network)    40            25

Dejitter buffer                           45            N/A

Totals                                    138           33
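
The arithmetic behind Table 7-2 can be checked with a short script. This sketch copies the rows from the table and compares the result with the thresholds quoted earlier in this section (150 ms, 200 ms, and 250 ms). Treating the variable column as a worst case and adding it to the fixed total is an assumption of the example, as are the variable and function names.

```python
# Rows from Table 7-2 as (delay type, fixed ms, variable ms); None stands for N/A.
BUDGET = [
    ("Coder delay", 18, None),
    ("Packetization delay", 30, None),
    ("Queuing and buffering", None, 8),
    ("Serialization (64 kbps)", 5, None),
    ("Network delay (through public network)", 40, 25),
    ("Dejitter buffer", 45, None),
]


def budget_totals(rows):
    """Return (fixed, variable, fixed + variable), counting N/A entries as zero."""
    fixed = sum(f or 0 for _, f, _ in rows)
    variable = sum(v or 0 for _, _, v in rows)
    return fixed, variable, fixed + variable


fixed, variable, total = budget_totals(BUDGET)
print(f"fixed {fixed} ms, variable {variable} ms, assumed worst case {total} ms")

# Thresholds quoted in the text: the first G.114 band, then the private-network goal and limit.
checks = (
    ("G.114 0 to 150 ms band", 150),
    ("private-network goal", 200),
    ("private-network limit", 250),
)
for label, limit in checks:
    status = "within" if total <= limit else "over"
    print(f"{label} ({limit} ms): {status}")
```

Under that assumption the sample budget comes to 171 ms one way, which is over the 150 ms band but within the 200 ms goal suggested for private networks.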

 
