Chapter 4: Passthrough

Cisco Press

When the public switched telephone network (PSTN) was initially constructed, voice communication was the primary goal. However, as data communications such as fax, modem, and text became more important, they also were made to work over the PSTN using special protocols and transport methods. Today, with VoIP taking the place of the PSTN, voice communication is still the primary objective, and specific protocols and procedures are again needed to transport fax, modem, and text communications.

One such feature that voice gateways can implement to transport modem, fax, or text telephony traffic is passthrough. This transport mechanism is the easiest and simplest way for a voice gateway to pass modulated data.

For the most part, passthrough works just like a normal voice call. The voice gateway receives an analog waveform from the modem, fax, or text device and encodes it using an appropriate coder/decoder (codec). These encoded samples are then encapsulated and transported over the packet network using the Real-time Transport Protocol (RTP).

You will also commonly see passthrough referred to as voice-band data (VBD) in more recent literature and in many of the specifications. These two terms are used interchangeably for the remainder of this book.

This chapter provides an in-depth look at how passthrough operates and the different ways that it is implemented to transport modem, fax, and text data. Specifically, this chapter discusses the following topics:

  • Passthrough Fundamentals

  • NSE-Based Passthrough

  • Protocol-Based Pass-Through for Fax

  • Text over G.711

  • A Future Look at ITU-T V.152

Passthrough Fundamentals

With only a few minor variations that are discussed at the end of this section, a passthrough call is treated the same as a VoIP call from a voice gateway perspective. The human voice sample that is processed by the gateway on a VoIP call is simply replaced with the modulated data used by faxes and modems.

For both voice and passthrough calls, a process known as pulse code modulation (PCM) converts an analog signal to an equivalent digital representation. This digital signal is what is packetized and transported over the IP network. Figure 4-1 illustrates how PCM works.

Figure 4-1 Pulse Code Modulation

PCM first filters out all frequencies greater than 4000 Hz because the majority of human speech occurs in the 300 Hz to 3200 Hz range. Nyquist's theorem specifies that to accurately reconstruct a signal, it must be sampled at a rate of at least twice the highest frequency of that signal. Because the signal is band-limited by a 4000 Hz filter, the original analog signal must therefore be sampled 8000 times a second.

Sampling is merely taking an amplitude reading of the original signal. This process is known as pulse amplitude modulation (PAM). PCM takes it one step further than PAM and quantizes the signal.

Quantization is the process of breaking up the continuous amplitude spectrum into discrete intervals. Each quantization level is assigned an 8-bit codeword. Therefore, there are 256 distinct amplitude levels with a unique 8-bit codeword assigned to each one. Figure 4-1 illustrates an analog signal encoded as digital PCM through the process detailed in the preceding paragraphs.
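The sampling and quantization steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a codec implementation: the 1000 Hz test tone, the function names, and the normalized amplitude range of -1.0 to 1.0 are assumptions made for the example, and the quantizer is the uniform one shown in Figure 4-1.

```python
import math

SAMPLE_RATE_HZ = 8000            # twice the 4000 Hz filter cutoff
BITS_PER_SAMPLE = 8              # one 8-bit codeword per sample
LEVELS = 2 ** BITS_PER_SAMPLE    # 256 distinct amplitude levels


def sample(signal, duration_ms):
    """PAM step: read the amplitude of signal(t) 8000 times a second."""
    count = SAMPLE_RATE_HZ * duration_ms // 1000
    return [signal(n / SAMPLE_RATE_HZ) for n in range(count)]


def quantize_uniform(amplitude):
    """Map an amplitude in [-1.0, 1.0] to one of 256 evenly spaced codewords."""
    clipped = max(-1.0, min(1.0, amplitude))
    index = int((clipped + 1.0) / 2.0 * LEVELS)
    return min(index, LEVELS - 1)   # keep +1.0 inside the top level


def tone(t):
    """Hypothetical input: a 1000 Hz tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


codewords = [quantize_uniform(a) for a in sample(tone, 1)]
bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

print(codewords)   # eight codewords for 1 ms of signal
print(bit_rate)    # bits per second produced by 8-bit PCM at 8000 samples/s
```

Multiplying 8000 samples per second by 8 bits per sample gives 64,000 bits per second, which is the rate of a single PCM channel.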

For a VoIP call, there are a number of codecs to choose from. A codec integrates with PCM and defines a particular encoding scheme to be used in the conversion of an analog signal into its digitally encoded version. Codecs vary in bandwidth requirements, voice quality, and computational requirements.

For example, voice is commonly transported over the WAN using high-compression codecs, such as G.729 (8 Kbps) or G.723.1 (5.3 Kbps/6.3 Kbps). Because these codecs are optimized for human speech, they do a great job of preserving speech quality while at the same time offering a high compression rate that saves bandwidth.

However, the tones used for modem and fax negotiation are very different in nature from human speech and in many instances not even in the same frequency range. This makes it difficult to optimize a high-compression codec for both voice and fax/modem tones. These high-compression, speech-optimized codecs distort modulated data signals to the point where modems and fax machines are unable to communicate successfully.

Although codecs such as the clear-channel codec or 32 Kbps G.726 may transport modem or fax tones in-band, this discussion is limited to using G.711 as the VBD codec. G.711 is overwhelmingly the most frequently used VBD codec and the only one officially supported for Cisco passthrough features. G.711 is a 64 Kbps uncompressed voice codec that implements a PCM scheme that is compatible with modulated data.

Rather than the uniform quantization seen in Figure 4-1, the G.711 codec uses a nonuniform quantization scheme, known as companding. This has the effect of a greater concentration of quantization levels at the lower amplitudes, and conversely the higher-amplitude values have quantization levels assigned more sparsely. Figure 4-2 shows this uneven distribution of quantization levels for the amplitude.

Figure 4-2 G.711 Companding of a PCM Signal

Companding is appropriate for voice because the majority of human speech occurs at the lower end of the amplitude spectrum. This allows for greater fidelity and improved voice quality for the lower-amplitude signal, which is the bulk of human speech.
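To see how companding concentrates quantization levels at low amplitudes, the following sketch uses the continuous µ-law curve with µ = 255. It is an approximation for illustration only: G.711 itself specifies a segmented, piecewise-linear version of this curve, and the function names and normalized amplitude range here are assumptions.

```python
import math

MU = 255        # compression parameter of the mu-law curve
LEVELS = 256    # 8-bit codewords


def mu_law_compress(x):
    """Compress x in [-1.0, 1.0] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode(x):
    """Compress first, then quantize the compressed value uniformly."""
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return min(int((y + 1.0) / 2.0 * LEVELS), LEVELS - 1)


# Count the codewords spanning the lowest and highest 10 percent of the
# positive amplitude range.
low_band_levels = encode(0.1) - encode(0.0)
high_band_levels = encode(1.0) - encode(0.9)

print(low_band_levels, high_band_levels)
```

In this sketch, more than half of the positive codewords fall within the lowest tenth of the amplitude range, while only a handful cover the highest tenth.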

Two types of companding are used in G.711: µ-law and a-law. They are similar in many ways, but µ-law has a slightly greater dynamic range, whereas a-law has a bit less distortion for lower-amplitude signals. The biggest difference is that µ-law is used in North America and Japan, whereas a-law is used in the rest of the world. It is important to note that these two companding schemes are not compatible, and any calls between countries that use different companding types have to convert between the two.

The major impairment that results from analog-to-digital conversions, such as PCM, is the introduction of noise. Any difference between the actual amplitude value of the original signal and its assigned value of the closest discrete quantization level will introduce quantization noise.

As Figure 4-2 highlights, the nonlinear distribution of quantization levels used in companding will produce less quantization noise at the lower-amplitude signals and more quantization noise at the higher-amplitude signals. This keeps the signal-to-noise ratio (SNR) relatively constant over the entire signal amplitude range.
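The effect on SNR can be checked numerically. The sketch below quantizes a test tone at three amplitudes, once with a uniform 8-bit quantizer and once with a continuous µ-law curve (µ = 255) ahead of the same quantizer, and reports the SNR of each. The 997 Hz tone, the chosen amplitudes, and the use of the continuous curve instead of the segmented G.711 tables are assumptions made for illustration.

```python
import math

MU = 255
LEVELS = 256


def quantize_uniform(x):
    """Quantize x in [-1, 1] to 256 even steps; return the reconstructed value."""
    index = min(int((x + 1.0) / 2.0 * LEVELS), LEVELS - 1)
    return (index + 0.5) / LEVELS * 2.0 - 1.0


def quantize_mu_law(x):
    """Compress with mu-law, quantize uniformly, then expand again."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    y_hat = quantize_uniform(y)
    return math.copysign(math.expm1(abs(y_hat) * math.log1p(MU)) / MU, y_hat)


def snr_db(quantizer, amplitude, samples=8000):
    """SNR of a quantized 997 Hz tone sampled at 8000 Hz."""
    signal = [amplitude * math.sin(2 * math.pi * 997 * n / 8000)
              for n in range(samples)]
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - quantizer(s)) ** 2 for s in signal)
    return 10 * math.log10(signal_power / noise_power)


for amplitude in (0.01, 0.1, 0.9):
    print(amplitude,
          round(snr_db(quantize_uniform, amplitude), 1),
          round(snr_db(quantize_mu_law, amplitude), 1))
```

With the uniform quantizer, the SNR falls sharply as the amplitude drops. With companding, it stays within a much narrower band across the same amplitudes.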

Now that the process of digitally encoding an analog signal has been discussed, it is important to understand how these PCM samples of modem, fax, and text data are packetized for transport over the IP network. As in any data communication, the payload is encapsulated in turn by the protocol at each layer of the OSI model. For example, Figure 4-3 illustrates how PCM modulated data samples would be encapsulated for transmission over an IP-configured Ethernet interface.

Figure 4-3 Encapsulation of an RTP Packet over Ethernet
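The cost of this encapsulation can be estimated with simple arithmetic. The sketch below assumes a 20 ms packetization period, RTP carried over UDP and IPv4 with no header options, and an untagged Ethernet frame counted as a 14-byte header plus a 4-byte frame check sequence. These values are assumptions for the example, not figures taken from this chapter.

```python
G711_BIT_RATE = 64000        # bits per second of G.711 payload
PACKETIZATION_MS = 20        # assumed packetization period

# Assumed header sizes in bytes.
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 14 + 4   # header plus frame check sequence

payload_bytes = G711_BIT_RATE * PACKETIZATION_MS // 1000 // 8
packets_per_second = 1000 // PACKETIZATION_MS

ip_packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
ethernet_frame_bytes = ip_packet_bytes + ETHERNET_OVERHEAD

ip_bandwidth_bps = ip_packet_bytes * 8 * packets_per_second
ethernet_bandwidth_bps = ethernet_frame_bytes * 8 * packets_per_second

print(payload_bytes)            # G.711 bytes carried in each packet
print(ip_bandwidth_bps)         # bandwidth measured at the IP layer
print(ethernet_bandwidth_bps)   # bandwidth measured at the Ethernet layer
```

Under these assumptions, each packet carries 160 bytes of G.711 samples, and the 64 Kbps payload stream consumes 80 Kbps at the IP layer once the RTP, UDP, and IP headers are added.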

Because of the real-time nature of the transport of the PCM-encoded modulated data, it is important to take a closer look at the RTP header. From Figure 4-3, you can see that the G.711 encoded samples of voice-band modulated data become the payload of an RTP encapsulated packet. Figure 4-4 illustrates the RTP header, which is defined in RFC 3550.

All real-time traffic that is encapsulated in RTP maintains the timing characteristics of the original analog signal via the Timestamp field in the RTP header. Likewise, the PCM-encoded samples can be played out in the same order as they were sent because of the Sequence Number field. For this discussion, the most important field is the Payload Type.

Figure 4-4 RTP Packet Header
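As a rough sketch of how the fixed 12-byte RTP header from RFC 3550 can be decoded, the following Python reads the Payload Type, Sequence Number, and Timestamp fields from a packet. The class and function names and the sample packet are made up for illustration, and any CSRC entries or header extension that follow the fixed header are ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=first & 0x0F,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,   # 7-bit Payload Type field
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 0, followed by 160 payload bytes.
sample_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + bytes(160)
header = parse_rtp_header(sample_packet)
print(header.payload_type, header.sequence_number, header.timestamp)
```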

The Payload Type field identifies the type of data being carried in the RTP packet. This defines how the packet will be interpreted and dealt with by the remote side. Table 4-1 shows the Payload Type values that are defined in RFC 3551.

Table 4-1 Payload Type Values

Payload Type    Payload Encoding
0               PCM µ-law
8               PCM a-law
Dynamic         G.726 (40 kbps)
Dynamic         G.726 (32 kbps)
Dynamic         G.726 (24 kbps)
Dynamic         G.726 (16 kbps)
Table 4-1 shows a number of dynamic and unassigned payload types. The dynamically assigned portion of this range is what is primarily discussed in this chapter. Unless explicitly configured on the gateway, Cisco uses the dynamic and unassigned payload type values shown in Table 4-2 by default.

Table 4-2 Dynamic and Unassigned Payload Types Commonly Used by Cisco

Default Payload Type    Payload Encoding
                        RFC 2198 Passthrough Redundancy
                        Cisco Fax Relay Switchover
                        Cisco Fax Relay Switchover ACK
100                     Named Signaling Event
                        Named Telephony Event
                        Cisco Text Relay
                        Cisco RTP DTMF Relay
                        Cisco Fax Relay
                        Cisco CAS Payload
                        Cisco Clear-Channel

When using passthrough, a voice gateway identifies the contents it is transmitting as simply PCM (PT=0 for G.711 µ-law or PT=8 for G.711 a-law). Thus, it makes no distinction within the RTP packet between a voice call and a modem/fax/text call.

As Figure 4-5 highlights, the fax/modem modulated data is transparently carried over the IP network, and the data is never demodulated within the IP infrastructure. This is the principal difference between passthrough and relay, which is covered in Chapter 5, "Relay."

Figure 4-5 Fax and Modem Passthrough

When the passthrough feature is initiated on a Cisco voice gateway, additional events take place to ensure that the modulated data is successfully transported across IP. The most important event is known as codec upspeed.

Codec upspeed makes sure that the passthrough call uses a low-compression codec such as G.711 µ-law or G.711 a-law. Passthrough calls start out as regular voice calls, which means that the call could be using a high-compression codec such as G.729. When the passthrough feature is initiated, this codec is changed to G.711 in what is termed codec upspeed.

In addition to codec upspeed, another change also takes place in the Cisco voice gateway when it switches into VBD mode and prepares for a passthrough call. To make the IP path as transparent as possible, the DSP disables Voice Activity Detection (VAD). VAD is a bandwidth-saving feature that sends packets only when there is voice detected during the call. If VAD were to remain enabled for the passthrough calls, signals could be clipped, negatively affecting the data being transported.

Slight changes are also made to the DSP's jitter or playout buffer. While in voice mode, the playout buffer is adaptive and constantly adjusts to changing network conditions. However, during passthrough mode, the playout buffer becomes fixed to an optimum value for the call. For a more comprehensive discussion of what a jitter buffer is and the specifics of how it behaves during a passthrough call, see the "IP Troubleshooting" section of Chapter 12, "Troubleshooting Passthrough and Relay."
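The three changes described above (codec upspeed, VAD disabled, and a fixed playout buffer) can be summarized as a single state transition. The sketch below is purely conceptual: the `DspSettings` class, its field names, and the codec strings are hypothetical and do not represent Cisco DSP firmware or any gateway API.

```python
from dataclasses import dataclass, replace

# Illustrative codec labels; G.711 in either companding variant is the VBD codec.
G711_CODECS = {"g711ulaw", "g711alaw"}


@dataclass(frozen=True)
class DspSettings:
    codec: str
    vad_enabled: bool
    adaptive_playout: bool


def switch_to_passthrough(current: DspSettings,
                          vbd_codec: str = "g711ulaw") -> DspSettings:
    """Return the settings a call would use after switching to VBD mode."""
    # Codec upspeed: keep G.711 if already in use, otherwise move to it.
    codec = current.codec if current.codec in G711_CODECS else vbd_codec
    return replace(
        current,
        codec=codec,
        vad_enabled=False,        # VAD could clip modulated data
        adaptive_playout=False,   # playout buffer becomes fixed
    )


voice_mode = DspSettings(codec="g729", vad_enabled=True, adaptive_playout=True)
vbd_mode = switch_to_passthrough(voice_mode)
print(vbd_mode)
```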

After the DSP detects certain tones, the switchover to passthrough is signaled in one of two ways. One is NSE-based passthrough signaling, which involves the exchange of Named Signaling Event (NSE) packets between the gateways. The other is protocol-based passthrough signaling, in which a direct negotiation occurs in the protocol stack of the call signaling protocol.

NSE-Based Passthrough

When passthrough is configured on a voice gateway, it takes the modulated data from a fax, modem, or text device and transparently transports it in the media stream as PCM samples encapsulated in RTP.

The terminating gateway (TGW) always switches to NSE-based passthrough mode first by detecting the appropriate tone from the answering modem or fax machine. This tone is the 2100 Hz CED from a standard fax machine or the 2100 Hz ANSam tone from a modem or SG3 fax machine.

When the TGW detects this tone, it undergoes a passthrough switchover, including a codec upspeed to the VBD codec (G.711). In conjunction with this switchover to NSE-based passthrough, the TGW also transmits an in-band signal in the media stream to the originating gateway (OGW). In this message, the TGW signals to the OGW to switch into passthrough mode. This signal is communicated using NSE packets.

NSEs are Cisco proprietary messages that are sent as part of the RTP stream and are identified by default using a payload type of 100 in the RTP header. Despite being proprietary, NSE packets use the same format as the standards-based Named Telephony Events (NTE) described in RFC 2833. Figure 4-6 shows the NSE/NTE packet format.

Figure 4-6 NSE Packet Format

Note - The NSE payload type is configurable on a Cisco IOS voice gateway to be any value between 98 and 119. The default value is 100.

The Event ID field uses Cisco-defined events to signal in-band the coordination of a variety of tasks. Table 4-3 shows the NSE event numbers used for passthrough. Notice that NSE-192 is used by the TGW to signal to the OGW to go into VBD mode.

Note - The Volume and Duration fields in the NSE packet will always be set to 0s for the discussions in this chapter. Only the event ID is pertinent.
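As a minimal sketch of what an NSE-192 packet might look like on the wire, the following Python builds a 12-byte RTP header with the default NSE payload type of 100, followed by the 4-byte RFC 2833 event payload (event, E and R bits with volume, and duration). The function name, the sequence number, timestamp, and SSRC values, and the choice to leave the marker and E bits clear are assumptions for illustration; the exact values a gateway sends are not specified here.

```python
import struct

NSE_DEFAULT_PAYLOAD_TYPE = 100
NSE_EVENT_VBD = 192   # NSE-192: TGW asks the OGW to switch to VBD mode


def build_nse_packet(event_id: int, sequence_number: int, timestamp: int,
                     ssrc: int,
                     payload_type: int = NSE_DEFAULT_PAYLOAD_TYPE) -> bytes:
    """Build an RTP packet carrying one RFC 2833-formatted NSE."""
    if not 98 <= payload_type <= 119:
        raise ValueError("NSE payload type must be between 98 and 119")
    if not 0 <= event_id <= 255:
        raise ValueError("event ID must fit in 8 bits")
    # Fixed RTP header: version 2, no padding, extension, or CSRC entries.
    header = struct.pack("!BBHII", 0x80, payload_type,
                         sequence_number & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    # Event payload: event ID, then E/R bits and volume, then duration.
    # Volume and duration are set to 0, as noted in the text.
    payload = struct.pack("!BBH", event_id, 0, 0)
    return header + payload


packet = build_nse_packet(NSE_EVENT_VBD, sequence_number=1,
                          timestamp=160, ssrc=0x1234)
print(packet.hex())
```

The receiving gateway would recognize the packet as an NSE from the payload type in the RTP header and then read the event ID from the first byte of the payload.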
