Audio Illustrations of Voice Quality Issues

The actual audio recordings may be downloaded from the Cambridge University Press catalog.

These examples are presented in wav format; they can be played on a personal computer with Microsoft Windows Media Player, RealPlayer, or any other audio program that plays wav files. Specialized programs such as Cooledit are best suited for the demonstrations, since they offer an audiovisual experience and the ability to analyze and manipulate the signal.

The audio illustrations are divided into the following groups:

1.       Codec illustrations, where the same speech segment is played through a variety of particular codecs

2.       Hybrid echo, where the same speech segment produces echo as a function of delay and ERL combinations

3.       Various recordings of noise types, including different crowd noises, wind, seashore, busy street, and white noise

4.       Acoustic echo control under noisy conditions with different comfort noise matching

5.       Recordings that demonstrate a conversation under noisy conditions “before” and “after” VQS processing

6.       Examples related to cases discussed in Chapter 12 of the book

7.       DTMF and ring-back tones

The intent of the audio illustrations is to highlight certain elements that are better conveyed by listening than by description. They can also serve as raw material for those who wish to use the demos in their own presentations and test platforms.

Codec Illustrations

The codec illustrations are all produced without background noise or transmission errors. Consequently, the listener may not always find significant performance differences between some of the better codecs and those with higher compression ratios, or those that incorporate noise reduction as an integral part of the codec. Still, despite this less than complete account, a sensitive ear can detect differences in voice quality.

We recognize this limitation. However, because the number of possible combinations of adverse conditions is unbounded, it is still worthwhile to present the codecs' performance under a common denominator that excludes noise and errors. Even though some of the performance differences may not be as sharp, the recordings still serve as a fair illustration of that particular circumstance.

The following is a list of the files containing the recordings in this group:

1.       CodecTest.wav – Unencoded speech segment

2.       CodecTest_Alaw.wav – A-law encoded/decoded speech segment

3.       CodecTest_ulaw.wav – μ-law encoded/decoded speech segment

4.       CodecTest_AMR475_VAD1.wav – Speech segment encoded/decoded with AMR at 4.75 kbps using VAD1

5.       CodecTest_AMR475_VAD2.wav – Speech segment encoded/decoded with AMR at 4.75 kbps using VAD2

6.       CodecTest_AMR515_VAD1.wav – Speech segment encoded/decoded with AMR at 5.15 kbps using VAD1

7.       CodecTest_AMR515_VAD2.wav – Speech segment encoded/decoded with AMR at 5.15 kbps using VAD2

8.       CodecTest_AMR59_VAD1.wav – Speech segment encoded/decoded with AMR at 5.9 kbps using VAD1

9.       CodecTest_AMR59_VAD2.wav – Speech segment encoded/decoded with AMR at 5.9 kbps using VAD2

10.   CodecTest_AMR67_VAD1.wav – Speech segment encoded/decoded with AMR at 6.7 kbps using VAD1

11.   CodecTest_AMR67_VAD2.wav – Speech segment encoded/decoded with AMR at 6.7 kbps using VAD2

12.   CodecTest_AMR74_VAD1.wav – Speech segment encoded/decoded with AMR at 7.4 kbps using VAD1

13.   CodecTest_AMR74_VAD2.wav – Speech segment encoded/decoded with AMR at 7.4 kbps using VAD2

14.   CodecTest_AMR795_VAD1.wav – Speech segment encoded/decoded with AMR at 7.95 kbps using VAD1

15.   CodecTest_AMR795_VAD2.wav – Speech segment encoded/decoded with AMR at 7.95 kbps using VAD2

16.   CodecTest_AMR102_VAD1.wav – Speech segment encoded/decoded with AMR at 10.2 kbps using VAD1

17.   CodecTest_AMR102_VAD2.wav – Speech segment encoded/decoded with AMR at 10.2 kbps using VAD2

18.   CodecTest_AMR122_VAD1.wav – Speech segment encoded/decoded with AMR at 12.2 kbps using VAD1

19.   CodecTest_AMR122_VAD2.wav – Speech segment encoded/decoded with AMR at 12.2 kbps using VAD2

20.   CodecTest_EFR.wav – Speech segment encoded/decoded with EFR

21.   CodecTest_EVRC.wav – Speech segment encoded/decoded with EVRC

22.   CodecTest_FR.wav – Speech segment encoded/decoded with FR

23.   CodecTest_HR.wav – Speech segment encoded/decoded with HR

24.   CodecTest_SMV.wav – Speech segment encoded/decoded with SMV

Echo as a function of delay and ERL


This section of audio illustrations contains combinations of delay (in milliseconds) and ERL (echo return loss, in dB). The longer the delay, the more noticeable the echo. The lower the ERL, the louder the echo.

The following is a list of files containing the recordings in this group:

1.       no_echo.wav – Original speech with no echo

2.       echo_d50_ERL3.wav – Speech with Echo at 50 msec delay and ERL of 3 dB

3.       echo_d50_ERL6.wav – Speech with Echo at 50 msec delay and ERL of 6 dB

4.       echo_d50_ERL12.wav – Speech with Echo at 50 msec delay and ERL of 12 dB

5.       echo_d50_ERL20.wav – Speech with Echo at 50 msec delay and ERL of 20 dB

6.       echo_d100_ERL3.wav – Speech with Echo at 100 msec delay and ERL of 3 dB

7.       echo_d100_ERL6.wav – Speech with Echo at 100 msec delay and ERL of 6 dB

8.       echo_d100_ERL12.wav – Speech with Echo at 100 msec delay and ERL of 12 dB

9.       echo_d100_ERL20.wav – Speech with Echo at 100 msec delay and ERL of 20 dB

10.   echo_d200_ERL3.wav – Speech with Echo at 200 msec delay and ERL of 3 dB

11.   echo_d200_ERL6.wav – Speech with Echo at 200 msec delay and ERL of 6 dB

12.   echo_d200_ERL12.wav – Speech with Echo at 200 msec delay and ERL of 12 dB

13.   echo_d200_ERL20.wav – Speech with Echo at 200 msec delay and ERL of 20 dB

14.   echo_d500_ERL3.wav – Speech with Echo at 500 msec delay and ERL of 3 dB

15.   echo_d500_ERL6.wav – Speech with Echo at 500 msec delay and ERL of 6 dB

16.   echo_d500_ERL12.wav – Speech with Echo at 500 msec delay and ERL of 12 dB

17.   echo_d500_ERL20.wav – Speech with Echo at 500 msec delay and ERL of 20 dB
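The relationship between ERL and echo loudness can be made concrete with a short sketch. This is not how the recordings above were produced; it is a minimal illustration that assumes a single echo path with a fixed delay and a flat attenuation equal to the ERL. The names `add_echo`, `delay_ms`, and `erl_db` are hypothetical.

```python
def add_echo(samples, sample_rate, delay_ms, erl_db):
    """Mix a signal with one delayed, attenuated copy of itself.

    Assumption: a single flat echo path. An ERL of X dB scales the
    echo amplitude by 10 ** (-X / 20) relative to the source.
    """
    delay = int(round(sample_rate * delay_ms / 1000.0))  # delay in samples
    gain = 10.0 ** (-erl_db / 20.0)                      # linear echo gain
    out = [float(s) for s in samples] + [0.0] * delay    # room for the echo tail
    for i, s in enumerate(samples):
        out[i + delay] += gain * s
    return out
```

With an 8000 Hz sampling rate, a 50 msec delay corresponds to 400 samples. An ERL of 20 dB scales the echo to one tenth of the source amplitude, while an ERL of 6 dB leaves it at roughly half, which is why the low-ERL recordings sound louder.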

Noise Types

The different noise types may be useful for those who wish to incorporate them into their own sample recordings for demonstration purposes. The following files contain the included noise types:

1.       W_Noise.wav - White Noise

2.       airport.wav - Airport noise

3.       café.wav - Café Noise

4.       crowd-noise.wav - Crowd Noise

5.       seashore.wav - Seashore Noise

6.       train.wav - Train Noise

7.       windy.wav - Wind Noise

8.       noisy street.wav - Noisy Street
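For readers who want to combine one of these noise files with their own speech samples, the following is a minimal sketch of mixing at a chosen signal-to-noise ratio. It assumes both signals are already loaded as lists of floating-point samples at the same sampling rate; the function names `rms` and `mix_at_snr` are hypothetical, and the noise level is measured over the whole noise recording rather than only the portion used.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that speech RMS / noise RMS equals snr_db.

    The noise is looped if it is shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; cannot scale it")
    # Noise level that yields the requested SNR against this speech.
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    scale = target_noise_rms / noise_rms
    return [s + scale * noise[i % len(noise)] for i, s in enumerate(speech)]
```

Measuring speech RMS over the whole segment, including pauses, understates the level of the active speech; a more careful mix would measure only the active portions.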

Acoustic Echo Control and Comfort Noise Matching


In order to demonstrate how important proper comfort noise matching is, we produced five recordings. The first one exemplifies an untreated acoustic echo; the others depict suppression techniques accompanied by four different types of noise fill ranging from silence, through white, colored, to a spectral match.

The following is a list of the files containing the recordings in this group.

1.       echo and noise.wav – Untreated acoustic echo with background noise

2.       AEC and no NM.wav – Acoustic Echo Control with No Comfort Noise fill

3.       AEC and W NM.wav – Acoustic echo control with White Noise fill

4.       AEC and colored nm.wav – Acoustic Echo Control with colored comfort noise fill

5.       AEC and spectral nm.wav – Acoustic Echo Control with Spectral comfort noise fill

Before and After VQS


This group of recordings demonstrates a broader scope comprising two voice quality applications: Noise Reduction (NR) and Noise Compensation (NC). The female side transmits speech and noise; after the VQS processes the signal, the noise is reduced and the male hears noise-reduced speech. In the other direction, without VQS the female would hear the male’s speech masked by the noise surrounding her. After VQS NC processing, the male’s speech is amplified, without affecting the noise level, so that it stands out more clearly.

The following is a list of the files containing the recordings in this scenario:

1.       Female before NR.wav
2.       Female after NR.wav

3.       Male before NC.wav
4.       Male after NC.wav

Troubleshooting


Chapter 12 of the book presents examples of common troubles associated with voice quality and malfunctioning voice quality systems. This section contains examples supporting cases discussed in the chapter.

The following is a list of the files containing audio demonstrations:

1.       example_12.1.wav – This file contains a simple demonstration of consistent hybrid echo. It may be used as a reference for the next examples.

2.       example_12.3a.wav – This file illustrates echo at the beginning of the call before convergence takes place

3.       example_12.3b.wav – This file illustrates a louder echo (in comparison to 12.3a) at the beginning of the call due to NC.

4.       example_12.4.wav – This file illustrates how NC may cause saturation of speech when not controlled properly

5.       example_12.5.1.wav – This file illustrates hybrid echo. It ought to be contrasted with the next example.

6.       example_12.5.2.wav – This file illustrates acoustic echo. It ought to be contrasted with the previous example to sense the difference between the two echo types.

7.       example_12.6.wav – When echo is returned with no loss (0 dB ERL), it is as loud as the speech source, and echo cancellers find it extremely difficult to cancel. The file illustrates echo at 0 dB ERL.

8.       example_12.7.1.wav – Ought to be used as a reference for the next file

9.       example_12.7.2.wav – This file illustrates how an improperly tuned ALC may amplify noise, and may even cause periodic speech saturations

10.   example_12.11.wav – This file illustrates how clipped peaks sound even though the level has been restored to a comfortable listening level.

11.   example_12.14.1.wav – This file ought to be used as a reference for the next two files. It is the signal entering the Rin port of the VQS

12.   example_12.14.2.wav – This file contains the signal in example_12.14.1 after it passed through a noise reduction application

13.   example_12.14.3.wav – This file contains the signal in example_12.14.1 after it passed through an overly aggressive noise reduction application

Ringback and DTMF Signals Example Test Tools


This directory includes recordings that may be used to verify the correct operation of VQ systems during ring-back and DTMF signaling. A more comprehensive test suite ought to include varied power levels, frequency deviations, and different durations, so as to cover the complete specifications of the standards.

Ringback


In the United States, a large majority of carriers use a de facto ring-back standard during call initialization. This ring-back tone is the sum of a 440 Hz and a 480 Hz signal lasting 2 seconds. Successive tone pulses are separated by a 4-second silence period. Research has shown that in most cases ring-back is sent at levels from –20 dBm0 to –15 dBm0.
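The cadence described above can be sketched in a few lines of Python for readers who want to synthesize their own test signal. This is an illustration only, not the method used to create the supplied files. The function names, the 8000 Hz sampling rate, and the `amplitude` value are assumptions; in particular, the amplitude is an arbitrary fraction of digital full scale and is not calibrated to any dBm0 level.

```python
import math
import struct
import wave

def ringback(cycles=1, sample_rate=8000, amplitude=0.1, on_s=2.0, off_s=4.0):
    """Return float samples: 440 Hz + 480 Hz for on_s, then silence for off_s."""
    on_n = int(sample_rate * on_s)
    off_n = int(sample_rate * off_s)
    burst = [
        amplitude * (math.sin(2 * math.pi * 440 * n / sample_rate)
                     + math.sin(2 * math.pi * 480 * n / sample_rate))
        for n in range(on_n)
    ]
    return (burst + [0.0] * off_n) * cycles

def write_wav(path, samples, sample_rate=8000):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(frames)
```

For example, `write_wav("ring_sketch.wav", ringback(cycles=3))` would produce three ring-back pulses in a file whose name is, again, only a placeholder.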

Test & Verification


Three files are used to test ring-back through VQS systems.

network_ring.wav – series of ring-back pulses that were recorded from an actual U.S. commercial carrier network.

ring.wav – series of ring-back pulses that were generated using commercial signal processing software.

ring_low.wav – series of ring-back pulses that were generated using commercial signal processing software at a lower power level.

Each file can be played independently through the Sin or Rin ports of a VQS. The test is conducted by subjectively evaluating the output recording. There should be no indication of distortion, changes in level, or changes in frequency content. This is verified both audibly and visually using commercial software.

Test & Verification of DTMF


Four files are used to test DTMF through VQS systems. Each file consists of the following string of digits “1234567890*#”.

dtmf_50_50.wav – series of DTMF digits with a 50 ms duration and 50 ms silence between digits.

dtmf_50_50_low.wav – series of DTMF digits with a 50 ms duration and 50 ms silence between digits, sent at a lower power level.

dtmf_500_500.wav – series of DTMF digits with a 500 ms duration and 500 ms silence between digits.

dtmf_500_500_low.wav – series of DTMF digits with a 500 ms duration and 500 ms silence between digits, sent at a lower power level.
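A comparable digit string can be synthesized with a short sketch, which may help when extending the tests to other durations or levels. The frequency pairs below are the standard DTMF row and column frequencies; the function name `dtmf_sequence`, the 8000 Hz sampling rate, and the `amplitude` value are assumptions, and the amplitude is not calibrated to any dBm0 level.

```python
import math

# Standard DTMF (low-group, high-group) frequency pairs in Hz.
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def dtmf_sequence(digits, on_ms=50, off_ms=50, sample_rate=8000, amplitude=0.25):
    """Return float samples: each digit for on_ms, followed by off_ms of silence."""
    on_n = sample_rate * on_ms // 1000
    off_n = sample_rate * off_ms // 1000
    out = []
    for d in digits:
        low, high = DTMF_FREQS[d]  # raises KeyError for unsupported symbols
        out.extend(
            amplitude * (math.sin(2 * math.pi * low * n / sample_rate)
                         + math.sin(2 * math.pi * high * n / sample_rate))
            for n in range(on_n)
        )
        out.extend([0.0] * off_n)
    return out
```

Calling `dtmf_sequence("1234567890*#")` mirrors the 50 ms on, 50 ms off pattern, and passing `on_ms=500, off_ms=500` mirrors the longer one. Lowering `amplitude` gives a lower-level variant.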