Elsevier

Signal Processing

Volume 84, Issue 3, March 2004, Pages 663-667

Fast Communication
Receiver-based packet loss concealment for pulse code modulation (PCM G.711) coder

https://doi.org/10.1016/j.sigpro.2003.10.021

Abstract

This paper introduces a high-performance concealment algorithm for packetized PCM-coded speech as in ITU-T Recommendation G.711. The proposed prediction algorithm combines a linear prediction model with the reverse-order replicated pitch period technique implemented in ITU-T G.711 Appendix A (ITU-T Recommendation G.711, November 2000). The new algorithm is compared to the ITU-T G.711 Appendix A standard and to the commercial packet repetition technique. It is shown to produce better concealment quality in almost all cases.

Introduction

Voice-over-IP (VoIP), the transmission of packetized voice over IP networks, is gaining much attention as a possible alternative to conventional public switched telephone networks (PSTN). However, impairments present on IP networks, namely jitter, delay and channel errors, can lead to the loss of packets at the receiving end, and this packet loss degrades the speech quality. Model-based coders, especially the International Telecommunication Union (ITU-T) standards G.729A [2] and G.723.1 [3], have been extensively used for speech coding over IP networks because of their low bit-rate requirements (5.3 and 6.3kbit/s for G.723.1 and 8kbit/s for G.729A) and their inherent ability to recover from erasure. Their built-in packet loss concealment makes their quality drop slowly as the amount of packet loss increases. However, their memory requires a few frames for the transition from a concealed state to a correct state. Thus, they tend to corrupt a few good packets before recovery, a phenomenon known as “State Error” [6]. Pulse code modulation (PCM, 64kbit/s) [9], on the other hand, offers higher quality than G.729A and G.723.1 during periods of normal operation but has no ability to conceal erasure, which results in a dramatic drop in speech quality during loss periods. Yet PCM-based coders can recover from packet loss more rapidly than model-based coders, since the first speech sample in the first good packet restores speech to its original quality. The low complexity of PCM and its good performance in tandem coding make it a viable alternative to G.729A or G.723.1 for VoIP.

Several approaches have been implemented to address the frame erasure problem in PCM streams. The simplest approach is to play a mute (silence) packet in the erasure period. This method, however, introduces annoying voice clipping, and most subjective tests have shown that it deteriorates the speech quality even at very low packet loss rates [4], [5]. Many other concealment algorithms depend on the quasi-stationary property of speech (little new information is delivered in the duration of a 10–30ms lost packet). One of the popular commercial concealment algorithms repeats the speech signal received in the last speech packet. This method performs better than silence substitution, but its quality is still not satisfactory for high-quality applications.
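The two baseline strategies described above, silence substitution and packet repetition, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `conceal_simple`, the use of `None` to mark a lost packet, and the 80-sample frame (10ms at 8kHz) are choices made here, not details taken from any standard or commercial product.

```python
FRAME = 80  # samples per packet: 10 ms at 8 kHz (assumed for illustration)


def conceal_simple(packets, method="repeat", frame=FRAME):
    """Rebuild a sample stream from a list of packets.

    Each packet is a list of samples, or None when it was lost.
    method="mute" plays silence in the gap; method="repeat" replays
    the last correctly received packet.
    """
    out = []
    last_good = [0] * frame  # nothing received yet: fall back to silence
    for pkt in packets:
        if pkt is None:
            if method == "repeat":
                out.extend(last_good)
            else:
                out.extend([0] * frame)
        else:
            out.extend(pkt)
            last_good = list(pkt)
    return out


# Usage: one lost packet between two received ones.
stream = [[1] * FRAME, None, [2] * FRAME]
repeated = conceal_simple(stream, "repeat")
muted = conceal_simple(stream, "mute")
```

Repetition keeps the signal energy in the gap but ignores the pitch phase at the packet boundary, which is one plausible reason its quality remains limited.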

ITU-T has recently standardized (in G.711 Appendix A [1]) a high-quality, low-complexity concealment method for PCM-coded speech. This method depends on waveform substitution. The packet loss concealment (PLC) algorithm first performs pitch detection on a sufficient length of speech samples kept in the history buffer (390 samples of 8kHz-sampled speech). The concealment unit then places the pointer one pitch period backward and copies a speech signal of the duration of the lost packet. This pitch-predicted replica is played in the gap resulting from the missing speech segment. The algorithm also performs an overlap-add at the transition between the last received good samples and the concealed ones, which ensures a smooth, natural transition and a higher concealment quality. However, this results in an added algorithmic delay of 3.75ms [1]. The algorithm has a very low complexity of 0.5 MIPS.

Another standard method is presented in the ANSI standard T1-521-2000 (Appendix B) [7]. This method relies on the well-known linear prediction model to estimate the missing speech waveform, and thus simply adopts the approach of model-based codecs. It performs a complete analysis to extract the short- and long-term excitation from the previous correctly received speech. The synthesis unit then uses these parameters, along with the most recently received speech samples (as initial conditions for the inverse linear prediction (LP) filter), to synthesize an approximation of the missing speech segment. This method introduces an algorithmic delay of 5ms (half of a correct 10ms packet) to smooth the transition between the last good speech segment and the beginning of the concealed one. It also has a much higher complexity (2.3 MIPS for a 10ms packet), around 5 times that of ITU-T G.711 Appendix A [4], [7]. The resulting concealment quality is comparable to that of ITU-T G.711 Appendix A [4], [7].
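The waveform-substitution idea described above (detect the pitch in the history buffer, step one pitch period back, copy, and smooth the junction) can be sketched as follows. This is a simplified illustration, not the standardized algorithm: the lag range of 40 to 120 samples, the 160-sample correlation window, the normalized cross-correlation criterion, the linear cross-fade, and the names `estimate_pitch` and `conceal_by_pitch` are all assumptions made here. The 30-sample overlap corresponds to the 3.75ms delay quoted in the text.

```python
import math


def estimate_pitch(history, min_lag=40, max_lag=120, window=160):
    """Return the lag (in samples) that best matches the end of the buffer,
    using a normalized cross-correlation (an assumed criterion)."""
    ref = history[-window:]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        cand = history[-window - lag:-lag]
        energy = sum(c * c for c in cand)
        if energy == 0:
            continue  # silent candidate segment, skip it
        score = sum(r * c for r, c in zip(ref, cand)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_by_pitch(history, frame=80, overlap=30):
    """Return (pitch, smoothed_tail, fill) for one lost packet.

    fill repeats the last pitch period for `frame` samples.
    smoothed_tail replaces the last `overlap` good samples with a linear
    cross-fade towards the signal one pitch period earlier, which is why
    the output must be delayed by `overlap` samples.
    """
    pitch = estimate_pitch(history)
    fill = [history[-pitch + (i % pitch)] for i in range(frame)]
    tail = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # weight of the replicated signal
        real = history[-overlap + k]
        replica = history[-overlap - pitch + k]
        tail.append((1 - w) * real + w * replica)
    return pitch, tail, fill


# Usage: a 390-sample history holding a sawtooth with a 50-sample period.
history = [(n % 50) - 25 for n in range(390)]
pitch, tail, fill = conceal_by_pitch(history)
```

For a perfectly periodic input the fill is an exact continuation of the signal; on real speech the match is only approximate, and this sketch omits any handling of losses longer than one packet.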
In this paper, we present a new receiver-based PLC algorithm for packetized PCM-coded speech. It is designed to work with the conventional sampling rate of 8kHz and a frame size of 10ms. The proposed algorithm introduces no additional delay and has an affordable complexity of 1.85 MIPS.

The rest of this paper is organized as follows. In Section 2, the concealment model is described. Section 3 presents the quality assessment test for the new method as well as simulation results confirming the improved performance of the proposed algorithm. We then conclude the paper in Section 4, along with the future work that could be added to the proposed method.

Section snippets

Prediction equation

The new LP-based concealment technique is based on prediction with a filter of sufficiently large order to model the speech accurately:

$$S(n) = \sum_{i=1}^{P} a(i)\, S(n-i) + b(n), \tag{1}$$

where S(n) is the nth speech sample, P is the prediction order, which was set to 50 as will be explained later, a(i) are the LP coefficients and b(n) is the residual signal.
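As a rough illustration of the prediction equation, the sketch below estimates the coefficients a(i) from the received history with the autocorrelation method and the Levinson-Durbin recursion, then extrapolates the missing samples. It is not the paper's algorithm: the residual b(n) is simply set to zero here, whereas the paper combines LP prediction with a pitch-based component, and the estimation method and the function names `autocorr`, `levinson_durbin` and `lp_extrapolate` are assumptions made for this example.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sample list x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a(1..order), so that
    S(n) is approximated by sum_i a(i) * S(n - i)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0:
            break  # degenerate signal: keep the remaining coefficients at 0
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1 - k * k
    return a[1:]


def lp_extrapolate(history, order=50, frame=80):
    """Predict `frame` samples after `history` with b(n) = 0 (assumption)."""
    a = levinson_durbin(autocorr(history, order), order)
    buf = list(history)
    for _ in range(frame):
        # a[i] holds a(i + 1), which multiplies S(n - (i + 1))
        buf.append(sum(a[i] * buf[-1 - i] for i in range(order)))
    return buf[len(history):]


# Usage: a first-order decaying signal is continued with the same decay.
decay = [0.9 ** n for n in range(200)]
coeffs = levinson_durbin(autocorr(decay, 2), 2)
predicted = lp_extrapolate(decay, order=2, frame=3)
```

With a zero residual the extrapolated signal decays over time for a stable predictor, which suggests why a pure LP extrapolation needs an excitation or pitch component to sustain voiced speech across a whole packet.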

As can be seen from Eq. (1) the current speech sample S(n) is composed of two components. The first component is the predictable part carrying

Performance of the proposed algorithm

The new algorithm is compared to the ITU-T standard concealment tool (G.711 Appendix A) and to the packet repetition method. The test was performed on a set of speech files from four speakers, two males and two females, referred to in the results as M1, M2, F1 and F2. Each speaker contributed 10 speech files, each containing two sentences in English with a duration of 8s. The format of the files was linear PCM. The files were taken from the ITU-T supplement P.23.

The assessment tool used to

Conclusion and future work

In this paper, we introduced a new concealment algorithm for PCM packetized speech of 10ms packet length. The model implemented in , provides very encouraging results for the idea of combining the pitch prediction along with the high-order LP-based prediction to produce the concealed speech segments. The PESQ-MOS scores obtained for the random loss tests prove that the algorithm exhibits a superior high-quality concealment performance in all cases when compared to an existing commercial method

References (10)

  • Appendix A: a high quality low-complexity algorithm for packet loss concealment with G.711, ITU-T Recommendation....
  • Coding of speech at 8kb/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP), ITU-T...
  • Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3kb/s, ITU-T Recommendation G.723.1,...
  • E. Gunduzhan et al., A linear prediction based packet loss concealment algorithm for PCM coded speech, IEEE Trans. Speech and Audio Process. (November 2001)
  • M. Hassan, A. Nayandoro, Internet telephony: services, technical challenges, and products, IEEE Communications Magazine,...
There are more references available in the full text version of this article.
