Receiver-based packet loss concealment for pulse code modulation (PCM G.711) coder

doi:10.1016/j.sigpro.2003.10.021

Signal Processing

Volume 84, Issue 3, March 2004, Pages 663-667

https://doi.org/10.1016/j.sigpro.2003.10.021

Abstract

This paper introduces a high-performance concealment algorithm for packetized PCM-coded speech, as in ITU-T Recommendation G.711. The proposed prediction algorithm combines a linear prediction model with the reverse-order replicated pitch period technique implemented in ITU-T G.711 Appendix A (ITU-T Recommendation G.711, November 2000). The new algorithm is compared with the ITU-T G.711 Appendix A standard and with the commercial technique of packet repetition. It is shown to produce better concealment quality in almost all cases.

Introduction

Voice-over-IP (VoIP), the transmission of packetized voice over IP networks, is gaining much attention as a possible alternative to conventional public switched telephone networks (PSTN). However, impairments present on IP networks, namely jitter, delay, and channel errors, can lead to the loss of packets at the receiving end. This packet loss degrades the speech quality. Model-based coders, especially the International Telecommunication Union (ITU-T) standards G.729A [2] and G.723.1 [3], have been extensively used for speech coding over IP networks because of their low bit-rate requirements (5.3 and 6.3 kbit/s for G.723.1 and 8 kbit/s for G.729A) and their inherent ability to recover from erasure. Their built-in packet loss concealment makes their quality degrade gradually as the amount of packet loss increases. However, their memory requires a few frames to make the transition from a concealed state to a correct state. Thus, they tend to corrupt a few good packets before recovery, a phenomenon known as “State Error” [6]. On the other hand, pulse code modulation (PCM, 64 kbit/s) [9], although it offers higher quality than G.729 and G.723.1 during normal operation, has no ability to conceal erasure. This results in a dramatic drop in speech quality during loss periods. Yet PCM-based coders can recover from packet loss more rapidly than model-based coders, since the first speech sample in the first good packet restores speech to its original quality. The low complexity of PCM and its good performance in tandem coding make it a viable alternative to G.729 or G.723.1 for VoIP.

Several approaches have been implemented to address the frame erasure problem in PCM streams. The simplest approach is to play a mute (silence) packet in the erasure period. This method, however, introduces annoying voice clipping, and most subjective tests have shown that it deteriorates the speech quality even at very low packet loss rates [4], [5]. Many other concealment algorithms depend on the quasi-stationary property of speech (little new information is delivered in the duration of a 10–30 ms lost packet). One of the popular commercial concealment algorithms repeats the speech signal received in the last speech packet. This method performs better than silence substitution, but its quality is still not satisfactory for high-quality applications.
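The two baseline methods described above, silence substitution and packet repetition, can be illustrated with a minimal sketch. The function name `conceal_baseline`, the `mode` argument, and the 80-sample frame (10 ms at 8 kHz) are assumptions made for illustration; they are not taken from any of the cited tools.

```python
from typing import List, Optional

FRAME_LEN = 80  # assumed packet size: 10 ms of speech sampled at 8 kHz


def conceal_baseline(packets: List[Optional[List[int]]],
                     mode: str = "repeat") -> List[int]:
    """Decode a packet stream in which None marks a lost packet.

    mode="mute"   -> play silence during the erasure
    mode="repeat" -> replay the last correctly received packet
    """
    out: List[int] = []
    last_good = [0] * FRAME_LEN  # nothing received yet: fall back to silence
    for pkt in packets:
        if pkt is not None:
            last_good = pkt
            out.extend(pkt)
        elif mode == "mute":
            out.extend([0] * FRAME_LEN)
        else:
            out.extend(last_good)
    return out
```

With `mode="repeat"`, consecutive losses replay the same packet again and again, which is one reason plain repetition can sound unnatural over longer erasures.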

ITU-T has recently standardized (in G.711 Appendix A [1]) a high-quality, low-complexity concealment method for PCM-coded speech. This method depends on waveform substitution. The packet loss concealment (PLC) algorithm first performs pitch detection on a sufficient length of speech samples kept in the history buffer (390 samples of 8 kHz-sampled speech). The concealment unit then places the pointer one pitch period backward and copies a speech signal of the duration of the lost packet. This pitch-predicted replica is played in the gap left by the missing speech segment. The algorithm also performs an overlap-and-add at the transition between the last received good samples and the concealed ones, to ensure a smooth and natural transition and a higher concealment quality. However, this results in an added algorithmic delay of 3.75 ms [1]. The algorithm has a very low complexity of 0.5 MIPS. Another standard method is presented in the ANSI standard T1-521-2000 (Appendix B) [7]. This method relies on the well-known linear prediction model to estimate the missing speech waveform, and simply adopts the approach of model-based codecs. It performs a complete analysis to extract the short- and long-term excitation from the previous correctly received speech. The synthesis unit then uses these parameters, along with the most recently received speech samples (as initial conditions for the inverse linear prediction (LP) filter), to synthesize an approximation of the missing speech segment. This method introduces an algorithmic delay of 5 ms (half of a correctly received 10 ms packet) to smooth the transition between the last good speech segment and the beginning of the concealed one. It also requires a much higher complexity (2.3 MIPS for a 10 ms packet), which is around 5 times the complexity of ITU-T G.711 Appendix A [4], [7]. The resulting concealment quality of this method is comparable to that of ITU-T G.711 Appendix A [4], [7].

In this paper, we present a new receiver-based PLC algorithm for packetized PCM-coded speech. It is designed to work with the conventional sampling rate of 8 kHz and frame sizes of 10 ms. The proposed algorithm does not require any delay and has an affordable complexity of 1.85 MIPS.
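The waveform-substitution idea described above for G.711 Appendix A (pitch detection on the history buffer, copying from one pitch period back, and an overlap-and-add at the junction) can be sketched as follows. This is a simplified illustration and not the standardized algorithm: the names `estimate_pitch` and `conceal_frame`, the 40 to 120 sample lag range, the 160-sample correlation window, the linear cross-fade, and the choice to repeat the last pitch period when it is shorter than the lost frame are all assumptions. The 30-sample overlap corresponds to 3.75 ms at 8 kHz.

```python
import math
from typing import List, Tuple


def estimate_pitch(history: List[float], min_lag: int = 40,
                   max_lag: int = 120, window: int = 160) -> int:
    """Return the lag whose delayed copy best matches the end of history.

    Uses a normalized cross-correlation; needs at least window + max_lag
    samples of history (a 390-sample buffer is enough).
    """
    n = len(history)
    ref = history[n - window:]
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        cand = history[n - window - lag:n - lag]
        energy = sum(c * c for c in cand)
        if energy == 0.0:
            continue  # silent candidate segment: no usable correlation
        score = sum(r * c for r, c in zip(ref, cand)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_frame(history: List[float], frame_len: int = 80,
                  ola_len: int = 30) -> Tuple[List[float], List[float]]:
    """Return (smoothed tail of the good signal, synthetic frame)."""
    n = len(history)
    pitch = estimate_pitch(history)
    # Overlap-and-add: fade the last good samples into the samples one
    # pitch period earlier, so the junction with the replica is smooth.
    tail = []
    for i in range(ola_len):
        w = (i + 1) / (ola_len + 1)  # linear ramp (assumed window shape)
        cur = history[n - ola_len + i]
        prev = history[n - ola_len + i - pitch]
        tail.append((1.0 - w) * cur + w * prev)
    # Fill the gap starting one pitch period back, wrapping around when
    # the pitch period is shorter than the lost frame.
    start = n - pitch
    frame = [history[start + (i % pitch)] for i in range(frame_len)]
    return tail, frame
```

Because the smoothed tail replaces samples that would otherwise already have been played, the receiver has to hold back those 30 samples, which is where the added algorithmic delay comes from.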

The rest of this paper is organized as follows. In Section 2, the concealment model is described. Section 3 presents the quality assessment test for the new method as well as simulation results confirming the improved performance of the proposed algorithm. Section 4 concludes the paper and outlines future work that could extend the proposed method.

Section snippets

Prediction equation

The new LP-based concealment technique is based on prediction with a filter of sufficiently large order that is capable of accurately modelling the speech:

$$S(n) = \sum_{i=1}^{P} a(i)\,S(n-i) + b(n), \qquad (1)$$

where S(n) is the nth speech sample, P is the prediction order, which was set to 50 as will be explained later, a(i) are the LP coefficients and b(n) is the residual signal.
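To make the prediction equation concrete, the sketch below estimates the coefficients a(i) from past samples and then runs the recursion forward to extrapolate new samples. Several things here are assumptions rather than the paper's method: the coefficients are obtained with the autocorrelation method and the Levinson-Durbin recursion (the snippet does not say how a(i) are computed), the residual b(n) is simply set to zero, and the function names `lp_coefficients` and `lp_extrapolate` are hypothetical.

```python
from typing import List


def lp_coefficients(signal: List[float], order: int) -> List[float]:
    """Estimate a(1)..a(order) by autocorrelation + Levinson-Durbin."""
    n = len(signal)
    r = [sum(signal[k] * signal[k - lag] for k in range(lag, n))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused; a[i] multiplies S(n - i)
    err = r[0]               # prediction error energy
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted signal: stop early
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err        # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lp_extrapolate(history: List[float], coeffs: List[float],
                   count: int) -> List[float]:
    """Predict `count` samples with S(n) = sum_i a(i) S(n - i), b(n) = 0.

    `history` must hold at least len(coeffs) samples.
    """
    buf = list(history)
    for _ in range(count):
        buf.append(sum(c * buf[-1 - i] for i, c in enumerate(coeffs)))
    return buf[len(history):]
```

For a 10 ms packet at 8 kHz, a call such as `lp_extrapolate(history, lp_coefficients(history, 50), 80)` would generate one frame. With the residual forced to zero the output decays towards silence for a stable predictor, so a practical concealment scheme needs some model of the b(n) term.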

As can be seen from Eq. (1) the current speech sample S(n) is composed of two components. The first component is the predictable part carrying

Performance of the proposed algorithm

The new algorithm is compared with the ITU-T standard concealment tool (G.711 Appendix A) and with the packet repetition method. The test was performed on a set of speech files from four speakers, two male and two female, referred to in the results as M1, M2, F1 and F2. Each speaker contributed 10 speech files, each containing two sentences in English with a duration of 8 s. The files were in linear PCM format and were taken from the ITU-T supplement P.23.

The assessment tool used to

Conclusion and future work

In this paper, we introduced a new concealment algorithm for PCM packetized speech with a packet length of 10 ms. The implemented model provides very encouraging results for the idea of combining pitch prediction with high-order LP-based prediction to produce the concealed speech segments. The PESQ-MOS scores obtained in the random loss tests show that the algorithm delivers superior concealment performance in all cases when compared to an existing commercial method

References (10)

Appendix A: a high quality low-complexity algorithm for packet loss concealment with G.711, ITU-T Recommendation....
Coding of speech at 8kb/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP), ITU-T...
Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3kb/s, ITU-T Recommendation G.723.1,...
E. Gunduzhan et al., A linear prediction based packet loss concealment algorithm for PCM coded speech, IEEE Trans. Speech and Audio Process. (November 2001)
M. Hassan, A. Nayandoro, Internet telephony: services, technical challenges, and products, IEEE Communication Magazine,...

There are more references available in the full text version of this article.

