A new error concealment method for consecutive frame loss based on CELP speech

https://doi.org/10.1016/j.compeleceng.2010.03.002

Abstract

Low bit-rate speech coding by digital signal processing becomes increasingly important as communication technology develops. A speech codec should maintain good quality under diverse conditions, such as varying channels, different speakers, and background noise. When the transmission environment is poor and channel coding cannot effectively control errors, error concealment is applied. Generally speaking, error concealment is based on extrapolation or repetition, in which the speech coding parameters are extrapolated or repeated from those of the surrounding correctly received frames. This paper focuses on the Adaptive Multi-Rate (AMR) speech coding standard and discusses two points: the value of the pitch lag when consecutive frames are lost, and the recovery of the codebook gain for good frames following continuous bad frames. Objective and subjective experimental results confirm that the proposed algorithm achieves better speech quality.

Introduction

Transmission of real-time information over networks is increasingly popular. A compressed audio bitstream corrupted by transmission errors can produce very annoying artifacts. To counter transmission errors, error concealment exploits inherent characteristics of the data, such as spatial or temporal correlations, and attempts to obtain a close approximation of the original signal [1].

In the audio domain, concealment methods are basically based on extrapolation or on repetition.

The extrapolation method extrapolates parameters from neighboring packets to produce a replacement for the lost packet. Waveform substitution, pitch waveform replication, time-scale modification, and spectral interpolation can also be classified in this category. A waveform-similarity overlap-add technique was used for time-scale modification to improve speech quality under a variety of frame-erasure conditions [2]. A method that synthesizes the new frame from the windowed previous frame and the residual samples was proposed in Ref. [3]; it also exploits the overlap-add mechanism. In VoIP applications, perceptual quality improves with interpolation from both the "previous" and "next" correctly received intraframes [4].
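The overlap-add idea common to several of these methods can be illustrated with a short sketch. This is not code from any of the cited papers; the function name, the repetition-based substitute, and the linear cross-fade are assumptions made purely for illustration.

```python
import numpy as np

def conceal_overlap_add(prev_frame, frame_len, overlap):
    """Replace a lost frame by repeating the previous good frame and
    cross-fading (overlap-add) at the boundary to avoid audible clicks.
    Illustrative sketch only; not the algorithm of any specific codec."""
    prev_frame = np.asarray(prev_frame, dtype=float)
    # Candidate replacement: periodic repetition of the last good frame
    reps = frame_len // len(prev_frame) + 1
    substitute = np.tile(prev_frame, reps)[:frame_len]
    # Linear cross-fade windows over the overlap region
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out
    # Smooth the junction between the tail of the previous frame
    # and the head of the substitute frame
    substitute[:overlap] = prev_frame[-overlap:] * fade_out \
        + substitute[:overlap] * fade_in
    return substitute
```

In a real decoder the substitute would come from the synthesis model rather than raw waveform repetition, but the cross-fade at the frame boundary is the essential overlap-add step.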

The repetition method also makes use of surrounding frames to derive the codec parameters according to the codec algorithm. Traditionally, a model is employed to generate these parameters. Interpolation carried out via AR-based extrapolation of the segments surrounding the missing part is described in Refs. [5], [6], [7]. Instead of performing pure extrapolation, which basically consists of setting an initial condition for the AR synthesis filter and computing its unforced response, these works propose a new scheme, based on an earlier proposition, that employs autoregressive signal extrapolation. In Refs. [8], [9], the reconstruction of missing data is based on the assumption that the signal can be modeled as a P-th-order autoregressive process. The main drawback of the autoregressive-model-based approach is that the model order grows with the length of the sequence to be reconstructed: the order must be two to three times the length of the missing data sequence, so the method is not well suited to reconstructing long sequences. In Ref. [10], the linear prediction method was extended to the nonlinear case using neural networks. Other sinusoidal-model-based error concealment preserves the spectral content of the signal; the relevant research is presented in Refs. [11], [12], [13].
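A minimal sketch of the AR-based extrapolation idea follows. A practical implementation would estimate the coefficients with Levinson-Durbin on windowed autocorrelations; here a plain least-squares fit is used for brevity, and the function name and parameters are illustrative assumptions rather than the algorithm of Refs. [5], [6], [7], [8], [9].

```python
import numpy as np

def ar_extrapolate(signal, order, n_missing):
    """Fit an order-P autoregressive model to the received samples and
    extrapolate the missing samples as the model's unforced response.
    Illustrative sketch of the AR approach; least squares stands in
    for a proper Levinson-Durbin coefficient estimate."""
    x = np.asarray(signal, dtype=float)
    # Regression: x[n] ~ a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    rows = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    targets = x[order:]
    a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    # Extrapolate sample by sample from the model's own predictions
    history = list(x[-order:])
    out = []
    for _ in range(n_missing):
        pred = float(np.dot(a, history[::-1]))
        out.append(pred)
        history = history[1:] + [pred]
    return np.array(out)
```

For a clean sinusoid an AR(2) model is exact, so the extrapolation continues the waveform; for long gaps in real speech, a much higher order is needed, which is exactly the drawback noted above.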

In the G.722 codec, the pitch value of a previously correctly received frame is used for the lost frame, but the loss of a frame also desynchronizes the ADPCM decoders, resulting in a state mismatch between encoder and decoder. Serizawa and Nozawa [14] proposed a scheme to update the internal state parameters. Although it avoided the annoying effect of clicks, the forgetting-factor control tends to damp the gains of the decoded speech and degrades speech quality.

G.729 repeats the spectral parameters of the last correctly received frame for the erased frame. The adaptive-codebook gain and fixed-codebook gain are obtained by multiplying the gains of the previous frame by predefined attenuation factors. The pitch lag is increased by one relative to the value of the previous frame to avoid excessive periodicity [15].
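This G.729-style rule might be sketched as follows. The attenuation constants used here are illustrative stand-ins; the exact values are defined in the G.729 recommendation [15].

```python
def conceal_gains_g729(prev_adaptive_gain, prev_fixed_gain, prev_pitch_lag):
    """Sketch of G.729-style concealment for an erased frame: damp the
    previous frame's gains by fixed attenuation factors and bump the
    pitch lag by one. The 0.9 and 0.98 factors are illustrative only."""
    adaptive_gain = 0.9 * prev_adaptive_gain   # attenuate adaptive-codebook gain
    fixed_gain = 0.98 * prev_fixed_gain        # attenuate fixed-codebook gain
    pitch_lag = prev_pitch_lag + 1             # avoid excessive periodicity
    return adaptive_gain, fixed_gain, pitch_lag
```

Applying the rule repeatedly over a burst of losses drives both gains toward zero, which mutes the synthesized speech gracefully but, as discussed later, the ever-increasing pitch lag becomes a problem for long bursts.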

The AMR speech codec was originally standardized by ETSI for the GSM system and was later selected by 3GPP as the mandatory speech codec for third-generation WCDMA systems. AMR includes narrowband AMR (AMR-NB) and wideband AMR (AMR-WB). Narrowband AMR contains eight codec modes with source bit rates from 12.2 kbps down to 4.75 kbps, plus a low-rate background-noise encoding mode [16]. AMR-WB contains nine codec modes with source bit rates from 6.6 kbps up to 23.85 kbps [17]. The AMR service also comprises a flexible solution in which the balance between speech coding and channel coding can be adjusted according to the channel conditions currently estimated on the radio interface.

When packets are missing at the receiver, the decoder applies concealment and a set of predicted parameters is used in the speech synthesis. However, when consecutive bad frames are received, we find that the AMR mechanism does not achieve a satisfactory effect. By modifying the algorithm, the speech quality can be improved.

The rest of this paper is organized as follows. Section 2 briefly reviews the AMR error concealment algorithm and presents its shortcomings. Section 3 proposes the new error concealment method addressing these disadvantages. Experimental and evaluation results are shown in Section 4, and conclusions are presented in Section 5.


AMR error concealment

When speech frames are received, each carries RX_TYPE information that specifies the contents of the frame [18]. The different RX_TYPE values can be mainly classified as speech-related or SID (Silence Descriptor)-related. They are shown in Table 1.

When bad frames are received, the AMR codec treats the different RX_TYPEs in separate ways. The network shall indicate lost speech or lost SID frames by setting the RX_TYPE value to SPEECH_BAD or SID_BAD. If these flags are set, the

Pitch lag adjustment

As described above, the key problem is how to deal with the case of continuous bad frames. We can let the pitch lag fluctuate around the value of the last correct frame, instead of increasing it continuously, when consecutive error frames arrive.

We use the variable bfi_count to record the number of sequential bad frames. When a good frame is received, it is reset to zero. The variable old_T0 records the integer part of the pitch lag of the last frame. When a frame error occurs, use the
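The pitch-lag handling described above can be sketched as follows. The jitter range and the lag clamping bounds are illustrative assumptions, not the paper's exact values; only the variables bfi_count and old_T0 come from the text.

```python
import random

def concealed_pitch_lag(old_T0, bfi_count, jitter=2, t0_min=20, t0_max=143):
    """Sketch of the proposed pitch-lag rule for lost frames: instead of
    growing the lag by one per bad frame, let it fluctuate around the
    last good value old_T0. The jitter range and the [t0_min, t0_max]
    clamping bounds are illustrative assumptions."""
    if bfi_count == 0:
        return old_T0                      # good frame: keep the decoded lag
    # bad frame: small random deviation around the last good lag
    lag = old_T0 + random.randint(-jitter, jitter)
    # clamp to a plausible codec lag range
    return max(t0_min, min(t0_max, lag))
```

The point of the fluctuation is that the concealed excitation stays close to the speaker's true pitch over a long burst of losses, rather than drifting steadily upward as with the increase-by-one rule.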

Experiment

We evaluated the new algorithm by both objective and subjective tests. The speech samples are encoded at a bit rate of 48 kbps. Different packet-loss patterns are simulated, with frame error rates of 2%, 5%, and 10%. The objective test of audio quality is based on the PESQ score. Table 2 provides the average PESQ score of the 12 test sequences under each frame error rate.

According to the objective test, the proposed method achieves a higher score, which means that it is more robust

Conclusion

In this paper, we propose a new error concealment method for AMR that counteracts the decrease in speech quality. Two techniques are discussed. First, the value of the pitch lag is adjusted to fluctuate instead of increasing continuously. This avoids the excessive periodicity that may produce annoying sound when consecutive error frames are received. Second, when continuous bad frames end and good frames are received, the codebook gains cause the energy to undulate. The coefficient is added

Jie Yang was born in 1981. He received the B.S. degree from Huazhong University of Science and Technology (HUST) in 2004 and he is a Ph.D. student studying in HUST now. His main interests include signal processing, speech and video compression.

References (18)

  • Yao Wang, Qinfan Zhu. Error control and concealment for video communication: an overview. In: Proceedings of the IEEE,...
  • Merazka F. Packet loss concealment using time scale modification for CELP based coders in packet network. In: 40th...
  • Huan Hou, Weibei Dou, Real-time audio error concealment method based on sinusoidal model. In: International conference...
  • Jian Wang, Gibson, JD. Performance comparison of intraframe and interframe LSF quantization in packet networks. In:...
  • I. Kauppinen et al. A method for long extrapolation of audio signals. J Audio Eng Soc (2001)
  • I. Kauppinen et al. Reconstruction method for missing or damaged long portions in audio signal. J Audio Eng Soc (2002)
  • Kauppinen I, Roth K. Audio signal extrapolation – theory and applications. In: Proceedings of the 5th international...
  • Vaseghi SV, Rayner PJW. Detection and suppression of impulsive noise in speech communication system. In: Proceedings...
  • S.V. Vaseghi et al. Restoration of old gramophone recordings. J Audio Eng Soc (1992)
There are more references available in the full text version of this article.



Shengsheng Yu was born in 1944, and received the B.E. degree in 1967. He is a Professor and doctor advisor at Huazhong University of Science and Technology. He had been a visiting scholar in west Germany from 1982 to 1983. His main field of research: computer network and storage, discrete signal processing and communication.

Jingli Zhou was born in 1946. She received the B.E. degree in 1969. She is a Professor and doctor advisor at Huazhong University of Science and Technology. She had been a visiting scholar in USA from 1995 to 1996 and has been honor of the State Department Special Allowance since 1999. Her main field of research: computer network and multimedia signal processing.

Yi Gao was born in 1984. He received the B.S. degree from HUST in 2005, where he is now pursuing his Ph.D. degree in computer science. His main interests are wavelet theory, image processing.

Reviews processed and proposed for publication to the Editor-in-Chief by Associate Editor Dr. M Malek.
