Elsevier

Signal Processing

Volume 83, Issue 8, August 2003, Pages 1707-1719
On artificial bandwidth extension of telephone speech

https://doi.org/10.1016/S0165-1684(03)00082-3

Abstract

We present a signal processing algorithm to convert speech signals with “standard telephone” quality into 7 kHz wideband speech. A statistical approach based on a hidden Markov model (HMM) is used, which takes into account several features of the band-limited speech. The narrowband input signal is classified into a limited number of speech sounds (HMM states) for which the wideband spectral envelope is estimated using the pre-trained model. The enhanced speech exhibits a significantly improved quality without objectionable artifacts.

Introduction

The limited acoustic bandwidth of today's public telephone networks originates from the former analogue transmission techniques. The limitation to a frequency range of about 0.3–3.4 kHz causes the typical sound of narrowband telephone speech. In the transition to digital transmission, the upper frequency limit of 3.4 kHz has been retained (passband up to 3.4 kHz, sampling frequency f_s = 8 kHz), whereas the lower frequency limit may be somewhat below 300 Hz [14].

Listening experiments have shown that the acoustic bandwidth of speech signals contributes significantly to the perceived speech quality [21], [39], which is measured in terms of the mean opinion score (MOS). In comparison to telephone speech, typical wideband speech with a frequency range of 50 Hz to 7 kHz yields a considerable gain of up to about 1.3 MOS points.

Although the sentence intelligibility of clean telephone speech is about 99%, the intelligibility of meaningless syllables is only about 90%. As a result, we sometimes need a spelling alphabet to communicate words that cannot be understood from the context, such as unknown names. Improving the intelligibility of syllables makes communication more comfortable and less strenuous in many cases, i.e., the listening effort can be reduced.

True digital wideband speech communication can be achieved by redesigning the transmission link, i.e., by introducing new speech codecs on both sides of the link. Indeed, several wideband speech coding schemes have been developed for the increased acoustic bandwidth (50 Hz to 7 kHz). The G.722 codec was standardized for teleconferencing and ISDN telephony as early as the 1980s [15]. So far, this codec has not been widely deployed in ISDN. Recently, the so-called adaptive multi-rate wideband (AMR-WB) speech codec was developed and standardized for mobile radio systems such as GSM and UMTS [11]. A gradual introduction of wideband terminals can be expected in the future. However, for a long transitional period, mixed telephone networks with both narrowband and wideband terminals will exist for economic reasons.

An approach to enhance the perceived acoustic bandwidth based on the information from the available narrowband speech is artificial bandwidth extension (BWE) [4], [5], [6], [7], [29], [36] at the receiving end. The problem of BWE is illustrated in Fig. 1: the original wideband (wb) signal s_wb is band-pass filtered prior to analogue-to-digital conversion and transmission over the telephone network. At the receiving terminal only the narrowband (nb) signal s_nb is available. By artificial bandwidth extension an estimate s̃_wb of the wideband speech is produced by adding some artificial low- and/or high-frequency signal components. Although true wideband speech quality cannot be obtained by artificial bandwidth extension, BWE represents a very attractive enhancement of any receiving wideband terminal as long as there are sending narrowband terminals in the network. In this paper the bandwidth extension of speech signals towards higher frequencies is addressed. The high-frequency band will be called the extension band (eb) in the following.

The following conventions are used to denote quantities: capital bold letters refer to matrices, e.g., A, bold letters refer to vectors, e.g., a, and scalars are not bold, e.g., a. Estimated quantities are labeled with a tilde, e.g., ỹ, quantized variables are marked by a hat, e.g., ŷ, and mean values are labeled by a bar, e.g., ȳ.

Bandwidth extension algorithm

The key point of the bandwidth extension algorithm is to exploit implicit redundancy of the speech production process as proposed in the pioneering approaches [4], [5], [16]. The linear source-filter model of speech, widely used in speech coding and recognition, consists of an auto-regressive (AR) filter (corresponding to the vocal tract) and a source producing a spectrally flat excitation (cf. Fig. 2). According to this model the algorithm for bandwidth extension is divided into two tasks,

Extension of the excitation signal

According to the simplifying linear model of speech production the excitation signal u(k) is spectrally flat: for voiced sounds it contains sinusoids at multiples of the fundamental (pitch) frequency of the speech segment where all harmonics have almost the same amplitude; during unvoiced sounds the excitation is more or less white noise.

Due to these properties the missing high-frequency components of the excitation signal can be produced by modulation, i.e., by a frequency shift by Ω_M [4], [10]
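
As a rough illustration of the modulation idea, the following Python sketch shifts a single narrowband tone by multiplying it with a cosine carrier. The 16 kHz sampling rate, the 3.4 kHz modulation frequency, the scaling factor 2, and the function names are assumptions chosen for this example and are not taken from the paper. A practical system would presumably also filter the result so that only the extension band is kept.

```python
import cmath
import math


def modulate(u_nb, omega_m):
    """Multiply the signal by 2*cos(omega_m * k), which shifts its spectrum
    by +omega_m and -omega_m (omega_m in radians per sample)."""
    return [2.0 * math.cos(omega_m * k) * u for k, u in enumerate(u_nb)]


def dft_magnitude(x, bin_index):
    """Magnitude of a single DFT bin, computed directly from the definition."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * bin_index * k / n)
              for k, v in enumerate(x))
    return abs(acc)


# Assumed example values (hypothetical, for illustration only).
FS = 16000   # sampling rate in Hz
N = 160      # frame length, so one DFT bin spans 100 Hz
F0 = 1000    # frequency of the narrowband test tone in Hz
F_M = 3400   # modulation frequency in Hz

u_nb = [math.cos(2.0 * math.pi * F0 * k / FS) for k in range(N)]
u_shifted = modulate(u_nb, 2.0 * math.pi * F_M / FS)

# The product 2*cos(a)*cos(b) equals cos(a + b) + cos(a - b), so the tone
# moves from 1000 Hz to 2400 Hz (bin 24) and 4400 Hz (bin 44).
for freq_bin in (10, 24, 44):
    print(freq_bin * 100, "Hz:", round(dft_magnitude(u_shifted, freq_bin), 3))
```

Because every harmonic of a voiced excitation is shifted by the same amount, the shifted components stay harmonically spaced, but they only line up with multiples of the pitch frequency if the modulation frequency is itself a multiple of it.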

Extension of the spectral envelope

The procedure of estimating the wideband spectral envelope, i.e., the AR coefficient set ã_wb, is related to pattern recognition techniques. We use true wideband speech signals in a training phase and narrowband signals during the application phase.
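
To make the pattern recognition view more concrete, here is a minimal Python sketch of how state probabilities from an HMM could drive an envelope estimate. It is not the authors' estimator: the two-state model, the transition matrix, the per-frame likelihoods, the codebook of envelope parameter vectors, and the posterior-weighted averaging rule are all hypothetical choices made for illustration.

```python
def forward_posteriors(initial, transition, likelihoods):
    """Normalized HMM forward recursion.

    likelihoods[t][i] is the observation likelihood of frame t in state i.
    Returns, for each frame t, P(state | frames 0..t).
    """
    posteriors = []
    prev = None
    for frame in likelihoods:
        if prev is None:
            predicted = list(initial)
        else:
            # Propagate the previous posterior through the transition matrix.
            predicted = [
                sum(prev[i] * transition[i][j] for i in range(len(prev)))
                for j in range(len(frame))
            ]
        joint = [p * b for p, b in zip(predicted, frame)]
        total = sum(joint)
        prev = [v / total for v in joint]
        posteriors.append(prev)
    return posteriors


def estimate_envelope(posterior, codebook):
    """Posterior-weighted average of per-state envelope parameter vectors."""
    dim = len(codebook[0])
    return [
        sum(w * entry[d] for w, entry in zip(posterior, codebook))
        for d in range(dim)
    ]


# Hypothetical toy model with two states (not values from the paper).
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
likelihoods = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]
codebook = [[0.2, 0.1], [0.8, 0.5]]  # one parameter vector per state

posteriors = forward_posteriors(initial, transition, likelihoods)
estimates = [estimate_envelope(p, codebook) for p in posteriors]
for p, e in zip(posteriors, estimates):
    print([round(v, 3) for v in p], [round(v, 3) for v in e])
```

In this sketch the state posteriors would come from narrowband features at run time, while the codebook entries would be learned from true wideband speech during training, which mirrors the split between training and application phase described above.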

In our algorithm the estimated wideband spectral envelope is utilized both in the analysis filter to estimate the narrowband excitation signal ũ_nb(k) and in the synthesis filter for spectral shaping of the extended excitation signal. Hence, the

Performance evaluation

Different modeling and estimation methods have been evaluated both by instrumental performance measures and by informal listening tests. Starting from typical “telephone speech” with frequency components between 300 Hz and 3.4 kHz, the extension of high-frequency components above 3.4 kHz was investigated.

The statistical model was trained with diverse parameterizations and a 15-dimensional composite feature vector x (see Section 4.3). The complexity of the HMM was varied between N_S = 2, …, 64 states,

Discussion

In this paper an algorithm for artificial bandwidth extension has been proposed that is based on a linear source-filter model of the speech signal. According to the two-stage structure of the source-filter model, the bandwidth extension algorithm is divided into two sub-systems that are mutually independent to a large extent [4]. The BWE algorithm proposed in the paper inherently guarantees transparency of the system with respect to the narrowband input signal.
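
One way to see why such transparency is plausible is that an AR analysis filter and the synthesis filter built from the same coefficients are exact inverses of each other. The Python sketch below checks this numerically. The coefficient values, the test signal, and the function names are assumptions for illustration, and the check only covers the filter pair: it assumes the excitation is passed through unchanged in the narrowband range.

```python
import math


def analysis_filter(s, a):
    """FIR analysis filter A(z) = 1 + sum_i a[i] * z^-(i+1).

    Removes the spectral envelope and returns the excitation estimate."""
    u = []
    for k in range(len(s)):
        acc = s[k]
        for i, coeff in enumerate(a):
            if k - 1 - i >= 0:
                acc += coeff * s[k - 1 - i]
        u.append(acc)
    return u


def synthesis_filter(u, a):
    """All-pole synthesis filter 1/A(z), the inverse of analysis_filter."""
    s = []
    for k in range(len(u)):
        acc = u[k]
        for i, coeff in enumerate(a):
            if k - 1 - i >= 0:
                acc -= coeff * s[k - 1 - i]
        s.append(acc)
    return s


# Hypothetical AR coefficients; the poles of 1/A(z) lie at 0.5 and 0.4,
# so the synthesis filter is stable.
a_wb = [-0.9, 0.2]
s_nb = [math.sin(0.3 * k) for k in range(50)]

u_nb = analysis_filter(s_nb, a_wb)    # estimated excitation
s_rec = synthesis_filter(u_nb, a_wb)  # re-synthesized signal

max_error = max(abs(x - y) for x, y in zip(s_nb, s_rec))
print("max reconstruction error:", max_error)
```

The reconstruction error stays at the level of floating-point rounding regardless of how accurate the coefficient estimate is, which is why errors in the envelope estimate would affect only the added extension band in this simplified view.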

The principal part of the

Acknowledgements

The authors would like to thank Siemens AG, Mobile Phones, for supporting this project and for providing access to the BAS SI100 speech corpus.

References (39)

  • A.M.A. Ali et al., Acoustic–phonetic features for the automatic classification of fricatives, J. Acoust. Soc. Amer. (May 2001)
  • C. Avendano, H. Hermansky, E.A. Wan, Beyond Nyquist: towards the recovery of broad-bandwidth speech from...
  • L.R. Bahl et al., Optimal decoding of linear codes for minimizing symbol error rate, IEEE Trans. Inform. Theory (March 1974)
  • H. Carl, Untersuchung verschiedener Methoden der Sprachkodierung und eine Anwendung zur Bandbreitenvergrößerung von...
  • H. Carl, U. Heute, Bandwidth enhancement of narrow-band speech signals, in: Proceedings of the EUSIPCO, Vol. 2,...
  • Y.M. Cheng et al., Statistical recovery of wideband speech from narrowband speech, IEEE Trans. Speech Audio Process. (October 1994)
  • M.G. Croll, Sound-quality improvement of broadcast telephone calls, Technical Report 1972/26, The British Broadcasting...
  • J. Epps, W.H. Holmes, A new technique for wideband enhancement of coded narrowband speech, in: Proceedings of the IEEE...
  • T. Fingscheidt, Softbit-Sprachdecodierung in digitalen Mobilfunksystemen, Ph.D. Thesis, Aachen University (RWTH); P....
  • J.A. Fuemmeler et al., Techniques for the regeneration of wideband speech from narrowband speech, EURASIP J. Appl. Signal Process. (December 2001)
  • 3GPP TS 26.171, AMR wideband speech codec; general description,...
  • R. Hagen, Spectral quantization of cepstral coefficients, in: Proceedings of the ICASSP, Vol. 1, Adelaide, Australia,...
  • D.A. Heide, G.S. Kang, Speech enhancement for bandlimited speech, in: Proceedings of the ICASSP, Vol. 1, Seattle, WA,...
  • ITU-T Rec. G.712, Performance characteristics of PCM channels between 4-wire interfaces at voice frequencies,...
  • ITU-T Rec. G.722, 7 kHz audio coding within 64 kbit/s,...
  • V. Iyengar, R. Rabipour, P. Mermelstein, B.R. Shelton, Speech bandwidth extension method and apparatus, U.S. patent no....
  • P. Jax, Enhancement of bandlimited speech signals: algorithms and theoretical bounds, Ph.D. Thesis, Aachen University...
  • P. Jax, P. Vary, Wideband extension of telephone speech using a hidden Markov model, in: Proceedings of the IEEE...
  • P. Jax, P. Vary, An upper bound on the quality of artificial bandwidth extension of narrowband speech signals, in:...