Low-rate multimode multiband spectral coding of speech

Das, Amitava; Gersho, Allen

doi:10.1007/BF02108647

Published: May 1999

Volume 2, pages 317–327, (1999)

International Journal of Speech Technology

Amitava Das¹ &
Allen Gersho²

Abstract

At bit rates of 4 kbps and below, conventional time-domain algorithms such as CELP fail to retain high voice quality and robust performance against background noise, because their waveform-matching ability is curtailed by the severely limited codebook space. Spectral coding, on the other hand, offers an effective parametric model that is amenable to low-rate implementation. Instead of performing waveform matching, spectral coders preserve only the perceptually important spectral attributes of the speech signal. Spectral coding algorithms encompass a broad family of emerging low-rate speech coding techniques whose common goal is to represent the short-term spectrum of the input speech with a limited set of spectral parameters and to synthesize the output speech with a set of sinusoids. Pitch, frequency-domain voicing information, and a varying number of spectral magnitudes are the usual parameters of spectral coders. In this paper, we present the enhanced multiband excitation (EMBE) coder as an illustration of this new generation of low-rate spectral coders. The distinguishing features of EMBE are: (a) signal-adaptive multimode spectral modeling and parameter quantization, (b) a two-band signal-adaptive frequency-domain voicing decision, (c) a novel VQ scheme for the efficient encoding of the variable-dimension spectral magnitude vectors at low rates, and (d) multi-class selective protection of spectral parameters from channel errors. A 4 kbps implementation of the EMBE spectral coding algorithm, with 2.9 kbps for source coding and 1.1 kbps for channel coding, was specifically designed for satellite-based communication systems, targeting good voice quality at low bit rates and robust performance against channel errors. Fundamental concepts of the EMBE spectral coding algorithm, implementation details, and performance comparisons of the 4 kbps EMBE coder with earlier coders are reported.
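As a rough illustration of two ideas in the abstract, the sketch below shows why the spectral magnitude vector has a variable dimension (the number of pitch harmonics depends on the pitch) and how a frame can be synthesized as a sum of sinusoids. It is not the EMBE algorithm. The 8 kHz sampling rate, the 3800 Hz usable bandwidth, the 20 ms frame length, the zero initial phases, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math


def harmonic_count(f0_hz, bandwidth_hz=3800.0):
    """Number of pitch harmonics at or below the (assumed) usable bandwidth.

    This is the dimension of the spectral magnitude vector for the frame,
    so it changes from frame to frame as the pitch changes.
    """
    return int(bandwidth_hz // f0_hz)


def synthesize_frame(f0_hz, magnitudes, n_samples, sample_rate_hz=8000.0):
    """Sum of harmonically related cosines with zero initial phase.

    magnitudes[k] is the amplitude of harmonic k + 1 of the pitch f0_hz.
    Zero phase is a simplification for illustration only.
    """
    frame = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        sample = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0_hz * t)
            for k, a in enumerate(magnitudes)
        )
        frame.append(sample)
    return frame


def bits_per_frame(rate_bps, frame_ms):
    """Bit budget per frame for a given rate; frame_ms is hypothetical."""
    return round(rate_bps * frame_ms / 1000.0)


# A low-pitched and a high-pitched talker give vectors of different sizes.
low_pitch_dim = harmonic_count(100.0)
high_pitch_dim = harmonic_count(250.0)

# Two-harmonic toy frame, 80 samples long (10 ms at the assumed 8 kHz).
toy_frame = synthesize_frame(100.0, [1.0, 0.5], 80)

# With an assumed 20 ms frame, the 2.9 + 1.1 kbps split stated in the
# abstract corresponds to these per-frame budgets.
source_bits = bits_per_frame(2900, 20)
channel_bits = bits_per_frame(1100, 20)
```

Under these assumptions the magnitude vector has 38 entries at a 100 Hz pitch and 15 at 250 Hz, which is the kind of dimension mismatch a variable-dimension VQ scheme has to handle.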

Author information

Authors and Affiliations

Qualcomm, Inc., 6455 Lusk Boulevard, San Diego, CA 92121
Amitava Das
Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106
Allen Gersho

About this article

Cite this article

Das, A., Gersho, A. Low-rate multimode multiband spectral coding of speech. Int J Speech Technol 2, 317–327 (1999). https://doi.org/10.1007/BF02108647

Received: 21 August 1998
Accepted: 26 November 1998
Issue Date: May 1999
DOI: https://doi.org/10.1007/BF02108647
