Low-rate multimode multiband spectral coding of speech

International Journal of Speech Technology

Abstract

At bit rates of 4 kbps and below, conventional time-domain algorithms such as CELP fail to retain high voice quality and robust performance against background noise, because their waveform-matching ability is curtailed by the severely limited codebook space. Spectral coding, on the other hand, offers an effective parametric model that is amenable to low-rate implementation. Instead of performing waveform matching, spectral coders preserve only the perceptually important spectral attributes of the speech signal. Spectral coding algorithms encompass a broad family of emerging low-rate speech coding techniques, the common goal being the representation of the short-term spectrum of input speech with a limited set of spectral parameters and the synthesis of the output speech with a set of sinusoids. Pitch, frequency-domain voicing information, and a varying number of spectral magnitudes are the usual parameters of spectral coders. In this paper, we present the enhanced multiband excitation (EMBE) coder as an illustration of this new generation of low-rate spectral coders. The distinguishing features of EMBE are: (a) signal-adaptive multimode spectral modeling and parameter quantization, (b) two-band signal-adaptive frequency-domain voicing decision, (c) a novel VQ scheme for the efficient encoding of the variable-dimension spectral magnitude vectors at low rates, and (d) multi-class selective protection of spectral parameters from channel errors. A 4 kbps implementation of the EMBE spectral coding algorithm, with 2.9 kbps for source coding and 1.1 kbps for channel coding, was specifically designed for satellite-based communication systems, targeting good voice quality at low bit rates and robust performance against channel errors. Fundamental concepts of the EMBE spectral coding algorithm, implementation details, and performance comparisons of the 4 kbps EMBE coder with earlier coders are reported.
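To illustrate why the spectral magnitude vector has a variable dimension, and what synthesis "with a set of sinusoids" means, the following is a minimal sketch and not the EMBE coder itself. It assumes an 8 kHz sampling rate, zero-phase harmonics, and a fully voiced frame; the names `harmonic_count` and `synthesize_voiced_frame` are hypothetical and chosen only for this example.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumption: telephone-band sampling rate


def harmonic_count(pitch_hz, bandwidth_hz=SAMPLE_RATE_HZ / 2):
    """Number of pitch harmonics that lie strictly below the band edge.

    One magnitude is needed per harmonic, so this is also the dimension
    of the spectral magnitude vector for a frame with this pitch.
    """
    count = int(bandwidth_hz // pitch_hz)
    if count * pitch_hz >= bandwidth_hz:
        count -= 1  # drop a harmonic that falls exactly on the band edge
    return count


def synthesize_voiced_frame(pitch_hz, magnitudes, num_samples):
    """Sum one zero-phase sinusoid per harmonic (illustrative only)."""
    frame = []
    for n in range(num_samples):
        t = n / SAMPLE_RATE_HZ
        sample = sum(
            amplitude * math.cos(2.0 * math.pi * k * pitch_hz * t)
            for k, amplitude in enumerate(magnitudes, start=1)
        )
        frame.append(sample)
    return frame


if __name__ == "__main__":
    # A low-pitched and a high-pitched talker need vectors of different sizes.
    for pitch in (100.0, 250.0):
        print(pitch, "Hz ->", harmonic_count(pitch), "magnitudes")
```

Under these assumptions a 100 Hz pitch needs 39 magnitudes while a 250 Hz pitch needs only 15, which is why a fixed-dimension vector quantizer cannot be applied directly to the magnitudes and a variable-dimension scheme is required.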


Cite this article

Das, A., Gersho, A. Low-rate multimode multiband spectral coding of speech. Int J Speech Technol 2, 317–327 (1999). https://doi.org/10.1007/BF02108647
