Elsevier

Advances in Computers

Volume 31, 1990, Pages 99-173
Perceptual Models for Automatic Speech Recognition Systems

https://doi.org/10.1016/S0065-2458(08)60153-9

Publisher Summary

Research on automatic speech recognition aims to give machines a human-like ability to communicate in natural spoken language, and it is of great interest from both the application and the research points of view. This chapter discusses the fundamentals of speech production and speech knowledge, numerous techniques used in speech recognition systems, some successful speech recognition systems, and some recent advances in speech recognition research, such as the application of artificial neural network models and a special case of hidden Markov models. The problem of speech recognition is approached in two ways: with models based on speech production and with models based on speech perception. The chapter illustrates how combining an ear model with multi-layer networks makes effective generalization among speakers possible in coding vowels. It also suggests that speech knowledge organized as morphological properties is robust enough to handle inter- and intra-speaker variations. By learning how to allocate degrees of evidence to articulatory features, it is possible to estimate normalized values for the place and manner of articulation, and these values appear to be highly consistent with qualitative expectations based on speech knowledge. Effective learning and good generalization can be obtained from a limited number of speakers, in analogy with what humans do. Speech coders that produce degrees of evidence for phonetic features can be used for fast lexical access, for recognizing phonemes in new languages with limited training, and for constraining the search for the interpretation of a sentence.
