Perceptual Models for Automatic Speech Recognition Systems

doi:10.1016/S0065-2458(08)60153-9

Advances in Computers

Volume 31, 1990, Pages 99-173

https://doi.org/10.1016/S0065-2458(08)60153-9

Publisher Summary

Research on automatic speech recognition aims to give machines a human-like ability to communicate in natural spoken language, and it is of great interest from both the application and the research points of view. This chapter discusses the fundamentals of speech production and speech knowledge, numerous techniques used in speech recognition systems, some successful speech recognition systems, and some recent advances in speech recognition research, such as the application of artificial neural network models and a special case of Hidden Markov models. The problem of speech recognition is approached in two ways: using models based on speech production, and using models based on speech perception. The chapter shows that combining an ear model with multi-layer networks makes effective generalization among speakers possible in coding vowels. It also suggests that speech knowledge organized as morphological properties is robust enough to handle inter- and intra-speaker variations. By learning how to allocate degrees of evidence to articulatory features, it is possible to estimate normalized values for the place and manner of articulation, which appear to be highly consistent with qualitative expectations based on speech knowledge. Effective learning and good generalization can be obtained using a limited number of speakers, in analogy with what humans do. Speech coders that produce degrees of evidence for phonetic features can be used for fast lexical access, to recognize phonemes in new languages with limited training, and to constrain the search for the interpretation of a sentence.

