Contributed article
Analysis of the correlation structure for a neural predictive model with application to speech recognition

https://doi.org/10.1016/0893-6080(94)90027-2

Abstract

A speech recognizer is developed using a layered feedforward neural network to implement speech-frame prediction. A Markov chain is used to control changes in the network's weight parameters. We postulate that speech recognition accuracy is closely linked to the capability of the predictive model in representing long-term temporal correlations in speech data. Analytical expressions are obtained for the correlation functions for various types of predictive models (linear, compressively nonlinear, and jointly linear and compressively nonlinear) to determine the faithfulness of the models to the actual speech data. Analytical results, computer simulations, and speech recognition experiments suggest that when compressive nonlinear prediction and linear prediction are jointly performed within the same layer of the neural network, the model is better at capturing long-term data correlations and consequently improving speech recognition performance.



An earlier shortened version of this paper was presented at the first IEEE Workshop on Neural Networks for Signal Processing, Princeton, NJ, September 1991.
