Continuous speech recognition using linear dynamic models

International Journal of Speech Technology

Abstract

Hidden Markov models (HMMs) with Gaussian mixture distributions rely on the assumption that speech features are temporally uncorrelated, and often use a diagonal covariance matrix, so that correlations between feature vectors for adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM can model higher-order statistics and exploit correlations between features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative word error rate (WER) reduction on the Aurora-4 clean evaluation set and a 13% relative WER reduction on the babble noise condition.
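To make the state-space idea concrete, the following is a minimal sketch of a scalar LDM, not the system described in the paper. It assumes a one-dimensional hidden state that evolves autoregressively, x_t = a·x_{t-1} + w_t, and is observed as y_t = h·x_t + v_t with Gaussian noise, and it scores a sequence of frames with a Kalman filter. The names `ScalarLDM`, `simulate`, and `log_likelihood`, and all parameter values, are illustrative assumptions; EM parameter estimation and the hybrid decoder are not shown.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ScalarLDM:
    """Hypothetical one-dimensional LDM (illustrative, not the paper's model)."""
    a: float          # autoregressive state transition coefficient
    h: float          # observation coefficient
    q: float          # state noise variance
    r: float          # observation noise variance
    mu0: float = 0.0  # mean of the initial hidden state
    p0: float = 1.0   # variance of the initial hidden state


def simulate(model, n, rng):
    """Draw n observations from the model using the given random.Random."""
    x = rng.gauss(model.mu0, math.sqrt(model.p0))
    ys = []
    for _ in range(n):
        ys.append(model.h * x + rng.gauss(0.0, math.sqrt(model.r)))
        x = model.a * x + rng.gauss(0.0, math.sqrt(model.q))
    return ys


def log_likelihood(model, ys):
    """Log-likelihood of a frame sequence, computed with a Kalman filter."""
    mean, var = model.mu0, model.p0  # predicted state for the first frame
    total = 0.0
    for y in ys:
        # Innovation: how far the frame is from what the model predicted.
        err = y - model.h * mean
        s = model.h * var * model.h + model.r
        total += -0.5 * (math.log(2.0 * math.pi * s) + err * err / s)
        # Measurement update of the hidden state estimate.
        k = var * model.h / s
        mean = mean + k * err
        var = (1.0 - k * model.h) * var
        # Time update: propagate the state through the autoregressive step.
        mean = model.a * mean
        var = model.a * var * model.a + model.q
    return total


# A model with strong frame-to-frame correlation, and one with none (a = 0).
smooth = ScalarLDM(a=0.95, h=1.0, q=0.1, r=0.1)
white = ScalarLDM(a=0.0, h=1.0, q=1.0, r=0.1)

frames = simulate(smooth, 200, random.Random(0))
print("smooth model:", log_likelihood(smooth, frames))
print("white model: ", log_likelihood(white, frames))
```

In a rescoring setup of the kind the abstract describes, a score like this could be computed per segment and per candidate model, then combined with the first-pass HMM score; how the paper combines them is not specified here.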

Author information

Corresponding author

Correspondence to Joseph Picone.

Additional information

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0414450. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

About this article

Cite this article

Ma, T., Srinivasan, S., Lazarou, G. et al. Continuous speech recognition using linear dynamic models. Int J Speech Technol 17, 11–16 (2014). https://doi.org/10.1007/s10772-013-9200-x

  • DOI: https://doi.org/10.1007/s10772-013-9200-x
