Continuous speech recognition using linear dynamic models

Ma, Tao; Srinivasan, Sundararajan; Lazarou, Georgios; Picone, Joseph

doi:10.1007/s10772-013-9200-x

Published: 06 June 2013

Volume 17, pages 11–16 (2014)

International Journal of Speech Technology

Abstract

Hidden Markov models (HMMs) with Gaussian mixture distributions rely on the assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix, where correlations between feature vectors for adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM can model higher order statistics and can exploit correlations of features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative word error rate (WER) reduction on the Aurora-4 clean evaluation set and a 13% relative WER reduction on the babble noise condition.
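To make the state-space idea concrete, the following is a minimal sketch of a generic LDM, not the authors' implementation. It assumes the standard textbook form x_t = F x_{t-1} + w_t, y_t = H x_t + v_t with Gaussian noise covariances Q and R. The symbols F, H, Q, R and the function names `simulate_ldm` and `kalman_log_likelihood` are illustrative choices that do not appear in the abstract. The second function scores an observation sequence with a Kalman filter, which is one common way a segment could be rescored under such a model.

```python
import numpy as np


def simulate_ldm(F, H, Q, R, x0, n_frames, rng):
    """Sample hidden states and observations from a linear dynamic model.

    Assumed form: x_t = F x_{t-1} + w_t,  y_t = H x_t + v_t,
    with w_t ~ N(0, Q) and v_t ~ N(0, R). x0 is the state at frame 0.
    """
    dim_x, dim_y = F.shape[0], H.shape[0]
    x = np.asarray(x0, dtype=float)
    states = np.empty((n_frames, dim_x))
    obs = np.empty((n_frames, dim_y))
    for t in range(n_frames):
        if t > 0:
            # Autoregressive evolution of the hidden state.
            x = F @ x + rng.multivariate_normal(np.zeros(dim_x), Q)
        states[t] = x
        obs[t] = H @ x + rng.multivariate_normal(np.zeros(dim_y), R)
    return states, obs


def kalman_log_likelihood(obs, F, H, Q, R, mean0, cov0):
    """Log-likelihood of an observation sequence under the LDM (Kalman filter).

    mean0 and cov0 describe the Gaussian prior on the state at frame 0.
    """
    mean = np.asarray(mean0, dtype=float)
    cov = np.asarray(cov0, dtype=float)
    total = 0.0
    for t, y in enumerate(obs):
        if t > 0:
            # Predict step: propagate the state estimate one frame forward.
            mean = F @ mean
            cov = F @ cov @ F.T + Q
        innov = y - H @ mean                 # prediction error for this frame
        S = H @ cov @ H.T + R                # covariance of the prediction error
        _, logdet = np.linalg.slogdet(S)
        total -= 0.5 * (len(y) * np.log(2 * np.pi) + logdet
                        + innov @ np.linalg.solve(S, innov))
        # Update step: fold the current observation into the state estimate.
        K = cov @ H.T @ np.linalg.inv(S)
        mean = mean + K @ innov
        cov = cov - K @ H @ cov
    return total
```

In a hybrid setup of the kind the abstract describes, a score like this would be computed per segment produced by the HMM first pass. How the scores are combined with the HMM scores is not specified here and would depend on the full paper.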

Author information

Authors and Affiliations

Siri at Apple Inc, 2 Infinite Loop, mailstop 302-4APP, Cupertino, CA, 95014, USA
Tao Ma
Nuance Communications Inc., 1198 East Arques Avenue, Sunnyvale, CA, 94085, USA
Sundararajan Srinivasan
The New York City Transit Authority, 30-74 38th Street, Apt 1A, Astoria, New York, NY, 11103, USA
Georgios Lazarou
Department of Electrical and Computer Engineering, Temple University, 1947 North 12th Street, Philadelphia, PA, 19027, USA
Joseph Picone

Corresponding author

Correspondence to Joseph Picone.

Additional information

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0414450. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

About this article

Cite this article

Ma, T., Srinivasan, S., Lazarou, G. et al. Continuous speech recognition using linear dynamic models. Int J Speech Technol 17, 11–16 (2014). https://doi.org/10.1007/s10772-013-9200-x

Received: 02 February 2013
Accepted: 22 May 2013
Published: 06 June 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10772-013-9200-x
