A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression

  • Research Papers
  • Special Focus
Science China Information Sciences

Abstract

Handling variable, non-stationary ambient noise is a challenging task for automatic speech recognition (ASR) systems. To address this issue, one can use multi-style, noise condition independent (CI) model training on speech data collected in diverse noise environments, or uncertainty decoding techniques. An alternative approach is to explicitly approximate the continuous trajectory of Gaussian component mean and variance parameters against the varying noise level, for example, using variable-parameter hidden Markov models (VP-HMMs). This paper investigates a more generalized form of variable-parameter HMMs (GVP-HMMs). In addition to Gaussian component means and variances, GVP-HMMs can also provide more compact trajectory modeling for tied linear transformations. An alternative noise condition dependent (CD) training algorithm is also proposed to handle the bias toward the noise condition distribution of the training data. Consistent error rate gains were obtained over conventional VP-HMM mean-and-variance-only trajectory modeling on a medium-vocabulary Mandarin Chinese in-car navigation command recognition task.
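
The trajectory idea in the abstract can be illustrated with a toy polynomial regression. The sketch below is not the paper's estimation procedure, which is not reproduced on this page. It assumes a single scalar Gaussian mean observed at a few noise levels and fits a polynomial in the noise level by ordinary least squares, then evaluates the fitted trajectory at a noise level not seen in training. The function names `fit_polynomial` and `evaluate`, the toy data, and the polynomial order are all hypothetical.

```python
# Illustrative sketch only: names, data and polynomial order are assumptions,
# not taken from the paper.

def fit_polynomial(levels, values, order):
    """Least-squares fit of values ~ sum_p coeffs[p] * level**p (normal equations)."""
    n = order + 1
    # Normal equations A c = b with A[i][j] = sum_k level_k**(i+j),
    # b[i] = sum_k value_k * level_k**i.
    a = [[sum(v ** (i + j) for v in levels) for j in range(n)] for i in range(n)]
    b = [sum(y * v ** i for v, y in zip(levels, values)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, level):
    """Evaluate the polynomial trajectory at one noise level (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * level + c
    return result


# Toy data: one mean dimension "estimated" at five noise levels (arbitrary,
# scaled units), generated here from a known quadratic so the fit can be checked.
levels = [0.0, 0.5, 1.0, 1.5, 2.0]
means = [1.0 + 0.5 * v - 0.25 * v * v for v in levels]

coeffs = fit_polynomial(levels, means, order=2)
mean_at_unseen_level = evaluate(coeffs, 1.25)  # noise level absent from the toy data
```

Under these assumptions, only the polynomial coefficients need to be stored per parameter, and the parameter value for any noise level is recomputed at recognition time. Applying the same regression to the entries of a shared (tied) linear transform rather than to every Gaussian would be one way to read the "more compact" modeling the abstract mentions.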

Author information

Corresponding author

Correspondence to Ning Cheng.

Additional information

CHENG Ning was born in 1981. He received the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently a postdoctoral researcher at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research interests include robust speech recognition, speech enhancement and microphone arrays.

WANG Lan is a Professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. She received her M.S. degree from the Center of Information Science, Peking University. She obtained her Ph.D. degree from the Machine Intelligence Laboratory of the Cambridge University Engineering Department (CUED) in 2006, and then worked as a research associate at CUED. Her research interests are large vocabulary continuous speech recognition, speech visualization and audio information indexing.

LIU XunYing was born in 1978. He received a bachelor's degree from Shanghai Jiao Tong University in 2000, followed by the MPhil degree in computer speech and language processing in 2001 and the Ph.D. degree in speech recognition in 2006, both from the University of Cambridge. He is currently a Senior Research Associate at the Machine Intelligence Laboratory of the Cambridge University Engineering Department. He is the lead researcher on the EPSRC-funded Natural Speech Technology and the DARPA-funded Broad Operational Language Translation programs at Cambridge. He received a best paper award at ISCA Interspeech 2010. His current research interests include large vocabulary continuous speech recognition, language modelling and adaptation, weighted finite state transducers, factored acoustic modelling, noise robust speech recognition and statistical machine translation. Dr. Liu Xunying is a member of IEEE and ISCA.

About this article

Cite this article

Cheng, N., Liu, X. & Wang, L. A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression. Sci. China Inf. Sci. 54, 2481–2491 (2011). https://doi.org/10.1007/s11432-011-4490-6

  • DOI: https://doi.org/10.1007/s11432-011-4490-6
