A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression

Cheng, Ning; Liu, XunYing; Wang, Lan

doi:10.1007/s11432-011-4490-6

A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression

Research Papers
Special Focus
Published: 03 December 2011

Volume 54, pages 2481–2491, (2011)
Cite this article

Science China Information Sciences Aims and scope Submit manuscript

Ning Cheng¹,
XunYing Liu^1,2 &
Lan Wang¹

144 Accesses
8 Citations
3 Altmetric
Explore all metrics

Abstract

Handling variable, non-stationary ambient noise is a challenging task for automatic speech recognition (ASR) systems. To address this issue, multi-style, noise condition independent (CI) model training using speech data collected in diverse noise environments, or uncertainty decoding techniques can be used. An alternative approach is to explicitly approximate the continuous trajectory of Gaussian component mean and variance parameters against the varying noise level, for example, using variable parameter hidden Markov model (VPHMM). This paper investigates a more generalized form of variable parameter HMMs (GVP-HMM). In addition to Gaussian component means and variances, it can also provide a more compact trajectory modeling for tied linear transformations. An alternative noise condition dependent (CD) training algorithm is also proposed to handle the bias to training noise condition distribution. Consistent error rate gains were obtained over conventional VP-HMM mean and variance only trajectory modeling on a media vocabulary Mandarin Chinese in-car navigation command recognition task.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Bayesian view on acoustic model-based techniques for robust speech recognition

Article Open access 02 December 2015

Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis

Simplified scoring methods for HMM-based speech recognition

Article 09 August 2015

References

Lippmann R, Martin E, Paul D. Multi-style training for robust isolated-word speech recognition. In: Proceedings of IEEE ICASSP, Dallas, Texas, USA, 1987. 705–708
Anastasakos T, McDonough J, Schwartz R, et al. A compact model for speaker-adaptive training. In: Proceedings of ICSLP, Philadelphia, PA, USA, 1996. 1137–1140
Gales M J F. Maximum likelihood linear transformations for HMM-based speech recognition. Comput Speech Lang, 1998, 12: 171–185
Article Google Scholar
Leggetter C J, Woodland P C. Maximum likelihood linear regression for speaker adaptation of continuous density HMMs. Comput Speech Lang, 1995, 9: 171–186
Article Google Scholar
Flego F, Gales M J F. Discriminative adaptive training with VTS and JUD. In: Proceedings of ASRU, Merano, Italy, 2009. 170–175
Yu K, Gales M J F. Bayesian adaptive inference and adaptive training. IEEE Trans Audio Speech Lang Process, 2007, 15: 1932–1943
Article Google Scholar
Gales M J F. Adaptive training for robust ASR. In: Proceedings of ASRU, Madonna di Campiglio, Italy, 2001. 15–20
Yu K, Gales M J F. Bayesian adaptation and adaptively trained systems. In: Proceedings of ASRU, Cancun, Mexico, 2005. 209–214
Arrowood J A, Clements M A. Using observation uncertainty in HMM decoding. In: Proceedings of ICSLP, Denver, Colorado, USA, 2002. 1561–1564
Deng L, Droppo J, Acero A. Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion. IEEE Trans Speech Audio, 2005, 13: 412–421
Article Google Scholar
Kristjansson T T, Frey B J. Accounting for uncertainty in observations: A new paradigm for robust speech recognition. In: Proceedings of ICASSP, Orlando, Florida, USA, 2002. 61–64
Droppo J, Acero A, Deng L. Uncertainty decoding with SPLICE for noise robust speech recognition. In: Proceedings of ICASSP, Orlando, Florida, USA, 2002. 57–60
Liao H, Gales M J F. Issues with uncertainty decoding for noise robust speech recognition. In: Proceedings of Interspeech, Pittsburgh, PA, USA, 2006
Benitez C, Segura J, de la Tore A, et al. Including uncertainty of speech observation in robust speech recognition. In: Proceedings of ICSLP, Jeju island, Korea, 2004. 137–140
Liao H, Gales M J F. Joint uncertainty decoding for noise robust speech recognition. In: Proceedings of Interspeech, Lisbon, Portugal, 2005
Liao H, Gales M J F. Adaptive training with joint uncertainty decoding for robust recognition of noisy data. In: Proceedings of ICASSP, Honolulu, Hawaii, USA, 2007. 389–392
Arrowood J A, Clements M A. Using observation uncertainty in HMM decoding. In: Proceedings of ICSLP, Denver, Colorado, USA, 2002. 1561–1564
Steouten V, van Hamme H, Wambacq P. Accounting for the uncertainty of speech estimates in the context of model-based feature enhancement. In: Proceedings of ICSLP, Jeju island, Korea, 2004. 105–108
Deng L, Droppo J, Acero A. Exploiting variances in robust feature extraction based on a parametric model of speech distortion. In: Proceedings of ICSLP, Jeju island, Korea, 2002. 806–809
Wolfel M, Faubel F. Considering uncertainty by particle filter enhanced speech feature in large vocabulary continuous speech recognition. In: Proceedings of ICASSP, Honolulu, Hawaii, USA, 2007. 1049–1052
Fujinaga K, Nakai M, Shimodaira H, et al. Multiple-regression hidden Markov model. In: Proceedings of IEEE ICASSP, Salt Lake City, Utah, USA, 2001. 1: 513–516
Cui X, Gong Y. A study of variable-parameter Gaussian mixture hidden Markov modeling for noisy speech recognition. IEEE Trans Audio Speech Lang Process, 2007, 15: 1366–1376
Article Google Scholar
Yu D, Deng L, Gong Y, et al. Discriminative training of variable-parameter HMMs for noise robust speech recognition. In: Proceedings of Interspeech, Brisbane, Australia, 2008. 285–288
Yu D, Deng L, Gong Y, et al. Parameter clustering and sharing in variable-parameter HMMs for noise robust speech recognition. In: Proceedings of Interspeech, Brisbane, Australia, 2008. 1253–1256
Yu D, Deng L, Gong Y, et al. A novel framework and training algorithm for variable-parameter hidden Markov models. IEEE Trans Audio Speech Lang Process, 2009, 17: 1348–1360
Article Google Scholar
Bjorck A, Pereyra V. Solution of Vandermonde systems of equations. Math Comput (Am Math Soc), 1970, 24: 893–903
Article MathSciNet Google Scholar
Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc, 1977, 39: 1–39
MATH MathSciNet Google Scholar
Martin R. An efficient algorithm to estimate the instantaneous SNR speech signals. In: Proceedings of Eurospeech, Berlin, Germany, 1993. 1093–1096
Young S, Evermann G, Gales M, et al. The HTK Book. Version 3.4.1. Cambridge: Cambridge University Engineering Department, 2009
Google Scholar

Download references

Author information

Authors and Affiliations

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences/The Chinese University of Hong Kong, Hong Kong, China
Ning Cheng, XunYing Liu & Lan Wang
Cambridge University Engineering Department, Cambridge, CB2 1PZ, UK
XunYing Liu

Authors

Ning Cheng
View author publications
You can also search for this author in PubMed Google Scholar
XunYing Liu
View author publications
You can also search for this author in PubMed Google Scholar
Lan Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ning Cheng.

Additional information

CHENG Ning was born in 1981. He received the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences, Beijing, China in 2009. Currently, he is a postdoctoral researcher at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research interests include robust speech recognition, speech enhancement and microphone array.

WANG Lan is a Professor of Shen-Zhen Institutes of Advanced Technology, Chinese Academy of Sciences. She received her M.S. degree in the Center of Information Science, Peking University. She obtained her Ph.D. degree from the Machine Intelligence Laboratory of Cambridge University Engineering Department in 2006, and then worked as a research associate in CUED. Her research interests are large vocabulary continuous speech recognition, speech visualization and audio information indexing.

LIU XunYing was born in 1978. He received the Ph.D. degree in speech recognition in 2006 and MPhil degree in computer speech and language processing in 2001 both from University of Cambridge, prior to a bachelor’s degree from Shanghai Jiao Tong University in 2000. He is currently a Senior Research Associate at the Machine Intelligence Laboratory of the Cambridge University Engineering Department. He is the lead researcher on the EPSRC funded Natural Speech Technology and the DARPA funded Broad Operational Language Translation Programs at Cambridge. He was the recipient of best paper award at ISCA Interspeech2010. His current research interests include large vocabulary continuous speech recognition, language modelling and adaptation, weighted finite state transducers, factored acoustic modelling, noise robust speech recognition and statistical machine translation. Dr. Liu Xunying is a member of IEEE and ISCA.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Cheng, N., Liu, X. & Wang, L. A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression. Sci. China Inf. Sci. 54, 2481–2491 (2011). https://doi.org/10.1007/s11432-011-4490-6

Download citation

Received: 15 June 2011
Accepted: 26 September 2011
Published: 03 December 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s11432-011-4490-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression

Abstract

Access this article

Similar content being viewed by others

A Bayesian view on acoustic model-based techniques for robust speech recognition

Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis

Simplified scoring methods for HMM-based speech recognition

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression

Abstract

Access this article

Similar content being viewed by others

A Bayesian view on acoustic model-based techniques for robust speech recognition

Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis

Simplified scoring methods for HMM-based speech recognition

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation