Model-Based Approaches to Handling Uncertainty

Gales, M. J. F.

doi:10.1007/978-3-642-21317-5_5

Abstract

A powerful approach for handling uncertainty in observations is to modify the statistical model of the data so that it appropriately reflects this uncertainty. For noise-robust speech recognition, this requires modifying an underlying “clean” acoustic model so that it is representative of speech in a particular target acoustic environment. This chapter describes the underlying concepts of model-based noise compensation for robust speech recognition and how they can be applied to standard systems. It then considers important practical issues, including i) estimation of the acoustic environment noise parameters; ii) efficient acoustic model compensation and likelihood calculation; and iii) adaptive training to handle multi-style training data. The chapter concludes by discussing the limitations of current approaches and research options to address them.
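
To make "modifying a clean acoustic model" concrete, the sketch below compensates a single diagonal-covariance Gaussian using a first-order vector Taylor series (VTS) approximation, one widely used model-based scheme. It is an illustration under stated assumptions, not the chapter's own formulation: features are taken to be log-spectral (no cepstral transform), the phase term of the mismatch function is ignored, the additive noise is Gaussian, the convolutional distortion h is treated as a fixed offset, and the helper name `vts_compensate` is hypothetical.

```python
import math
from typing import List, Tuple


def _softplus(d: float) -> float:
    """Numerically stable log(1 + exp(d))."""
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))


def _sigmoid_neg(d: float) -> float:
    """Numerically stable 1 / (1 + exp(d))."""
    if d >= 0.0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def vts_compensate(
    mu_x: List[float], var_x: List[float],
    mu_n: List[float], var_n: List[float],
    mu_h: List[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: first-order VTS compensation of one diagonal Gaussian.

    Assumes log-spectral features and the phase-free mismatch function
        y = x + h + log(1 + exp(n - x - h)),
    with clean speech x ~ N(mu_x, var_x), additive noise n ~ N(mu_n, var_n)
    and a fixed convolutional offset h. Returns the corrupted mean and variance.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn, mh in zip(mu_x, var_x, mu_n, var_n, mu_h):
        d = mn - mx - mh
        # Mean: mismatch function evaluated at the expansion point (the means).
        mu_y.append(mx + mh + _softplus(d))
        # Jacobians of the mismatch function: dy/dx = 1/(1+exp(d)), dy/dn = 1 - dy/dx.
        g = _sigmoid_neg(d)
        # Variance: linearised propagation of the speech and noise variances.
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y


if __name__ == "__main__":
    # Speech and noise with equal mean energy: mean rises by log 2, variance is 0.5.
    print(vts_compensate([0.0], [1.0], [0.0], [1.0], [0.0]))
```

The limiting cases show why this counts as model compensation: when the noise mean lies far below the speech mean, the compensated Gaussian reduces to the clean one shifted by h, and when noise dominates, it approaches the noise distribution.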

Author information

Authors and Affiliations

Cambridge University Engineering Department, Trumpington Street, Cambridge, UK
M. J. F. Gales

Corresponding author

Correspondence to M. J. F. Gales.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa
Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach

About this chapter

Cite this chapter

Gales, M.J.F. (2011). Model-Based Approaches to Handling Uncertainty. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_5

DOI: https://doi.org/10.1007/978-3-642-21317-5_5
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)