Abstract
The performance of automatic speech recognition degrades severely in the presence of noise or reverberation. Much research has addressed noise robustness; in contrast, the recognition of reverberant speech has received far less attention and remains very challenging. In this chapter, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortions, causing both a static and a dynamic mismatch between the speech features and the acoustic model used for recognition. Acoustic model adaptation could be used to reduce this mismatch, but conventional model adaptation techniques assume a static mismatch and may therefore not cope well with the dynamic mismatch arising from dereverberation. This chapter introduces a novel model adaptation scheme capable of managing both static and dynamic mismatches. We introduce a parametric representation of the Gaussian variances of the acoustic model that includes static and dynamic components. Adaptive training is used to optimize the variances in order to realize an appropriate interconnection between the dereverberation preprocessor and the speech recognizer.
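The core idea of the abstract can be illustrated with a minimal sketch: the variance of each diagonal Gaussian in the acoustic model is replaced by a combination of a static component (a scaled model variance, capturing the time-invariant part of the mismatch) and a dynamic, per-frame component (an uncertainty estimate supplied by the dereverberation preprocessor). The function name, the parameterization `alpha * var_model + beta * var_dyn_t`, and the variable names below are illustrative assumptions, not the chapter's exact formulation.

```python
import numpy as np

def adapted_log_likelihood(x, mu, var_model, alpha, beta, var_dyn_t):
    """Frame log-likelihood of a diagonal Gaussian whose variance combines
    a static component (alpha * model variance) with a dynamic, per-frame
    component (beta * preprocessor uncertainty). Illustrative sketch only;
    alpha and beta would be estimated by adaptive training.

    x          : observed feature vector for the current frame
    mu         : Gaussian mean vector
    var_model  : diagonal model variances
    var_dyn_t  : per-frame uncertainty from the dereverberation front end
    """
    # Combined (compensated) variance: static term + dynamic term.
    var = alpha * var_model + beta * var_dyn_t
    # Standard diagonal-Gaussian log-likelihood with the adapted variance.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

# Example: a frame with larger preprocessor uncertainty yields a flatter,
# more forgiving likelihood than the unadapted model variance alone.
x = np.array([0.5, -0.2])
mu = np.zeros(2)
ll_static = adapted_log_likelihood(x, mu, np.ones(2), 1.0, 1.0, np.zeros(2))
ll_dynamic = adapted_log_likelihood(x, mu, np.ones(2), 1.0, 1.0, 0.5 * np.ones(2))
```

Setting `beta = 0` recovers a purely static variance scaling (comparable to conventional variance adaptation), while a nonzero `beta` lets the effective variance track the frame-by-frame reliability of the dereverberated features.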
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this chapter
Delcroix, M., Watanabe, S., Nakatani, T. (2011). Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering (R0)