Abstract
The performance of automatic speech recognition degrades severely in the presence of noise or reverberation. Much research has addressed noise robustness; in contrast, the recognition of reverberant speech has received far less attention and remains very challenging. In this chapter, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortions, causing both a static and a dynamic mismatch between the speech features and the acoustic model used for recognition. Acoustic model adaptation could be used to reduce this mismatch, but conventional model adaptation techniques assume a static mismatch and may therefore not cope well with the dynamic mismatch arising from dereverberation. This chapter introduces a novel model adaptation scheme capable of managing both static and dynamic mismatches. We introduce a parametric representation of the Gaussian variances of the acoustic model that includes static and dynamic components. Adaptive training is used to optimize the variances in order to realize an appropriate interconnection between the dereverberation preprocessor and the speech recognizer.
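The core idea of the abstract can be illustrated with a minimal sketch: the variance of each diagonal Gaussian in the acoustic model is replaced by a combination of a static component (a scaled model variance, capturing the time-invariant part of the mismatch) and a dynamic, per-frame component (an uncertainty estimate supplied by the dereverberation preprocessor). The function name, the parameterization `alpha * var_model + beta * var_dyn_t`, and the variable names below are illustrative assumptions, not the chapter's exact formulation.

```python
import numpy as np

def adapted_log_likelihood(x, mu, var_model, alpha, beta, var_dyn_t):
    """Frame log-likelihood of a diagonal Gaussian whose variance combines
    a static component (alpha * model variance) with a dynamic, per-frame
    component (beta * preprocessor uncertainty). Illustrative sketch only;
    alpha and beta would be estimated by adaptive training.

    x          : observed feature vector for the current frame
    mu         : Gaussian mean vector
    var_model  : diagonal model variances
    var_dyn_t  : per-frame uncertainty from the dereverberation front end
    """
    # Combined (compensated) variance: static term + dynamic term.
    var = alpha * var_model + beta * var_dyn_t
    # Standard diagonal-Gaussian log-likelihood with the adapted variance.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

# Example: a frame with larger preprocessor uncertainty yields a flatter,
# more forgiving likelihood than the unadapted model variance alone.
x = np.array([0.5, -0.2])
mu = np.zeros(2)
ll_static = adapted_log_likelihood(x, mu, np.ones(2), 1.0, 1.0, np.zeros(2))
ll_dynamic = adapted_log_likelihood(x, mu, np.ones(2), 1.0, 1.0, 0.5 * np.ones(2))
```

Setting `beta = 0` recovers a purely static variance scaling (comparable to conventional variance adaptation), while a nonzero `beta` lets the effective variance track the frame-by-frame reliability of the dereverberated features.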
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this chapter
Delcroix, M., Watanabe, S., Nakatani, T. (2011). Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering (R0)