Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing

Robust Speech Recognition of Uncertain or Missing Data

Abstract

The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this chapter, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortions, causing a static and dynamic mismatch between speech features and the acoustic model used for recognition. Acoustic model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with a dynamic mismatch arising from dereverberation. This chapter introduces a novel model adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric representation of Gaussian variances of the acoustic model that includes static and dynamic components. Adaptive training is used to optimize the variances in order to realize an appropriate interconnection between dereverberation and a speech recognizer.
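
The idea of a Gaussian variance with static and dynamic components can be illustrated with a minimal sketch. The abstract does not give the chapter's exact parameterization, so the form used here is an assumption: the model variance is inflated by a time-invariant term proportional to itself (weight `lam`) and by a time-varying term proportional to a per-frame uncertainty estimate (weight `alpha`). The function name, both weights, and the `dyn_uncertainty` input are hypothetical, and the adaptive training that would estimate the weights from data is not shown.

```python
import math


def compensated_log_likelihood(x, mean, model_var, dyn_uncertainty,
                               lam=0.0, alpha=0.0):
    """Log-likelihood of one feature frame under a diagonal Gaussian whose
    variance is inflated by static and dynamic mismatch terms.

    Assumed (illustrative) form per dimension:
        var = model_var + lam * model_var + alpha * dyn_uncertainty
    lam and alpha are hypothetical weights; with both at zero this is the
    ordinary diagonal-Gaussian log-likelihood.
    """
    total = 0.0
    for xi, mi, vi, ui in zip(x, mean, model_var, dyn_uncertainty):
        # Static part scales the model variance; dynamic part follows the frame.
        var = vi + lam * vi + alpha * ui
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mi) ** 2 / var)
    return total


# A frame that lies far from the mean is penalized less once the
# variance is inflated by the frame's uncertainty estimate.
plain = compensated_log_likelihood([5.0], [0.0], [1.0], [4.0])
compensated = compensated_log_likelihood([5.0], [0.0], [1.0], [4.0], alpha=1.0)
```

In this sketch, a frame distorted by the preprocessor receives a larger variance and therefore contributes less sharply to the decoding score, which is the general intuition behind variance compensation.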

Author information

Corresponding author

Correspondence to Marc Delcroix.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Delcroix, M., Watanabe, S., Nakatani, T. (2011). Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_9

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
