A Model-Based Approach to Joint Compensation of Noise and Reverberation for Speech Recognition

  • Chapter
Robust Speech Recognition of Uncertain or Missing Data

Abstract

Employing automatic speech recognition systems in hands-free communication applications is accompanied by performance degradation due to background noise and, in particular, reverberation. These two kinds of distortion alter the shape of the feature vector trajectory extracted from the microphone signal and consequently lead to a mismatch between the training and testing conditions of the recognizer. In this chapter we present a feature enhancement approach that jointly compensates for noise and reverberation, improving recognition performance by restoring the training conditions. For the enhancement we concentrate on the logarithmic mel power spectral coefficients as features, which are computed at an intermediate stage of extracting the widely used mel frequency cepstral coefficients. The proposed technique is based on a Bayesian framework that attempts to infer the posterior distribution of the clean features given the observation of all past corrupted features. It exploits information from a priori models describing the dynamics of clean speech and noise-only feature vector trajectories, as well as from an observation model relating the reverberant noisy features to the clean features. The observation model relies on a simplified stochastic model of the room impulse response (RIR) between the speaker and the microphone. This model has only two parameters, the RIR energy and the reverberation time, both of which can be estimated from the captured microphone signal. Finally, the performance of the proposed enhancement technique is studied experimentally in terms of the recognition accuracy obtained on a connected digits recognition task under different noise and reverberation conditions using the Aurora 5 database.
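To make the two-parameter RIR description more concrete, the following is a minimal sketch of one common way such a model can be written down: white Gaussian noise shaped by an exponentially decaying envelope, where the decay rate follows from the reverberation time and the overall scale from the RIR energy. Whether the chapter uses exactly this form is an assumption; the function names `sample_rir` and `expected_decay_db`, the 8 kHz default sampling rate, and the choice to truncate the response at the reverberation time are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_rir(energy, t60, fs=8000, length_s=None, seed=0):
    """Draw one realization of a simplified stochastic RIR (illustrative only).

    energy   -- desired total energy of the impulse response (sum of squares)
    t60      -- reverberation time in seconds (60 dB power decay)
    fs       -- sampling rate in Hz (assumed default, not taken from the chapter)
    length_s -- length of the response in seconds; defaults to t60 (assumption)
    """
    if length_s is None:
        length_s = t60
    n = int(round(length_s * fs))
    # Amplitude decay rate chosen so that the power envelope exp(-2*decay*t)
    # has dropped by 60 dB at t = t60.
    decay = 3.0 * math.log(10.0) / t60
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) * math.exp(-decay * k / fs) for k in range(n)]
    # Rescale so that the realized energy matches the requested RIR energy.
    scale = math.sqrt(energy / sum(x * x for x in h))
    return [scale * x for x in h]


def expected_decay_db(t, t60):
    """Expected power decay in dB of the envelope at time t (seconds)."""
    return -60.0 * t / t60


if __name__ == "__main__":
    h = sample_rir(energy=1.0, t60=0.4)
    print(len(h), sum(x * x for x in h))
    print(expected_decay_db(0.2, 0.4))
```

In this sketch the two arguments `energy` and `t60` fully determine the statistics of the response, which illustrates why a recognizer front end would only need estimates of these two quantities rather than the full impulse response.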



Author information


Corresponding author

Correspondence to Alexander Krueger.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Krueger, A., Haeb-Umbach, R. (2011). A Model-Based Approach to Joint Compensation of Noise and Reverberation for Speech Recognition. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_10


  • DOI: https://doi.org/10.1007/978-3-642-21317-5_10


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
