Conditional Bayesian Estimation Employing a Phase-Sensitive Observation Model for Noise Robust Speech Recognition

Robust Speech Recognition of Uncertain or Missing Data

Abstract

In this contribution, conditional Bayesian estimation employing a phase-sensitive observation model for noise robust speech recognition will be studied. After a review of speech recognition in the presence of corrupted features, termed uncertainty decoding, the estimation of the posterior distribution of the uncorrupted (clean) feature vector will be shown to be a key element of noise robust speech recognition. The estimation process will be based on three major components: an a priori model of the unobservable data, an observation model relating the unobservable data to the corrupted observation, and an inference algorithm that allows for a computationally tractable solution. Special emphasis will be placed on a detailed derivation of the phase-sensitive observation model and the required moments of the phase factor distribution. It will be proven analytically not only that the phase factor distribution is non-Gaussian but also that all central moments can (approximately) be computed solely from the mel filter bank in use, which renders the moments independent of noise type and signal-to-noise ratio. The phase-sensitive observation model will then be incorporated into a model-based feature enhancement scheme, and recognition experiments will be carried out on the Aurora 2 and Aurora 4 databases. All recognition results point to the importance of incorporating phase factor information into the enhancement scheme. Applying the proposed scheme under the derived uncertainty decoding framework leads to further significant improvements in both recognition tasks, eventually reaching the performance achieved with the ETSI advanced front-end.
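
The abstract names the phase-sensitive observation model without stating it. As a minimal sketch, the relation is commonly written for a single mel channel as |Y|^2 = |X|^2 + |N|^2 + 2·alpha·|X||N|, with the phase factor alpha between -1 and 1; whether the chapter uses exactly this notation is an assumption, and the function and parameter names below are hypothetical.

```python
import math


def phase_sensitive_observation(x, n, alpha):
    """Return the noisy log-mel energy y for one mel channel.

    x     -- clean-speech log-mel energy (assumed natural log of power)
    n     -- noise log-mel energy (same convention)
    alpha -- phase factor, assumed to lie in [-1, 1]
    """
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("phase factor alpha must lie in [-1, 1]")
    # Power-domain relation: |Y|^2 = |X|^2 + |N|^2 + 2*alpha*|X||N|
    power = math.exp(x) + math.exp(n) + 2.0 * alpha * math.exp(0.5 * (x + n))
    # Degenerate case: alpha = -1 with x == n cancels the power completely,
    # and math.log then raises ValueError.
    return math.log(power)
```

Setting `alpha = 0` recovers the phase-insensitive approximation `log(exp(x) + exp(n))`, which is the term a phase-sensitive scheme refines.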

Author information

Corresponding author

Correspondence to Volker Leutnant.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Leutnant, V., Haeb-Umbach, R. (2011). Conditional Bayesian Estimation Employing a Phase-Sensitive Observation Model for Noise Robust Speech Recognition. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_8

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
