Abstract

While it is often fairly straightforward to estimate the reliability of speech features in the time-frequency domain, this may not hold in other domains that are better suited to speech recognition, such as RASTA-PLP features or the features obtained with the ETSI advanced front-end. In such cases, a useful approach is to estimate the uncertainties in the domain where noise-reduction preprocessing is carried out and then to transform the uncertainties, along with the features themselves, into the recognition domain. To develop suitable approaches, we first give a short overview of strategies for propagating probability distributions through nonlinearities. Second, for several feature domains suited to robust recognition, we show possible implementations and sensible approximations of uncertainty propagation and discuss the associated error margins and trade-offs.
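As a rough illustration of propagating uncertainty through a nonlinearity and then into a recognition domain, the following sketch carries a per-band mean and variance through the logarithm of an MFCC-style pipeline and then through a DCT. It contrasts three common options for the nonlinear step: a first-order Taylor (linearization) approximation, a closed-form result under the assumption that Mel-domain features are log-normally distributed, and the unscented transform of Julier and Uhlmann. All function names, the unnormalized DCT-II, and the Mel-domain numbers are hypothetical choices for illustration; this is not necessarily the propagation scheme used in the chapter, and real front-ends such as RASTA-PLP or the ETSI advanced front-end involve further processing steps that are not modeled here.

```python
import numpy as np


def lognormal_log_moments(mean, var):
    """Mean and variance of log(Y), assuming Y > 0 is log-normally distributed.

    Under this (assumed) model log(Y) is exactly Gaussian, so its moments follow
    in closed form from the Mel-domain mean m and variance v of Y.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    log_var = np.log1p(var / mean**2)        # sigma^2 = log(1 + v / m^2)
    log_mean = np.log(mean) - 0.5 * log_var  # mu = log(m) - sigma^2 / 2
    return log_mean, log_var


def taylor_log_moments(mean, var):
    """First-order Taylor (linearized) propagation through log(y).

    Expands log around the mean: d/dy log(y) = 1/y, so the variance is scaled
    by 1/m^2 and the mean is mapped directly (second-order bias is ignored).
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    return np.log(mean), var / mean**2


def unscented_moments(f, mean, var, kappa=2.0):
    """Elementwise scalar unscented transform (n = 1, three sigma points).

    With kappa = 2, n + kappa = 3, a common choice for Gaussian inputs. The
    sigma points mean +/- sqrt(3 * var) must lie in the domain of f.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    n = 1
    spread = np.sqrt((n + kappa) * var)
    points = (mean, mean + spread, mean - spread)
    weights = (kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa))
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def dct_matrix(num_ceps, num_bands):
    """Unnormalized DCT-II matrix mapping log-Mel bands to cepstra (hypothetical helper)."""
    k = np.arange(num_ceps)[:, None]
    j = np.arange(num_bands)[None, :]
    return np.cos(np.pi * k * (j + 0.5) / num_bands)


def propagate_linear(matrix, mean, cov):
    """Exact propagation of a mean vector and covariance through a linear map."""
    return matrix @ mean, matrix @ cov @ matrix.T


# Hypothetical Mel-domain estimates (mean, variance) for four bands.
mel_mean = np.array([4.0, 2.5, 1.0, 0.5])
mel_var = np.array([0.8, 0.4, 0.1, 0.02])

# Nonlinear step (log) under the log-normal assumption, then the linear DCT step.
log_mean, log_var = lognormal_log_moments(mel_mean, mel_var)
cep_mean, cep_cov = propagate_linear(dct_matrix(3, 4), log_mean, np.diag(log_var))
```

Because the DCT is linear, the second step is exact once log-domain moments are available; the approximation error is concentrated in the nonlinear log step. Since log(1 + x) < x for x > 0, the log-normal variance is always somewhat smaller than the first-order Taylor estimate, and the two agree as the relative variance v/m² shrinks. The unscented variant can be applied elementwise, e.g. `unscented_moments(np.log, mel_mean, mel_var)`, provided all sigma points stay positive.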

Author information

Corresponding author

Correspondence to Ramón Fernandez Astudillo.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Astudillo, R.F., Kolossa, D. (2011). Uncertainty Propagation. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_3

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
