Uncertainty Propagation

Astudillo, Ramón Fernandez; Kolossa, Dorothea

doi:10.1007/978-3-642-21317-5_3

Abstract

While it is often fairly straightforward to estimate the reliability of speech features in the time-frequency domain, this may not be true in other domains more amenable to speech recognition, such as for RASTA-PLP features or those obtained with the ETSI advanced front-end. In such cases, one useful approach is to estimate the uncertainties in the domain where noise reduction preprocessing is carried out, and to subsequently transform the uncertainties, along with the actual features, to the recognition domain. In order to develop suitable approaches, we will first give a short overview of relevant strategies for propagating probability distributions through nonlinearities. Secondly, for some feature domains suitable for robust recognition, we will show possible implementations and sensible approximations of uncertainty propagation and discuss the associated error margins and trade-offs.
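
As an illustration of what propagating a probability distribution through a nonlinearity can look like, the following minimal sketch passes a scalar Gaussian through a logarithm, a nonlinearity common in speech feature extraction. It is not taken from the chapter. The function names, the choice of the logarithm, the input mean and variance, and the parameter kappa = 2 are all assumptions made for this example. Three generic strategies are compared: first-order linearization, a three-point unscented transform, and Monte Carlo sampling as a reference.

```python
import math
import random


def propagate_linear(f, df, mean, var):
    """First-order Taylor propagation: mean -> f(mean), var -> f'(mean)^2 * var."""
    return f(mean), (df(mean) ** 2) * var


def propagate_unscented(f, mean, var, kappa=2.0):
    """Scalar unscented transform using three sigma points.

    kappa = 2 (so that n + kappa = 3 for n = 1) is an assumed, commonly
    used setting for Gaussian inputs.
    """
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def propagate_monte_carlo(f, mean, var, n_samples=50_000, seed=0):
    """Sampling-based reference: draw Gaussian samples and transform each one."""
    rng = random.Random(seed)
    std = math.sqrt(var)
    ys = [f(rng.gauss(mean, std)) for _ in range(n_samples)]
    y_mean = sum(ys) / n_samples
    y_var = sum((y - y_mean) ** 2 for y in ys) / (n_samples - 1)
    return y_mean, y_var


# Hypothetical input: a positive quantity (e.g. a spectral magnitude) with
# mean 10 and variance 1, so that the logarithm is defined for all sigma
# points and, in practice, for all drawn samples.
mu, sigma2 = 10.0, 1.0

lin_mean, lin_var = propagate_linear(math.log, lambda x: 1.0 / x, mu, sigma2)
ut_mean, ut_var = propagate_unscented(math.log, mu, sigma2)
mc_mean, mc_var = propagate_monte_carlo(math.log, mu, sigma2)

print("linearized:", lin_mean, lin_var)
print("unscented: ", ut_mean, ut_var)
print("sampling:  ", mc_mean, mc_var)
```

Linearization is the cheapest option but ignores the curvature of the nonlinearity, so its mean estimate is simply log(mu). The unscented transform needs only three function evaluations here and captures part of the resulting bias, while sampling serves as a slow but assumption-free reference. This is the kind of accuracy versus cost trade-off the abstract alludes to.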

Author information

Authors and Affiliations

Electronics and Medical Signal Processing Group, TU Berlin, 10587, Berlin, Germany
Ramón Fernandez Astudillo & Dorothea Kolossa

Corresponding author

Correspondence to Ramón Fernandez Astudillo.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa
Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach

About this chapter

Cite this chapter

Astudillo, R.F., Kolossa, D. (2011). Uncertainty Propagation. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_3

DOI: https://doi.org/10.1007/978-3-642-21317-5_3
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)
