Abstract
While it is often fairly straightforward to estimate the reliability of speech features in the time-frequency domain, this may not be true in other domains more amenable to speech recognition, such as those of RASTA-PLP features or of features obtained with the ETSI advanced front-end. In such cases, one useful approach is to estimate the uncertainties in the domain where noise reduction preprocessing is carried out, and subsequently to transform the uncertainties, along with the actual features, to the recognition domain. To develop suitable approaches, we first give a short overview of relevant strategies for propagating probability distributions through nonlinearities. We then show, for some feature domains suitable for robust recognition, possible implementations and sensible approximations of uncertainty propagation, and discuss the associated error margins and trade-offs.
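As a rough illustration of what propagating a probability distribution through a nonlinearity can look like, the following minimal sketch applies a scalar unscented transform to a Gaussian-distributed feature. It is not taken from the chapter: the function names, the choice of the exponential as the nonlinearity, and the parameter kappa = 2 are all assumptions made for this example. The exponential is used only because the resulting log-normal moments are known in closed form, which gives a reference for the approximation error.

```python
import math


def unscented_transform_1d(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(x) for x ~ N(mean, var).

    Hypothetical helper for illustration: a scalar unscented transform
    with three sigma points. kappa = 2 is an assumed, commonly used
    setting for a one-dimensional Gaussian input.
    """
    n = 1  # dimension of the input
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]

    # Push each sigma point through the nonlinearity.
    values = [f(p) for p in points]

    # Weighted sample mean and variance in the transformed domain.
    mean_y = sum(w * v for w, v in zip(weights, values))
    var_y = sum(w * (v - mean_y) ** 2 for w, v in zip(weights, values))
    return mean_y, var_y


def lognormal_moments(mean, var):
    """Closed-form mean and variance of exp(x) for x ~ N(mean, var)."""
    mean_y = math.exp(mean + 0.5 * var)
    var_y = (math.exp(var) - 1.0) * math.exp(2.0 * mean + var)
    return mean_y, var_y


# Compare the approximation with the closed-form reference.
approx = unscented_transform_1d(math.exp, 0.0, 0.04)
exact = lognormal_moments(0.0, 0.04)
print("unscented:", approx)
print("exact:    ", exact)
```

The gap between the two results is one simple way to look at the error margin of an approximation. It generally grows with the input variance and with the curvature of the nonlinearity.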
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Astudillo, R.F., Kolossa, D. (2011). Uncertainty Propagation. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_3
DOI: https://doi.org/10.1007/978-3-642-21317-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)