Abstract
This paper proposes a novel feature compensation technique for robust automatic speech recognition (ASR) that operates in the log-spectral domain. Our proposal consists of a minimum mean square error (MMSE) estimator derived from an occlusion model [1]. Under this model, the effect of noise on speech reduces to binary masking: the noise is completely masked by the speech when the speech power dominates, and the speech is completely masked by the noise when the noise power dominates. As with many MMSE-based techniques, a statistical model of clean speech is required; a Gaussian mixture model (GMM) is employed here. The resulting technique bears clear similarities to missing-data imputation techniques but, unlike them, it makes use of an explicit noise model. Experimental results show that our MMSE estimator outperforms missing-data imputation with both binary and soft masks.
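The estimator outlined above can be illustrated with a minimal Python sketch of MMSE reconstruction under the occlusion (max) model y = max(x, n) in the log-spectral domain. This is not the authors' implementation. It assumes, for illustration only, a diagonal-covariance GMM for clean speech, a single Gaussian per channel for the noise, statistically independent channels, and frame-by-frame processing; the function and parameter names (`mmse_log_max`, `noise_mean`, `noise_var`, etc.) are hypothetical. For each GMM component and channel, the observation is explained either by speech dominating (x = y, n ≤ y) or by noise dominating (n = y, x ≤ y). The posterior probability of the first case acts as a soft reliability mask, and in the second case the speech estimate is the mean of a Gaussian truncated at y, which is what makes the estimator resemble bounded missing-data imputation.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def log_norm_pdf(v, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (LOG_2PI + math.log(var) + (v - mean) ** 2 / var)


def log_std_cdf(z):
    """log Phi(z) for the standard normal, with an asymptotic tail for very negative z."""
    if z > -30.0:
        return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    # Leading term of the tail expansion: Phi(z) ~ phi(z) / (-z) as z -> -inf.
    return -0.5 * z * z - math.log(-z) - 0.5 * LOG_2PI


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mmse_log_max(y, weights, means, variances, noise_mean, noise_var):
    """Hypothetical MMSE estimate of clean log-spectrum x from one noisy frame y = max(x, n).

    y: list of D observed log-spectral values.
    weights, means, variances: K-component diagonal GMM for clean speech (assumed).
    noise_mean, noise_var: per-channel Gaussian noise model (assumed).
    """
    log_post, cond_est = [], []  # unnormalized log P(k|y) and E[x | y, k]
    for w_k, mu_k, var_k in zip(weights, means, variances):
        ll = math.log(w_k)
        est = []
        for d, y_d in enumerate(y):
            sx, sn = math.sqrt(var_k[d]), math.sqrt(noise_var[d])
            zx = (y_d - mu_k[d]) / sx
            zn = (y_d - noise_mean[d]) / sn
            # Speech dominates: x = y and n <= y.
            a = log_norm_pdf(y_d, mu_k[d], var_k[d]) + log_std_cdf(zn)
            # Noise dominates: n = y and x <= y.
            b = log_norm_pdf(y_d, noise_mean[d], noise_var[d]) + log_std_cdf(zx)
            lik = log_add(a, b)
            ll += lik
            p_speech = math.exp(a - lik)  # soft reliability of channel d
            # Mean of x ~ N(mu, var) truncated to x <= y: mu - sigma * phi(z) / Phi(z).
            mills = math.exp(-0.5 * zx * zx - 0.5 * LOG_2PI - log_std_cdf(zx))
            trunc_mean = mu_k[d] - sx * mills
            est.append(p_speech * y_d + (1.0 - p_speech) * trunc_mean)
        log_post.append(ll)
        cond_est.append(est)
    # Normalize the component posteriors and average the conditional estimates.
    m = max(log_post)
    post = [math.exp(l - m) for l in log_post]
    total = sum(post)
    return [sum(p * e[d] for p, e in zip(post, cond_est)) / total for d in range(len(y))]
```

When the noise model sits far below the observation, every channel is judged speech-dominated and the estimate stays at y; when a channel is clearly noise-dominated, the estimate falls back toward the truncated speech mean, never exceeding the observed value. Working with log-densities and an asymptotic tail for log Φ keeps the sketch stable for very low-energy channels.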
References
[1] Varga, A.P., Moore, R.K.: Hidden Markov model decomposition of speech and noise. In: Proc. ICASSP, pp. 845–848 (April 1990)
[2] Huang, X., Acero, A., Hon, H.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall (2001)
[3] Deng, L., Droppo, J., Acero, A.: Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features. IEEE Trans. Speech Audio Process. 12(3), 218–233 (2004)
[4] Reddy, A.M., Raj, B.: Soft mask methods for single-channel speaker separation. IEEE Trans. Audio Speech Lang. Process. 15(6), 1766–1776 (2007)
[5] Cooke, M., Green, P., Josifovski, L., Vizinho, A.: Robust automatic speech recognition with missing and unreliable data. Speech Comm. 34(3), 267–285 (2001)
[6] Raj, B., Seltzer, M.L., Stern, R.M.: Reconstruction of missing features for robust speech recognition. Speech Comm. 43(4), 275–296 (2004)
[7] González, J.A., Peinado, A.M., Gómez, A.M., Ma, N., Barker, J.: Combining missing-data reconstruction and uncertainty decoding for robust speech recognition. In: Proc. ICASSP, pp. 4693–4696 (March 2012)
[8] Raj, B., Singh, R.: Reconstructing spectral vectors with uncertain spectrographic masks for robust speech recognition. In: Proc. ASRU, pp. 65–70 (2005)
[9] Faubel, F., Raja, H., McDonough, J., Klakow, D.: Particle filter based soft-mask estimation for missing-feature reconstruction. In: Proc. IWAENC (2008)
[10] Hirsch, H.G., Pearce, D.: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: Proc. ISCA ITRW ASR2000, Paris, France (2000)
[11] Hirsch, H.G.: Experimental framework for the performance evaluation of speech recognition front-ends on a large vocabulary task. Tech. Rep., STQ AURORA DSR Working Group (2002)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
González, J.A., Peinado, A.M., Gómez, Á.M. (2012). MMSE Feature Reconstruction Based on an Occlusion Model for Robust ASR. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_23
DOI: https://doi.org/10.1007/978-3-642-35292-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)