MMSE Feature Reconstruction Based on an Occlusion Model for Robust ASR

González, José A.; Peinado, Antonio M.; Gómez, Ángel M.

doi:10.1007/978-3-642-35292-8_23

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This paper proposes a novel compensation technique developed in the log-spectral domain. Our proposal consists of a minimum mean square error (MMSE) estimator derived from an occlusion model [1]. According to this model, the effect of noise on speech is simplified to a binary masking: the noise is completely masked by the speech when the speech power dominates, and the speech is masked by the noise when the noise is dominant. As with many MMSE-based techniques, a statistical model of clean speech is required; a Gaussian mixture model is employed here. The resulting technique has clear similarities with missing-data imputation techniques although, unlike them, our proposal employs an explicit model of the noise. The experimental results show that our MMSE estimator outperforms missing-data imputation with both binary and soft masks.
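
To make the idea concrete, the following is a minimal sketch of an MMSE estimator under an occlusion model, not the authors' exact formulation. It assumes that each log-spectral channel of the noisy observation is y = max(x, n), that clean speech x follows a Gaussian mixture with diagonal covariances, and that the noise n follows a single Gaussian per channel. All function and parameter names are hypothetical, and the noise model in particular is an illustrative assumption that the abstract does not specify.

```python
import math


def norm_pdf(v, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at v."""
    z = (v - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def norm_cdf(v, mu, sigma):
    """P(x <= v) for x ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def truncated_mean_below(y, mu, sigma):
    """E[x | x <= y] for x ~ N(mu, sigma^2)."""
    cdf = norm_cdf(y, mu, sigma)
    if cdf <= 0.0:
        # No representable mass below y: fall back to the upper bound.
        return y
    return mu - sigma * sigma * norm_pdf(y, mu, sigma) / cdf


def mmse_occlusion_estimate(y, weights, means, stds, noise_mean, noise_std):
    """Sketch of an MMSE clean-speech estimate under y = max(x, n).

    Assumptions (not taken from the paper): diagonal-covariance GMM for
    speech, one Gaussian per channel for noise, independent channels.
    """
    num_comp, dim = len(weights), len(y)
    log_post, comp_est = [], []
    for k in range(num_comp):
        log_lik = math.log(weights[k])
        est_k = []
        for d in range(dim):
            mu, sd = means[k][d], stds[k][d]
            # Speech dominates: x = y and the noise lies below y.
            speech_wins = norm_pdf(y[d], mu, sd) * norm_cdf(
                y[d], noise_mean[d], noise_std[d])
            # Noise dominates: n = y and the speech lies below y.
            noise_wins = norm_pdf(y[d], noise_mean[d], noise_std[d]) * norm_cdf(
                y[d], mu, sd)
            total = speech_wins + noise_wins
            p_speech = speech_wins / total if total > 0.0 else 0.5
            log_lik += math.log(max(total, 1e-300))
            # Reliable case keeps y; masked case uses the truncated mean.
            est_k.append(p_speech * y[d]
                         + (1.0 - p_speech) * truncated_mean_below(y[d], mu, sd))
        log_post.append(log_lik)
        comp_est.append(est_k)
    # Normalise component posteriors with the log-sum-exp trick.
    top = max(log_post)
    post = [math.exp(v - top) for v in log_post]
    norm = sum(post)
    post = [p / norm for p in post]
    return [sum(post[k] * comp_est[k][d] for k in range(num_comp))
            for d in range(dim)]
```

In this sketch the per-channel probability `p_speech` plays the role that a soft mask plays in missing-data imputation, except that it is computed from the explicit noise model rather than supplied externally. Because the masked case is bounded above by the observation, the estimate never exceeds `y` in any channel.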

References

1. Varga, A.P., Moore, R.K.: Hidden Markov model decomposition of speech and noise. In: Proc. ICASSP, pp. 845–848 (April 1990)
2. Huang, X., Acero, A., Hon, H.: Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall (2001)
3. Deng, L., Droppo, J., Acero, A.: Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features. IEEE Trans. Speech Audio Process. 12(3), 218–233 (2004)
4. Reddy, A.M., Raj, B.: Soft Mask Methods for Single-Channel Speaker Separation. IEEE Trans. Audio Speech and Language Process. 15(6), 1766–1776 (2007)
5. Cooke, M., Green, P., Josifovski, L., Vizinho, A.: Robust automatic speech recognition with missing and unreliable data. Speech Comm. 34(3), 267–285 (2001)
6. Raj, B., Seltzer, M.L., Stern, R.M.: Reconstruction of missing features for robust speech recognition. Speech Comm. 43(4), 275–296 (2004)
7. González, J.A., Peinado, A.M., Gómez, A.M., Ma, N., Barker, J.: Combining missing-data reconstruction and uncertainty decoding for robust speech recognition. In: Proc. ICASSP, pp. 4693–4696 (March 2012)
8. Raj, B., Singh, R.: Reconstructing spectral vectors with uncertain spectrographic masks for robust speech recognition. In: Proc. ASRU, pp. 65–70 (2005)
9. Faubel, F., Raja, H., McDonough, J., Klakow, D.: Particle filter based soft-mask estimation for missing-feature reconstruction. In: Proc. IWAENC (2008)
10. Hirsch, H.G., Pearce, D.: The Aurora experimental framework for the performance evaluations of the speech recognition systems under noisy conditions. In: ISCA ITRW ASR 2000, Paris, France (2000)
11. Hirsch, H.G.: Experimental framework for the performance evaluation of speech recognition front-ends of large vocabulary task. Tech. Rep., STQ AURORA DSR Working Group (2002)

Author information

Authors and Affiliations

Dpt. Teoría de la Señal, Telemática y Comunicaciones, Centro de Investigación en Tecnologías de la Información y de las Comunicaciones, 18071, Granada, Spain
José A. González, Antonio M. Peinado & Ángel M. Gómez

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Doroteo Torre Toledano
Centro Politécnico Superior, Edificio Ada Byron, C/ María de Luna nº 1, 50018, Zaragoza, Spain
Alfonso Ortega Giménez
Universidade de Aveiro, Campus Universitário Aveiro, 3810-193, Aveiro, Portugal
António Teixeira
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Joaquín González Rodríguez
E.T.S.I.Telecomunicacion, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Hernández Gómez & Rubén San Segundo Hernández
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Daniel Ramos Castro

About this paper

Cite this paper

González, J.A., Peinado, A.M., Gómez, Á.M. (2012). MMSE Feature Reconstruction Based on an Occlusion Model for Robust ASR. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_23

DOI: https://doi.org/10.1007/978-3-642-35292-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)
