Robust Recognition of Noisy Speech Through Partial Imputation of Missing Data

Ebrahim Kafoori, Kian; Ahadi, Seyed Mohammad

doi:10.1007/s00034-017-0616-4

Published: 31 July 2017

Circuits, Systems, and Signal Processing, Volume 37, pages 1625–1648 (2018)

Abstract

Missing-data approaches to robust speech recognition fall into two main categories: spectral imputation and classifier modification. In this paper, we introduce a novel technique that combines methods from these two categories while improving the accuracy of the combined methods. Methods in these two categories are rarely employed together because their structures are incompatible. Building on our previous work, we propose a technique that resolves this incompatibility. The technique is based on the idea of partial restoration of the log-spectrum: for each missing component, we decide whether to restore it or to estimate a possible range for it. We also propose a method to employ dynamic features more effectively. The combined techniques are a classic spectral imputation method and our previously proposed classifier modification technique, namely spectral variance learning. The experiments show that the proposed technique significantly improves the accuracies of both combined techniques, with improvements in recognition accuracy of up to nearly four percent on Aurora 2.0 data and more than two percent on a noisy version of TIMIT data.
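The per-component decision described in the abstract (restore a missing log-spectral component, or keep only a possible range for it) can be illustrated with a minimal sketch. This is not the authors' algorithm: the decision rule, the `confidence` score, the `threshold`, the `floor` value, and the names `Component` and `partially_impute` are all hypothetical and chosen only for illustration. The one borrowed idea is the common missing-data assumption that, under additive noise, the noisy observation acts as an approximate upper bound on the clean log-energy.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Component:
    """One log-spectral component after partial imputation (hypothetical type)."""
    value: Optional[float]                 # point value, when one is available
    bounds: Optional[Tuple[float, float]]  # (lower, upper) range otherwise


def partially_impute(
    noisy: Sequence[float],
    reliable: Sequence[bool],
    estimate: Sequence[float],
    confidence: Sequence[float],
    threshold: float = 0.5,   # assumed decision threshold, not from the paper
    floor: float = -10.0,     # assumed lower limit for the clean log-energy
) -> List[Component]:
    """Restore some unreliable components and keep only a range for the rest."""
    result = []
    for y, ok, x_hat, c in zip(noisy, reliable, estimate, confidence):
        if ok:
            # Reliable component: use the observation as it is.
            result.append(Component(value=y, bounds=None))
        elif c >= threshold:
            # Unreliable, but the imputed estimate is trusted: restore it,
            # capped at the noisy observation (assumed upper bound).
            result.append(Component(value=min(x_hat, y), bounds=None))
        else:
            # Unreliable and the estimate is not trusted: keep only a range,
            # to be handled later by the classifier.
            result.append(Component(value=None, bounds=(min(floor, y), y)))
    return result


frame = partially_impute(
    noisy=[2.0, 5.0, 4.0],
    reliable=[True, False, False],
    estimate=[0.0, 3.5, 6.0],
    confidence=[1.0, 0.9, 0.2],
)
for comp in frame:
    print(comp)
```

In this toy frame the first component is kept as observed, the second is restored to its estimate, and the third is left as a range whose upper end is the noisy observation. A back end that accepts ranges could then treat the first two components as ordinary features and the third as a bounded unknown.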

Notes

It is possible to employ SI in the spectral domain, but the performance falls drastically.
Soft mask estimation techniques give each part a number to indicate its reliability.

Author information

Authors and Affiliations

Electrical Engineering Department, Amirkabir University of Technology, Tehran, Iran
Kian Ebrahim Kafoori & Seyed Mohammad Ahadi

Corresponding author

Correspondence to Seyed Mohammad Ahadi.

About this article

Cite this article

Ebrahim Kafoori, K., Ahadi, S.M. Robust Recognition of Noisy Speech Through Partial Imputation of Missing Data. Circuits Syst Signal Process 37, 1625–1648 (2018). https://doi.org/10.1007/s00034-017-0616-4

Received: 21 September 2016
Revised: 11 July 2017
Accepted: 12 July 2017
Published: 31 July 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s00034-017-0616-4
