
Robust Recognition of Noisy Speech Through Partial Imputation of Missing Data

Circuits, Systems, and Signal Processing

Abstract

Two main categories of missing-data approaches to robust speech recognition are spectral imputation and classifier modification. In this paper, we introduce a novel technique that combines methods from these two categories while improving the accuracy of each combined method. Methods from the two categories are rarely employed together because their structures are incompatible. Building on our previous work, we propose a technique that resolves this incompatibility. The technique is based on partial restoration of the log-spectrum: for each missing component, we decide whether to restore it or to estimate a possible range for it. We also propose a method to employ dynamic features more effectively. The combined techniques are a classic spectral imputation method and our previously proposed classifier modification technique, spectral variance learning. Experiments show that the proposed technique significantly improves the accuracy of both combined techniques, with recognition accuracy gains of nearly four percent on Aurora 2.0 data and more than two percent on a noisy version of the TIMIT data.
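The restore-or-bound decision can be illustrated with a minimal sketch. It assumes the standard missing-data convention that an unreliable log-spectral component of the clean speech lies between a floor value and the observed noisy value. The function name `partial_impute`, the `floor` and `max_width` parameters, and the width-based decision rule are hypothetical illustrations, not the criterion used in the paper; `imputed_logspec` stands for the output of any spectral imputation method.

```python
import numpy as np

def partial_impute(noisy_logspec, reliable_mask, imputed_logspec,
                   floor=-10.0, max_width=3.0):
    """Hypothetical partial imputation of a log-spectrogram.

    Returns (lower, upper, restore). lower == upper marks a component that
    is used as a point value; lower < upper marks one kept as a range.
    """
    y = np.asarray(noisy_logspec, dtype=float)
    m = np.asarray(reliable_mask, dtype=bool)
    x_hat = np.asarray(imputed_logspec, dtype=float)

    # Reliable components are taken as observed. Unreliable ones are bounded
    # by [floor, y]; the floor is lowered to y where y is already below it.
    lower = np.where(m, y, np.minimum(floor, y))
    upper = y.copy()

    # Hypothetical rule (assumption): restore an unreliable component only
    # when its admissible range is narrow enough for a point estimate.
    restore = (~m) & ((upper - lower) <= max_width)

    # Clip the imputed estimate into its admissible range before using it.
    point = np.clip(x_hat, lower, upper)
    lower = np.where(restore, point, lower)
    upper = np.where(restore, point, upper)
    return lower, upper, restore

lo, hi, r = partial_impute([[2.0, -8.0, 5.0]],
                           [[True, False, False]],
                           [[1.0, -9.5, 0.0]])
```

In this toy frame, the reliable component stays at 2.0, the narrow-range component is restored to -9.5, and the wide-range component is kept as the interval [-10.0, 5.0], which a bounded-marginalization style classifier could then integrate over.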




Notes

  1. It is possible to employ spectral imputation (SI) in the spectral domain, but the performance falls drastically.

  2. Soft mask estimation techniques assign each time-frequency component a value indicating its reliability.
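One commonly used form of soft mask maps an estimated local SNR (in dB) for each time-frequency component through a sigmoid, giving a reliability value between 0 and 1. The sketch below assumes that form; the function name `soft_mask`, the slope `alpha`, and the center `beta` are hypothetical choices, and the mask estimator used in the paper may differ.

```python
import numpy as np

def soft_mask(local_snr_db, alpha=0.5, beta=0.0):
    """Hypothetical soft mask: sigmoid of local SNR (dB), values in (0, 1)."""
    snr = np.asarray(local_snr_db, dtype=float)
    # SNR well above beta gives values near 1 (reliable);
    # SNR well below beta gives values near 0 (unreliable).
    return 1.0 / (1.0 + np.exp(-alpha * (snr - beta)))

mask = soft_mask([0.0, 20.0, -20.0])
binary = mask > 0.5  # thresholding recovers a binary mask (SNR > beta)
```

Thresholding the soft mask at 0.5 recovers the binary decision "local SNR above `beta`", so a binary mask can be viewed as a hard-limited special case of this soft mask.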


Author information

Correspondence to Seyed Mohammad Ahadi.


About this article


Cite this article

Ebrahim Kafoori, K., Ahadi, S.M. Robust Recognition of Noisy Speech Through Partial Imputation of Missing Data. Circuits Syst Signal Process 37, 1625–1648 (2018). https://doi.org/10.1007/s00034-017-0616-4


  • DOI: https://doi.org/10.1007/s00034-017-0616-4
