Environment mismatch compensation using average eigenspace-based methods for robust speech recognition

International Journal of Speech Technology

Abstract

The performance of speech recognition systems degrades when training and test conditions are mismatched because of environmental factors. Besides the familiar case of test data from noisy environments, there are scenarios in which the training data itself is noisy. In this study, we propose a series of methods for compensating mismatch between training and test environments, based on our “average eigenspace” approach. These methods are also shown to be effective under non-stationary mismatch conditions. An advantage is that no explicit adaptation data is needed, since the compensatory transform is estimated directly from the incoming test data. We evaluate the approaches on two separate corpora collected in realistic car environments: CU-Move and UTDrive. Compared with a baseline system incorporating spectral subtraction, highpass filtering, and cepstral mean normalization, the proposed techniques yield a relative word error rate reduction of 17–26%. The methods also reduce the dimensionality of the feature vectors, allowing a more compact set of acoustic models in the phoneme space. This property is important for automatic speech recognition (ASR) on small-footprint mobile devices, such as cell phones or PDAs, which must operate in diverse environments.
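
As a reading aid for the "relative word error rate reduction" quoted in the abstract, the following minimal sketch shows how such a figure is conventionally computed from a baseline and a compensated word error rate (WER). The function name `relative_wer_reduction` and the numbers used are assumptions for illustration only; they are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, compensated_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent (e.g. 20.0 for 20%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - compensated_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the arithmetic;
# they are not results from the paper.
baseline = 20.0      # baseline WER in percent (assumed)
compensated = 15.0   # WER after compensation in percent (assumed)
print(f"{relative_wer_reduction(baseline, compensated):.1f}% relative reduction")
```

With these assumed values, a drop from 20% to 15% WER is a 5-point absolute reduction but a 25% relative reduction, which is the kind of figure the 17–26% range refers to.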

Acknowledgments

This study was funded by AFRL under contract FA8750-12-1-0188 and, in part, by the University of Texas at Dallas through the Distinguished University Chair in Telecommunications Engineering held by J. H. L. Hansen.

Author information

Corresponding author

Correspondence to John H. L. Hansen.

About this article

Cite this article

Hansen, J.H.L., Kumar, A. & Angkititrakul, P. Environment mismatch compensation using average eigenspace-based methods for robust speech recognition. Int J Speech Technol 17, 353–364 (2014). https://doi.org/10.1007/s10772-014-9233-9

  • DOI: https://doi.org/10.1007/s10772-014-9233-9
