
Feature Space VTS with Phase Term Modeling

  • Conference paper
Speech and Computer (SPECOM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Abstract

A new variant of the Vector Taylor Series (VTS) based feature compensation algorithm is proposed. The phase-sensitive speech distortion model is used, and the phase term is modeled as a multivariate Gaussian with unknown mean vector and covariance matrix. These parameters are estimated according to the Maximum Likelihood principle using the EM algorithm. The EM parameter update formulas are derived, as is the MMSE estimate of the clean speech features. Experiments on the Aurora2 database show that taking the phase term into account and estimating its parameters in a data-driven way results in a relative WER reduction of about 20% compared to the phase-insensitive VTS version. The proposed method is also compared to VTS with a constant phase vector, and this approximation is shown to be very efficient.
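The ingredients named in the abstract (phase-sensitive distortion model, Gaussian phase term, VTS linearization, MMSE clean-feature estimate) can be illustrated with a minimal NumPy sketch. It assumes the common log-mel form of the phase-sensitive model without a channel term, y = x + log(1 + exp(n - x) + 2α∘exp((n - x)/2)), a diagonal-covariance GMM for clean speech, and x, n and α treated as independent Gaussians. The MMSE step uses the widely used first-order approximation x̂ = y - Σ_k P(k|y)(μ_y,k - μ_x,k); it is not the estimator derived in the paper, and the paper's EM updates for the phase mean and covariance are not reproduced. All function names are hypothetical.

```python
import numpy as np


def mismatch(x, n, alpha):
    # Assumed phase-sensitive model in the log-mel domain, channel term omitted:
    # y = x + log(1 + exp(n - x) + 2 * alpha * exp((n - x) / 2)).
    # Keeping alpha >= -1 elementwise keeps the log argument non-negative.
    d = n - x
    return x + np.log(1.0 + np.exp(d) + 2.0 * alpha * np.exp(0.5 * d))


def jacobians(x, n, alpha):
    # Elementwise (diagonal) derivatives of y with respect to x, n and alpha.
    d = n - x
    e, h = np.exp(d), np.exp(0.5 * d)
    g = 1.0 + e + 2.0 * alpha * h
    jn = (e + alpha * h) / g   # dy/dn
    jx = 1.0 - jn              # dy/dx
    ja = 2.0 * h / g           # dy/dalpha
    return jx, jn, ja


def vts_moments(mu_x, var_x, mu_n, var_n, mu_a, var_a):
    # First-order VTS around the means; x, n and alpha are treated as
    # independent Gaussians with diagonal covariances, so no cross terms.
    mu_y = mismatch(mu_x, mu_n, mu_a)
    jx, jn, ja = jacobians(mu_x, mu_n, mu_a)
    var_y = jx**2 * var_x + jn**2 * var_n + ja**2 * var_a
    return mu_y, var_y


def log_gauss(y, mu, var):
    # Log density of a diagonal Gaussian, summed over feature dimensions.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)


def mmse_clean(y, weights, mus_x, vars_x, mu_n, var_n, mu_a, var_a):
    # Simplified MMSE estimate for one frame y:
    # x_hat = y - sum_k P(k | y) * (mu_y_k - mu_x_k).
    log_post, shifts = [], []
    for w, mu_x, var_x in zip(weights, mus_x, vars_x):
        mu_y, var_y = vts_moments(mu_x, var_x, mu_n, var_n, mu_a, var_a)
        log_post.append(np.log(w) + log_gauss(y, mu_y, var_y))
        shifts.append(mu_y - mu_x)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # numerically stable softmax
    post /= post.sum()
    return y - post @ np.array(shifts)


if __name__ == "__main__":
    dim = 4
    weights = np.array([0.5, 0.5])
    mus_x = np.array([np.full(dim, 2.0), np.full(dim, 5.0)])
    vars_x = np.ones((2, dim))
    mu_n, var_n = np.zeros(dim), np.full(dim, 0.1)
    mu_a, var_a = np.zeros(dim), np.full(dim, 0.05)
    y = mismatch(np.full(dim, 5.0), mu_n, mu_a)
    print(mmse_clean(y, weights, mus_x, vars_x, mu_n, var_n, mu_a, var_a))
```

In this sketch, fixing `mu_a` to a constant vector with `var_a = 0` corresponds to a constant-phase-vector variant, and `mu_a = 0` with `var_a = 0` reduces to phase-insensitive VTS.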

Notes

  1. In what follows we deal only with the mel-cepstral domain, since the log-spectral domain is similar but simpler to explore.

  2. Equation (13) holds when the noise and phase vectors are considered independent; otherwise an additional cross-correlation term appears.
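Both notes can be made concrete with a minimal NumPy sketch of first-order VTS in the mel-cepstral domain. It assumes the common cepstral formulation in which the log-mel model is wrapped by a truncated orthonormal DCT matrix C and its pseudo-inverse D, and it assumes the phase vector α lives in the log-mel domain (one entry per mel channel). The optional `cov_na` argument illustrates the kind of cross-correlation term note 2 refers to when noise and phase are not independent; it is not a reproduction of the paper's equation (13). The names `dct_matrix`, `cepstral_vts` and `cov_na` are hypothetical.

```python
import numpy as np


def dct_matrix(n_cep, n_mel):
    # Orthonormal DCT-II basis, truncated to the first n_cep cepstral coefficients.
    k = np.arange(n_cep)[:, None]
    m = np.arange(n_mel)[None, :]
    c = np.sqrt(2.0 / n_mel) * np.cos(np.pi * k * (m + 0.5) / n_mel)
    c[0] /= np.sqrt(2.0)
    return c


def cepstral_vts(mu_x, cov_x, mu_n, cov_n, mu_a, cov_a, C, cov_na=None):
    # mu_x, mu_n, cov_x, cov_n: clean speech and noise in the cepstral domain.
    # mu_a, cov_a: phase vector, assumed to live in the log-mel domain.
    # cov_na: optional cross-covariance between n (cepstral) and alpha (mel).
    # Clean speech is always treated as independent of noise and phase.
    D = np.linalg.pinv(C)                 # maps cepstra back to log-mel
    d = D @ (mu_n - mu_x)                 # log-mel noise-to-speech ratio
    e, h = np.exp(d), np.exp(0.5 * d)
    g = 1.0 + e + 2.0 * mu_a * h
    mu_y = mu_x + C @ np.log(g)
    Jn = C @ np.diag((e + mu_a * h) / g) @ D   # dy/dn
    Jx = np.eye(len(mu_x)) - Jn                # dy/dx
    Ja = C @ np.diag(2.0 * h / g)              # dy/dalpha
    cov_y = Jx @ cov_x @ Jx.T + Jn @ cov_n @ Jn.T + Ja @ cov_a @ Ja.T
    if cov_na is not None:
        # Extra term that appears when noise and phase are not independent.
        cross = Jn @ cov_na @ Ja.T
        cov_y = cov_y + cross + cross.T
    return mu_y, cov_y


if __name__ == "__main__":
    C = dct_matrix(13, 23)                 # illustrative sizes only
    mu_y, cov_y = cepstral_vts(np.zeros(13), np.eye(13), np.full(13, 0.5),
                               0.1 * np.eye(13), np.zeros(23), 0.05 * np.eye(23), C)
    print(mu_y.shape, cov_y.shape)
```

When `n_cep` equals `n_mel`, C is square and orthonormal, so D = Cᵀ and the cepstral model is just a rotation of the log-mel one. Truncation (`n_cep < n_mel`) means D is only a pseudo-inverse, which is one reason the cepstral domain is the more involved case.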

Acknowledgments

This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0033 (ID RFMEFI57514X0033).

Author information

Correspondence to Maxim Korenevsky.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Korenevsky, M., Romanenko, A. (2016). Feature Space VTS with Phase Term Modeling. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_37

  • DOI: https://doi.org/10.1007/978-3-319-43958-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43957-0

  • Online ISBN: 978-3-319-43958-7

  • eBook Packages: Computer Science, Computer Science (R0)