
Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition

Published in: Journal of Signal Processing Systems

Abstract

Recently, signals captured by a laser Doppler vibrometer (LDV) sensor have been shown to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. In this study, an alternative approach is proposed to further improve ASR performance with deep neural network (DNN) acoustic models: concatenating auxiliary features extracted from the LDV signal with the conventional acoustic features. Preliminary experiments on a small stereo-data set containing both LDV and acoustic signals demonstrate the effectiveness of this approach. To leverage existing large-scale speech databases, a regression DNN is then designed to map acoustic features to LDV features; it is trained on the limited-size stereo-data set and used to generate pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments verify that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features yield significant improvements in recognition performance over a system using purely acoustic features, in both quiet and noisy environments.
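The two ideas in the abstract — frame-wise concatenation of acoustic and LDV features, and a regression network that synthesizes pseudo-LDV features from acoustic features alone — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions (40), the single linear layer standing in for the regression DNN, and all function names are hypothetical placeholders.

```python
import numpy as np

def concat_features(acoustic, ldv):
    """Frame-wise concatenation of acoustic and (pseudo-)LDV features.

    Both inputs are (num_frames, dim) arrays; the output feeds the
    DNN acoustic model in place of the acoustic features alone.
    """
    assert acoustic.shape[0] == ldv.shape[0], "frame counts must match"
    return np.hstack([acoustic, ldv])

class LinearMapper:
    """Toy stand-in for the regression DNN that maps acoustic features
    to LDV features. A real system would train a multi-layer network
    on the stereo (acoustic, LDV) data; here the weights are random,
    so only the data flow is illustrated."""

    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_in, d_out)) * 0.01
        self.b = np.zeros(d_out)

    def predict(self, x):
        return x @ self.W + self.b

# Hypothetical 100 frames of 40-dim acoustic features from a large
# audio-only corpus (no LDV channel available).
acoustic = np.random.default_rng(1).standard_normal((100, 40))

mapper = LinearMapper(d_in=40, d_out=40)
pseudo_ldv = mapper.predict(acoustic)        # generated pseudo-LDV features
augmented = concat_features(acoustic, pseudo_ldv)
print(augmented.shape)  # (100, 80): concatenated feature vectors
```

The point of the mapping step is that, once the regressor is trained on the small stereo set, the augmented 80-dimensional features can be produced for any audio-only corpus, enabling large-scale training of the final ASR model.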



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61671422 and U1613211, in part by the National Key Research and Development Program of China under Grant 2017YFB1002200, and in part by the MOE-Microsoft Key Laboratory of USTC.

Corresponding author

Correspondence to Jun Du.


Cite this article

Sun, L., Du, J., Xie, Z. et al. Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition. J Sign Process Syst 90, 975–983 (2018). https://doi.org/10.1007/s11265-017-1287-x
