A Deep Learning-Based Noise-Resilient Keyword Spotting Engine for Embedded Platforms

  • Conference paper
Image Analysis and Recognition (ICIAR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11663)

Abstract

Keyword spotting (KWS) is important in numerous trigger, trigger-command, and command-and-control applications on embedded platforms. However, the embedded platforms currently used in the fast-growing Internet of Things (IoT) market and in standalone systems still have considerable processing power, memory, and battery constraints. In IoT and smart-device applications, speakers are usually far from the microphone, resulting in severe distortion, considerable noise, and noticeable reverberation. Speech enhancement can be used as a front-end or pre-processing module to improve KWS performance. However, denoisers and dereverberators used as front-end processing modules add to the complexity of the keyword spotting system and to the computing, memory, and battery requirements of the embedded platform. In this paper, a noise-robust keyword spotting engine with a small memory footprint is presented. Multi-condition utterance training of a deep neural network model is developed to increase the noise robustness of keyword spotting. A comparative study evaluates the deep learning approach against a Gaussian mixture model. Experimental results show that deep learning outperforms the Gaussian mixture model approach in both clean and noisy conditions. Moreover, a deep learning model trained on partially noisy data removes the need for a speech enhancement module or denoiser for front-end processing.
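
To make the idea of training on partially noisy data concrete, the following is a minimal sketch of how a multi-condition training set could be assembled: a fraction of the clean utterances is mixed with noise at a chosen signal-to-noise ratio (SNR), and the rest is left clean. The abstract does not state the noise types, SNR levels, or noisy fraction used, so the function names (`mix_at_snr`, `make_multicondition_set`) and all parameter values here are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean utterance")
    noise = noise[:len(clean)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise is silent")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise); solve for the noise gain.
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def make_multicondition_set(clean_utterances, noises, snrs_db, noisy_fraction, rng):
    """Keep some utterances clean and corrupt the rest at a randomly chosen SNR.

    `noisy_fraction` and `snrs_db` are hypothetical knobs, not values from the paper.
    """
    out = []
    for utt in clean_utterances:
        if rng.random() < noisy_fraction:
            out.append(mix_at_snr(utt, rng.choice(noises), rng.choice(snrs_db)))
        else:
            out.append(list(utt))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 440 Hz tone at 16 kHz and white Gaussian noise.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
    noisy = mix_at_snr(clean, noise, snr_db=10.0)
    residual = [a - b for a, b in zip(noisy, clean)]
    print("achieved SNR (dB):", 10 * math.log10(power(clean) / power(residual)))
```

In practice the samples would come from recorded utterances and noise recordings rather than synthetic signals, and acoustic features would be extracted from the mixed audio before training the model.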

Author information

Corresponding authors

Correspondence to Ramzi Abdelmoula or Alaa Khamis.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Abdelmoula, R., Khamis, A., Karray, F. (2019). A Deep Learning-Based Noise-Resilient Keyword Spotting Engine for Embedded Platforms. In: Karray, F., Campilho, A., Yu, A. (eds) Image Analysis and Recognition. ICIAR 2019. Lecture Notes in Computer Science, vol 11663. Springer, Cham. https://doi.org/10.1007/978-3-030-27272-2_12

  • DOI: https://doi.org/10.1007/978-3-030-27272-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27271-5

  • Online ISBN: 978-3-030-27272-2

  • eBook Packages: Computer Science, Computer Science (R0)
