A Deep Learning-Based Noise-Resilient Keyword Spotting Engine for Embedded Platforms

Abdelmoula, Ramzi; Khamis, Alaa; Karray, Fakhri

doi:10.1007/978-3-030-27272-2_12


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11663)

Included in the following conference series:

International Conference on Image Analysis and Recognition


Abstract

Keyword spotting (KWS) is important in numerous trigger, trigger-command, and command-and-control applications on embedded platforms. However, the embedded platforms currently used in the fast-growing Internet of Things (IoT) market and in standalone systems still have considerable processing, memory, and battery constraints. In IoT and smart-device applications, speakers are usually far from the microphone, which results in severe distortion, considerable noise, and noticeable reverberation. Speech enhancement can be used as a front-end (pre-processing) module to improve KWS performance. However, denoisers and dereverberators used as front-end modules increase the complexity of the keyword spotting system and its computing, memory, and battery requirements on embedded platforms. In this paper, a noise-robust keyword spotting engine with a small memory footprint is presented. A deep neural network (DNN) model is trained on multi-condition utterances to increase the noise robustness of keyword spotting. A comparative study is conducted between the deep learning approach and a Gaussian mixture model (GMM) approach. Experimental results show that deep learning outperforms the Gaussian approach in both clean and noisy conditions. Moreover, a deep learning model trained on partially noisy data removes the need for a speech enhancement module or denoiser for front-end processing.
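The abstract describes multi-condition training only at a high level. A common way to build such a partially noisy training set is to add recorded noise to clean keyword utterances at a chosen signal-to-noise ratio, SNR_dB = 10 log10(P_speech / P_noise). The sketch below assumes single-channel float samples at a shared sample rate. The function names, the SNR list and the `noisy_fraction` parameter are hypothetical illustrations, not the authors' actual data pipeline.

```python
import math
import random
from typing import List, Sequence


def power(signal: Sequence[float]) -> float:
    """Mean power (average squared amplitude) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to a clean utterance so the mixture has the requested SNR in dB.

    The noise is tiled and cropped to the utterance length, then scaled so that
    10 * log10(P_clean / P_scaled_noise) == snr_db.
    """
    reps = -(-len(clean) // len(noise))  # ceiling division: enough copies to cover the utterance
    noise = (list(noise) * reps)[: len(clean)]
    p_clean, p_noise = power(clean), power(noise)
    # Gain that brings the noise power down (or up) to p_clean / 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def build_multicondition_set(
    utterances: List[List[float]],
    noises: List[List[float]],
    snrs_db: Sequence[float],
    noisy_fraction: float,
    seed: int = 0,
) -> List[List[float]]:
    """Corrupt roughly `noisy_fraction` of the utterances; keep the rest clean.

    Each selected utterance is mixed with a randomly chosen noise recording at a
    randomly chosen SNR, giving a partially noisy (multi-condition) training set.
    """
    rng = random.Random(seed)  # seeded so the corrupted subset is reproducible
    out: List[List[float]] = []
    for utt in utterances:
        if rng.random() < noisy_fraction:
            out.append(mix_at_snr(utt, rng.choice(noises), rng.choice(snrs_db)))
        else:
            out.append(list(utt))
    return out
```

Reverberation, which the abstract also mentions, is usually simulated separately (for example by convolving utterances with room impulse responses); this sketch covers additive noise only.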



Author information

Authors and Affiliations

Centre for Pattern Analysis and Machine Intelligence (CPAMI), University of Waterloo, Waterloo, ON, Canada
Ramzi Abdelmoula, Alaa Khamis & Fakhri Karray


Corresponding authors

Correspondence to Ramzi Abdelmoula or Alaa Khamis.

Editor information

Editors and Affiliations

University of Waterloo, Waterloo, ON, Canada
Fakhri Karray
University of Porto, Porto, Portugal
Aurélio Campilho
University of Waterloo, Waterloo, ON, Canada
Alfred Yu


About this paper

Cite this paper

Abdelmoula, R., Khamis, A., Karray, F. (2019). A Deep Learning-Based Noise-Resilient Keyword Spotting Engine for Embedded Platforms. In: Karray, F., Campilho, A., Yu, A. (eds) Image Analysis and Recognition. ICIAR 2019. Lecture Notes in Computer Science, vol 11663. Springer, Cham. https://doi.org/10.1007/978-3-030-27272-2_12


DOI: https://doi.org/10.1007/978-3-030-27272-2_12
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27271-5
Online ISBN: 978-3-030-27272-2
eBook Packages: Computer Science, Computer Science (R0)
