EdgeCRNN: an edge-computing oriented model of acoustic feature enhancement for keyword spotting

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing

Abstract

Keyword Spotting (KWS) is a significant branch of Automatic Speech Recognition (ASR) and is widely used on edge computing devices. The goal of KWS is to provide high accuracy with a low False Alarm Rate (FAR) while reducing memory, computation, and latency costs. However, the limited resources of edge computing devices make KWS applications challenging. Lightweight deep learning models and structures have achieved good results in KWS while maintaining efficient performance. In this paper, we present a new Convolutional Recurrent Neural Network (CRNN) architecture named EdgeCRNN for edge computing devices. EdgeCRNN is based on depthwise separable convolutions and a residual structure, and it uses a feature enhancement method. On the Google Speech Commands Dataset, the experimental results show that EdgeCRNN can process 11.1 audio samples per second on a Raspberry Pi 3B+, which is 2.2 times the throughput of Tpool2. EdgeCRNN also reaches an accuracy of 98.05%, which is competitive with Tpool2.
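The abstract describes EdgeCRNN as building on depthwise separable convolutions combined with a residual structure. The following is a minimal, hypothetical PyTorch sketch of that kind of block, not the authors' actual architecture; the channel count, kernel size, and input shape are assumptions made only for illustration.

```python
# A minimal sketch (assumed layer sizes, not the paper's exact design) of a
# depthwise separable convolution block with a residual connection.
import torch
import torch.nn as nn

class DepthwiseSeparableResBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Depthwise convolution: one filter per input channel (groups=channels)
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=padding, groups=channels, bias=False)
        # Pointwise (1x1) convolution mixes information across channels
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: add the block input back to its output
        out = self.relu(self.bn(self.pointwise(self.depthwise(x))))
        return out + x

# Illustrative input: a batch of 8 utterances represented as 40-band
# spectrogram-like features over 101 time frames, with 16 feature channels.
features = torch.randn(8, 16, 40, 101)
block = DepthwiseSeparableResBlock(channels=16)
print(block(features).shape)  # torch.Size([8, 16, 40, 101])
```

Because the depthwise and pointwise convolutions preserve the channel count and spatial size here, the residual addition needs no projection; a stride or channel change would require a matching shortcut.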



References

  • Abdel-Hamid O, Mohamed A-r, Jiang H, Deng L, Penn G, Yu D (2014) Convolutional neural networks for speech recognition. IEEE/ACM Trans Audio Speech Lang Process 22(10):1533–1545


  • Anderson A, Su J, Dahyot R, Gregg D (2020) Performance-oriented neural architecture search. arXiv preprint arXiv:2001.02976

  • Arik SO, Kliegl M, Child R, Hestness J, Gibiansky A, Fougner C, Prenger R, Coates A (2017) Convolutional recurrent neural networks for small-footprint keyword spotting. arXiv preprint arXiv:1703.05390

  • Benelli G, Meoni G, Fanucci L (2018) A low power keyword spotting algorithm for memory constrained embedded systems. In: 2018 IFIP/IEEE international conference on very large scale integration (VLSI-SoC). IEEE, pp 267–272

  • Chen G, Parada C, Heigold G (2014) Small-footprint keyword spotting using deep neural networks. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4087–4091

  • Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078

  • Coucke A, Chlieh M, Gisselbrecht T, Leroy D, Poumeyrol M, Lavril T (2019) Efficient keyword spotting using dilated convolutions and gating. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6351–6355

  • Custers B, Sears AM, Dechesne F, Georgieva I, Tani T, van der Hof S (2019) EU personal data protection in policy and practice. Springer, Berlin


  • Dey R, Salemt FM (2017) Gate-variants of gated recurrent unit (GRU) neural networks. In: 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS). IEEE, pp 1597–1600

  • Dinelli G, Meoni G, Rapuano E, Benelli G, Fanucci L (2019) An FPGA-based hardware accelerator for CNNs using on-chip memories only: design and benchmarking with Intel Movidius Neural Compute Stick. Int J Reconfigurable Comput 2019:7218758


  • Du H, Li R, Kim D, Hirota K, Dai Y (2018) Low-latency convolutional recurrent neural network for keyword spotting. In: 2018 Joint 10th international conference on soft computing and intelligent systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS). IEEE, pp 802–807

  • Gaff BM, Sussman HE, Geetter J (2014) Privacy and big data. Computer 47(6):7–9


  • Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315–323

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780


  • Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V et al (2019) Searching for MobileNetV3. In: Proceedings of the IEEE international conference on computer vision, pp 1314–1324

  • Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  • Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167

  • Luo R, Sun T, Wang C, Du M, Tang Z, Zhou K, Gong X, Yang X (2019) Multi-layer attention mechanism for speech keyword recognition. arXiv preprint arXiv:1907.04536

  • Ma N, Zhang X, Zheng HT, Sun J (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European conference on computer vision (ECCV), pp 116–131

  • Mazzawi H, Gonzalvo X, Kracun A, Sridhar P, Subrahmanya N, Moreno IL, Park HJ, Violette P (2019) Improving keyword spotting and language identification via neural architecture search at scale. In: Proc Interspeech, vol 2019, pp 1278–1282

  • McFee B, Raffel C, Liang D, Ellis DP, McVicar M, Battenberg E, Nieto O (2015) librosa: audio and music signal analysis in Python. In: Proceedings of the 14th Python in science conference, vol 8

  • Mishchenko Y, Goren Y, Sun M, Beauchene C, Matsoukas S, Rybakov O, Vitaladevuni SNP (2019) Low-bit quantization and quantization-aware training for small-footprint keyword spotting. In: 2019 18th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 706–711

  • Nakkiran P, Alvarez R, Prabhavalkar R, Parada C (2015) Compressing deep neural networks using a rank-constrained topology. In: Proceedings of the annual conference of the International Speech Communication Association (Interspeech), pp 1473–1477

  • Sainath TN, Parada C (2015) Convolutional neural networks for small-footprint keyword spotting. In: Proceedings of the sixteenth annual conference of the International Speech Communication Association (Interspeech), pp 1478–1482

  • Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  • Sifre L, Mallat S (2014) Rigid-motion scattering for image classification. Ph.D. Thesis

  • Silaghi MC (2005) Spotting subsequences matching an HMM using the average observation probability criteria with application to keyword spotting. In: AAAI, pp 1118–1123

  • Silaghi MC, Bourlard H (1999) Iterative posterior-based keyword spotting without filler models. In: Proceedings of the IEEE automatic speech recognition and understanding workshop. Citeseer, pp 213–216

  • Sun M, Raju A, Tucker G, Panchapagesan S, Fu G, Mandal A, Matsoukas S, Strom N, Vitaladevuni S (2016) Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting. In: 2016 IEEE spoken language technology workshop (SLT). IEEE, pp 474–480

  • Sun M, Snyder D, Gao Y, Nagaraja VK, Rodehorst M, Panchapagesan S, Strom N, Matsoukas S, Vitaladevuni S (2017) Compressed time delay neural network for small-footprint keyword spotting. In: INTERSPEECH, pp 3607–3611

  • Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, Le QV (2018) Resource-efficient neural architect. arXiv preprint arXiv:1806.07912

  • Tang R, Lin J (2018) Deep residual learning for small-footprint keyword spotting. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5484–5488

  • Tang R, Wang W, Tu Z, Lin J (2018) An experimental analysis of the power consumption of convolutional neural networks for keyword spotting. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5479–5483

  • Tucker G, Wu M, Sun M, Panchapagesan S, Fu G, Vitaladevuni S (2016) Model compression applied to small-footprint keyword spotting. In: INTERSPEECH, pp 1878–1882

  • Véniat T, Schwander O, Denoyer L (2019) Stochastic adaptive neural architecture search for keyword spotting. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2842–2846

  • Warden P (2018) Speech commands: a dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209

  • Wilpon J, Miller L, Modi P (1991) Improvements and applications for key word recognition using hidden Markov modeling techniques. In: 1991 international conference on acoustics, speech, and signal processing. IEEE, pp 309–312

  • Zeng M, Xiao N (2019) Effective combination of DenseNet and BiLSTM for keyword spotting. IEEE Access 7:10767–10775


  • Zhang B, Li W, Li Q, Zhuang W, Chu X, Wang Y (2020) AutoKWS: keyword spotting with differentiable architecture search. arXiv preprint arXiv:2009.03658

  • Zhang X, Zhou X, Lin M, Sun J (2018) ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848–6856

  • Zhang Y, Suda N, Lai L, Chandra V (2017) Hello edge: keyword spotting on microcontrollers. arXiv preprint arXiv:1711.07128


Author information


Corresponding author

Correspondence to Yamin Wen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this paper was published at ML4CS 2020, Springer LNCS; this is the full-length version. This work is supported by the National Natural Science Foundation of China (No. 62072192), the National Cryptography Development Fund (No. MMJJ20180206), the Project of Science and Technology of Guangzhou (No. 201802010044), the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011797), the Opening Project of the Guangdong Province Key Laboratory of Information Security Technology (No. 2020B1212060078), the Project of Guangdong Province Innovative Team (No. 2020WCXTD011), and the Research Team of Big Data Audit from Guangdong University of Finance and Economics.


About this article


Cite this article

Wei, Y., Gong, Z., Yang, S. et al. EdgeCRNN: an edge-computing oriented model of acoustic feature enhancement for keyword spotting. J Ambient Intell Human Comput 13, 1525–1535 (2022). https://doi.org/10.1007/s12652-021-03022-1

