ABSTRACT
Keyword spotting (KWS) is a crucial front-end module in a speech interaction system. The always-on KWS module continuously monitors the input speech and activates the energy-hungry, complex back-end system only when a keyword is detected. The KWS module therefore determines the standby performance of the entire system, yet conventional KWS modules face a power-consumption bottleneck in the data conversion stage next to the microphone sensor. In this paper, we propose an energy-efficient near-sensor processing architecture for always-on KWS that can enhance the continuous perception capability of the whole speech interaction system. By performing keyword detection in the analog domain directly after the microphone sensor, this architecture avoids the energy-consuming data converter and runs faster than conventional implementations. In addition, we propose a lightweight gated recurrent unit (GRU) that preserves recognition performance with negligible accuracy loss. We also implement and fabricate the proposed KWS system in a 0.18 μm CMOS process. In system-level evaluation, the hardware-software co-designed architecture achieves 65.6% energy savings and a 71× speedup compared with the state of the art.
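The abstract does not specify how the GRU is made lightweight or how it is mapped onto the analog circuitry. As a purely illustrative aid, the following is a minimal digital sketch, assuming symmetric uniform weight quantization, of what one step of a low-precision GRU computes. The function names (`quantize`, `gru_step`, `run_gru`, `make_params`), the layer sizes, the 4-bit width, and the gate formulation (reset gate applied before the recurrent product, biases omitted) are all hypothetical choices for this example, not the authors' design.

```python
import math
import random

def quantize(w, bits):
    """Hypothetical symmetric uniform quantizer: clip w to [-1, 1] and snap it
    to one of 2**bits - 1 evenly spaced levels (e.g. 15 levels for 4 bits)."""
    levels = 2 ** (bits - 1) - 1
    w = max(-1.0, min(1.0, w))
    return round(w * levels) / levels

def quantize_matrix(m, bits):
    return [[quantize(x, bits) for x in row] for row in m]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One GRU time step (biases omitted): update gate z, reset gate r,
    candidate state n, then a convex blend of n and the previous state h."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def make_params(n_in, n_hid, seed=0):
    """Random full-precision weights in [-0.5, 0.5] (placeholder, not trained)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for g in ("z", "r", "n"):
        p["W" + g] = mat(n_hid, n_in)
        p["U" + g] = mat(n_hid, n_hid)
    return p

def run_gru(seq, p, n_hid):
    """Run the GRU over a sequence of feature frames from a zero initial state."""
    h = [0.0] * n_hid
    for x in seq:
        h = gru_step(x, h, p)
    return h

if __name__ == "__main__":
    n_in, n_hid = 10, 16  # hypothetical: 10 features per frame, 16 hidden units
    p_fp = make_params(n_in, n_hid)
    p_q = {k: quantize_matrix(v, bits=4) for k, v in p_fp.items()}
    rng = random.Random(1)
    seq = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(50)]
    h_fp = run_gru(seq, p_fp, n_hid)
    h_q = run_gru(seq, p_q, n_hid)
    err = max(abs(a - b) for a, b in zip(h_fp, h_q))
    print(f"max |h_fp - h_q| after 50 frames: {err:.4f}")
```

Comparing the final hidden states of the full-precision and quantized models gives a rough feel for how much low-bit weights perturb the recurrent state; in practice the accuracy impact would be measured on a labeled keyword dataset after quantization-aware training, which this sketch does not attempt.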
Index Terms
- NS-KWS: joint optimization of near-sensor processing architecture and low-precision GRU for always-on keyword spotting