Abstract
We propose a novel 2-stage sub 8-bit quantization-aware training algorithm for all components of a 250K-parameter feedforward, streaming, state-free keyword spotting model. In the first stage, we adapt a recently proposed quantization technique that applies a non-linear transformation with \(\tanh(\cdot)\) to the dense layer weights. In the second stage, we use linear quantization methods on the rest of the network, including the other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large-scale experiments, training on 26,000 h of de-identified production, far-field and near-field audio data (and evaluating on 4,000 h of data). We organize our results in two embedded chipset settings: a) with the commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, and 8-bit) and 8-bit quantization of the rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), we present accuracy results and project the reduction in memory utilization. In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating-point model's operating point on a detection error tradeoff (DET) curve in terms of false detection rate (FDR) at a given false rejection rate (FRR); b) significant reductions in compute and memory, yielding up to a 3 times improvement in CPU consumption and more than a 4 times improvement in memory consumption.
L. Zeng and S. H. K. Parthasarathi—Equal contribution.
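To make the two-stage scheme in the abstract concrete, the following is a minimal, forward-only NumPy sketch of the two quantizer families it describes: a tanh-squashed uniform quantizer for dense-layer weights (stage 1) and a plain linear quantizer for the remaining tensors (stage 2). The function names, bit widths, symmetric range handling, and the omission of the straight-through estimator used during training are illustrative assumptions on our part, not the paper's exact formulation.

```python
import numpy as np

def quantize_uniform(x, num_bits, x_min, x_max):
    """Quantize x onto a uniform grid of 2**num_bits levels over [x_min, x_max]."""
    steps = 2 ** num_bits - 1
    scale = (x_max - x_min) / steps
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)
    return q * scale + x_min

def stage1_weight_quantize(latent_w, num_bits=4):
    """Stage-1 sketch: the effective dense-layer weights are tanh(latent_w),
    which lie in (-1, 1) and are quantized on a uniform grid over that range.
    (In QAT the rounding would be bypassed in the backward pass with a
    straight-through estimator; this forward-only version is illustrative.)"""
    effective_w = np.tanh(latent_w)
    return quantize_uniform(effective_w, num_bits, -1.0, 1.0)

def stage2_linear_quantize(x, num_bits=8):
    """Stage-2 sketch: plain symmetric linear quantization for the remaining
    tensors (bias, gain, batchnorm parameters, inputs, activations)."""
    max_abs = float(np.max(np.abs(x))) or 1.0
    return quantize_uniform(x, num_bits, -max_abs, max_abs)

# Toy usage on a random dense layer: 4-bit weights, 8-bit activations.
rng = np.random.default_rng(0)
latent_w = rng.normal(size=(64, 128))
activations = rng.normal(size=(128,))
w_q = stage1_weight_quantize(latent_w, num_bits=4)
a_q = stage2_linear_quantize(activations, num_bits=8)
print(w_q.shape, np.unique(w_q).size)   # at most 2**4 = 16 distinct weight values
print(a_q.shape, np.unique(a_q).size)   # at most 2**8 = 256 distinct activation values
```

One property this sketch does capture: because \(\tanh(\cdot)\) bounds the effective weights to (-1, 1), a fixed uniform grid can be applied to the weights without per-tensor range estimation, whereas the stage-2 linear quantizer must derive its range from the tensor being quantized.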
Notes
1. An instruction set that can efficiently carry out matrix-vector multiplications.
2. Models have to run with low latency, i.e., they cannot use large buffers.
3. Since the models are running continuously, they cannot get into a "bad" state.
4. We use CPU cycles as a proxy for power consumption.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Zeng, L. et al. (2022). Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_30
DOI: https://doi.org/10.1007/978-3-031-16270-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1