
Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets

  • Conference paper
  • In: Text, Speech, and Dialogue (TSD 2022)

Abstract

We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K-parameter feedforward, streaming, state-free keyword spotting model. In the first stage, we adapt a recently proposed quantization technique that applies a non-linear transformation with \(\tanh (.)\) to dense layer weights. In the second stage, we apply linear quantization to the rest of the network, including the other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large-scale experiments, training on 26,000 h of de-identified production far-field and near-field audio data and evaluating on 4,000 h of data. We organize our results in two embedded chipset settings: a) with the commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, and 8-bit) and 8-bit quantization of the rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), we present accuracy results and project the reduction in memory utilization. In both configurations, our results show that the proposed algorithm achieves: a) parity with a full floating-point model’s operating point on a detection error tradeoff (DET) curve, in terms of false detection rate (FDR) at a given false rejection rate (FRR); b) significant reductions in compute and memory, yielding up to a 3 times improvement in CPU consumption and more than a 4 times improvement in memory consumption.

L. Zeng and S. H. K. Parthasarathi—Equal contribution.
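
To make the two-stage recipe concrete, the following is a minimal sketch of the training-time quantizers, assuming PyTorch (the abstract does not name a framework). The quantize_ste helper, the SquashedQuantLinear layer, and the per-tensor activation scaling are illustrative assumptions, not the authors' implementation.

    import torch

    def quantize_ste(x: torch.Tensor, num_bits: int) -> torch.Tensor:
        # Uniformly quantize a tensor in [-1, 1] to a signed num_bits grid.
        # The straight-through estimator keeps gradients flowing in training.
        levels = 2 ** (num_bits - 1) - 1          # e.g. 7 for 4-bit, 15 for 5-bit
        q = torch.round(x * levels) / levels
        return x + (q - x).detach()               # forward: q; backward: identity

    class SquashedQuantLinear(torch.nn.Linear):
        # Stage 1: dense-layer weights are squashed with tanh(.) into (-1, 1),
        # then uniformly quantized to a sub 8-bit grid during training.
        def __init__(self, in_features, out_features, weight_bits=5):
            super().__init__(in_features, out_features)
            self.weight_bits = weight_bits

        def forward(self, x):
            w_q = quantize_ste(torch.tanh(self.weight), self.weight_bits)
            # Stage 2: inputs and activations (like the bias, gain, and
            # batchnorm parameters elsewhere in the network) get linear 8-bit
            # quantization; a per-tensor max scale is one simple choice.
            scale = x.abs().max().clamp(min=1e-8)
            x_q = quantize_ste(x / scale, 8) * scale
            return torch.nn.functional.linear(x_q, w_q, self.bias)

A 4-bit configuration would be SquashedQuantLinear(in_features, out_features, weight_bits=4); because \(\tanh (.)\) bounds the weights to (-1, 1), each trained weight maps onto a small signed-integer grid that fits comfortably in an 8-bit container at inference.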


Notes

  1. An instruction set that can efficiently carry out matrix-vector multiplications (see the sketch after these notes).

  2. Models have to run with low latency, i.e., they cannot use large buffers.

  3. Since the models run continuously, they must not get into a “bad” state.

  4. We use CPU cycles as a proxy for power consumption.

  5. https://www.counterpointresearch.com/global-smartphone-ap-market-share/.

  6. https://www.arm.com/blogs/blueprint/android-64bit-future-mobile.

  7. https://datasheets.maximintegrated.com/en/ds/MAX78000.pdf.

  8. https://www.syntiant.com/post/syntiant-introduces-second-generation-ndp120-deep-learning-processor-for-audio-and-sensor-apps.
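
As a concrete reading of note 1 and the 8-bit-container setting from the abstract, here is a small NumPy sketch (an assumption; the paper targets ARM NEON, not NumPy) of packing signed 4-bit weights two per byte for storage and unpacking them into int8 lanes for an integer matrix-vector multiply with 32-bit accumulators. pack_int4 and unpack_int4 are hypothetical helpers, not the paper's code.

    import numpy as np

    def pack_int4(w: np.ndarray) -> np.ndarray:
        # Pack signed 4-bit values (in [-8, 7]) two per uint8 container byte.
        nibbles = (w.astype(np.int16) & 0xF).astype(np.uint8)
        return (nibbles[0::2] | (nibbles[1::2] << 4)).astype(np.uint8)

    def unpack_int4(packed: np.ndarray) -> np.ndarray:
        # Recover signed int8 lanes from the packed two's-complement nibbles.
        lo = (packed & 0xF).astype(np.int8)
        hi = (packed >> 4).astype(np.int8)
        out = np.empty(2 * packed.size, dtype=np.int8)
        out[0::2], out[1::2] = lo, hi
        return np.where(out >= 8, out - 16, out).astype(np.int8)  # sign-extend

    # The core operation from note 1: an integer matrix-vector multiply.
    rng = np.random.default_rng(0)
    w = rng.integers(-8, 8, size=(16, 32), dtype=np.int8)   # 4-bit weights
    x = rng.integers(-128, 128, size=32, dtype=np.int8)     # 8-bit input
    w_restored = unpack_int4(pack_int4(w.ravel())).reshape(16, 32)
    assert np.array_equal(w_restored, w)
    y = w_restored.astype(np.int32) @ x.astype(np.int32)    # 32-bit accumulators

Packing is what turns a sub 8-bit weight format into real memory savings on commodity chips: 4-bit storage halves the weight footprint relative to 8-bit, while the multiplies still run on standard 8-bit integer lanes.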


Author information

Corresponding author

Correspondence to Lu Zeng.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Zeng, L. et al. (2022). Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_30

  • DOI: https://doi.org/10.1007/978-3-031-16270-1_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science, Computer Science (R0)
