
Accelerating RNN-Based Speech Enhancement on a Multi-core MCU with Mixed FP16-INT8 Post-training Quantization

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

This paper presents an optimized methodology to design and deploy Speech Enhancement (SE) algorithms based on Recurrent Neural Networks (RNNs) on a state-of-the-art MicroController Unit (MCU) with 1+8 general-purpose RISC-V cores. To achieve low-latency execution, we propose an optimized software pipeline that interleaves the parallel computation of LSTM or GRU recurrent blocks, exploiting vectorized 8-bit integer (INT8) and 16-bit floating-point (FP16) compute units, with manually managed memory transfers of model parameters. To ensure minimal accuracy degradation with respect to the full-precision models, we propose a novel FP16-INT8 Mixed-Precision Post-Training Quantization (PTQ) scheme that compresses the recurrent layers to 8 bits while keeping the remaining layers in FP16. Experiments are conducted on multiple LSTM- and GRU-based SE models trained on the Valentini dataset, with up to 1.24M parameters. Thanks to the proposed approaches, we speed up the computation by up to 4× with respect to the lossless FP16 baselines. Unlike uniform 8-bit quantization, which degrades the PESQ score by 0.3 on average, the Mixed-Precision PTQ scheme causes a degradation of only 0.06 while achieving a 1.4–1.7× memory saving. Thanks to this compression, we cut the power cost of the external memory by fitting the large models into the limited on-chip non-volatile memory, and we obtain an MCU power saving of up to 2.5× by reducing the supply voltage from 0.8 V to 0.65 V while still meeting the real-time constraints. Our design is more than 10× more energy efficient than state-of-the-art SE solutions deployed on single-core MCUs, which rely on smaller models and quantization-aware training.
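To illustrate the mixed-precision idea summarized above, the sketch below quantizes the weights of the recurrent (LSTM/GRU) layers to INT8 with symmetric per-tensor scales while leaving all other layers in FP16. This is a minimal NumPy sketch under assumed conventions: the function names, the layer descriptors, and the weight-only, calibration-free flow are illustrative and are not taken from the paper's actual tool chain.

import numpy as np

def quantize_int8_symmetric(w):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * w_int8."""
    scale = max(np.abs(w).max() / 127.0, 1e-8)  # guard against all-zero tensors
    w_int8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return w_int8, np.float16(scale)

def mixed_precision_ptq(layers):
    """Quantize LSTM/GRU weights to INT8; keep all other layers in FP16.

    `layers` is a hypothetical list of (name, kind, weights) tuples, where
    `kind` marks recurrent blocks ('lstm'/'gru') vs. other layers ('fc', ...).
    """
    out = {}
    for name, kind, w in layers:
        if kind in ("lstm", "gru"):
            out[name] = quantize_int8_symmetric(w)       # 8-bit weights + FP16 scale
        else:
            out[name] = (w.astype(np.float16), None)     # FP16 pass-through
    return out

# Toy model: one GRU input-to-hidden matrix and one fully connected layer.
rng = np.random.default_rng(0)
model = [
    ("gru0/w_ih", "gru", rng.standard_normal((384, 257)).astype(np.float32)),
    ("fc_out/w",  "fc",  rng.standard_normal((257, 128)).astype(np.float32)),
]
for name, (w_q, scale) in mixed_precision_ptq(model).items():
    print(f"{name}: dtype={w_q.dtype}, scale={scale}")

In the deployment described in the abstract, the INT8 recurrent blocks additionally run on the vectorized integer units of the multi-core cluster while the remaining FP16 layers use the floating-point units; the weight-only sketch above only illustrates the storage-side compression that yields the reported 1.4–1.7× memory saving.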


Notes

  1. https://greenwaves-technologies.com/tools-and-software/.



Author information

Corresponding author

Correspondence to Manuele Rusci.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rusci, M., Fariselli, M., Croome, M., Paci, F., Flamand, E. (2023). Accelerating RNN-Based Speech Enhancement on a Multi-core MCU with Mixed FP16-INT8 Post-training Quantization. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_41


  • DOI: https://doi.org/10.1007/978-3-031-23618-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23617-4

  • Online ISBN: 978-3-031-23618-1

  • eBook Packages: Computer Science, Computer Science (R0)
