An energy-efficient voice activity detector using reconfigurable Gaussian base normalization deep neural network

Samanta, Anu; Hatai, Indranil; Mal, Ashis Kumar

doi:10.1007/s11042-023-14699-1


Published: 23 February 2023

Volume 82, pages 27861–27882, (2023)

Multimedia Tools and Applications

Anu Samanta¹,
Indranil Hatai² &
Ashis Kumar Mal³


Abstract

This paper proposes an energy-efficient voice activity detector (VAD) built on deep neural networks and approximate computing. The proposed technique is split into two parts: feature extraction, and voice/noise classification using a deep neural network with Gaussian basis normalization (GNDNN). The input data is first pre-processed: the high-frequency components of the digitized speech signal are pre-emphasized, which makes the signal less susceptible to finite-precision effects later in the signal processing chain. The feature extraction module uses Mel-frequency cepstral coefficients (MFCC) and time-frequency non-negative matrix factorization (TFNMF) to extract feature values from the input speech signal. The TFNMF and MFCC outputs are then classified by the GNDNN speech prediction phase, which decides whether the signal is voice or noise. The proposed approach can be dynamically reconfigured to meet different computing accuracy demands. It achieves an accuracy of 98.75%, compared with 97.25% and 95.25% for the CNN and DNN, respectively; EERA has the lowest accuracy, 88.75%. The experimental results show that the proposed strategy outperforms previous methods.
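The pre-emphasis step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional first-order filter y[n] = x[n] - alpha * x[n-1] with alpha = 0.97; the abstract does not state the filter form or coefficient the authors used, and the function name `pre_emphasize` is hypothetical.

```python
def pre_emphasize(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a common textbook value, assumed here; the abstract
    does not give the coefficient used in the paper.
    """
    if not samples:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


# A constant (low-frequency) signal is attenuated after the first sample,
# while an alternating (high-frequency) signal is boosted.
flat = pre_emphasize([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasize([1.0, -1.0, 1.0, -1.0])
```

In this sketch the constant input shrinks to about 0.03 per sample while the alternating input grows to about 1.97 in magnitude, which is the high-frequency boost that pre-emphasis is meant to provide before feature extraction.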


Data availability

Datasets for this research are included in [26].

References

Albinsaid H, Singh K, Biswas S, Li C-P, Alouini M-S (2020) Block deep neural network-based signal detector for generalized spatial modulation. IEEE Commun Lett 24(12):2775–2779
Anderson R, Sandsten M (2020) Time-frequency feature extraction for classification of episodic memory. EURASIP J Adv Sig Proc 2020(1):1–18
Braun S, Tashev I (2021) "On training targets for noise-robust voice activity detection", In 2021 29th European Signal Processing Conference (EUSIPCO), pp. 421–425. IEEE
Chen Y, Yang T-J, Emer J, Sze V (2018) Understanding the limitations of existing energy-efficient design approaches for deep neural networks. Energy 2(L1):L3
Dellaferrera G, Martinelli F, Cernak M (2020) "A bin encoding training of a spiking neural network based voice activity detection". In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3207–3211. IEEE
Fan Z-C, Bai Z, Zhang X-L, Rahardja S, Chen J (2019) "AUC optimization for deep learning based voice activity detection." In ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6760–6764. IEEE
Furui S (1981) Comparison of speaker recognition methods using statistical features and dynamic features. IEEE Trans Acoust Speech Sig Proc 29:342–350
Jacob AJ, Jacob AA, Mathew A (2021) End-to-End Speech Emotion Recognition Using Deep Learning. Int J Res Engin, Sci Manage 4(3):134–135
Kim CH, Lee JM, Kang SH, Kim SY, Im DS, Yoo HJ (2020) "1b-16b variable bit precision dnn processor for emotional hri system in mobile devices." J Integ Circ Syst 6, no. 3
Korkmaz Y, Boyaci A (2022) milVAD: A bag-level MNIST modelling of voice activity detection using deep multiple instance learning. Biomed Sign Proc Contr 74:103520
Koteswararao YV, Rao CR (2021) Multichannel speech separation using hybrid GOMF and enthalpy-based deep neural networks. Multimedia Systems 27(2):271–286
Lee S (2020) Estimating the rank of a nonnegative matrix factorization model for automatic music transcription based on Stein’s unbiased risk estimator. Appl Sci 10(8):2911
Lee TY, Levorato M, Dutt N (2019) "DNN-Assisted Sensor for Energy-Efficient ECG Monitoring." In 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE
Liu B, Qin H, Yu G, Ge W, Xia M, Shi L (2018) EERA-ASR: an energy-efficient reconfigurable architecture for automatic speech recognition with hybrid DNN and approximate computing. IEEE Access 6:52227–52237
Liu B, Qin H, Yu G, Ge W, Xia M, Shi L (2018) EERA-ASR: an energy-efficient reconfigurable architecture for automatic speech recognition with hybrid DNN and approximate computing. IEEE Access 6:52227–52237
Liu B, Wang Z, Guo S, Yu H, Yu G, Yang J, Shi L (2019) An energy-efficient voice activity detector using deep neural networks and approximate computing. Microelectron J 87:12–21
Liu W, Liao Q, Qiao F, Xia W, Wang C, Lombardi F (2019) Approximate designs for fast Fourier transform (FFT) with application to speech recognition. IEEE Transac Circuits Syst I: Reg Papers 66(12):4727–4739
Luckenbaugh J, Abplanalp S, Gonzalez R, Fulford D, Gard D, Busso C (2021) Voice activity detection with teacher-student domain emulation. Proc Interspeech 2021:4374–4378
Martinelli F, Dellaferrera G, Mainar P, Cernak M (2020) "Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection". In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8544–8548. IEEE
Mason JS, Zhang X (1991) Velocity and acceleration features in speaker recognition, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). pp. 3673–3676
Mihalache S, Burileanu D (2022) Using voice activity detection and deep neural networks with hybrid speech feature extraction for deceptive speech detection. Sensors 22(3):1228
Oh S, Cho M, Shi Z, Lim J, Kim Y, Jeong S, Chen Y et al (2019) An acoustic signal processing chip with 142-nW voice activity detection using mixer-based sequential frequency scanning and neural network classification. IEEE J Solid State Circuits 54(11):3005–3016
Oh YR, Park K, Park JG (2020) Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)-Based Voice-Activity Detector. Appl Sci 10(12):4091
Ovaska M, Kultanen J, Autto T, Uusnäkki J, Kariluoto A, Himmanen J, Virtaneva M, Kaitila P, Abrahamsson P (2021) "Deep Neural Network Voice Activity Detector for Downsampled Audio Data: An Experiment Report". arXiv preprint arXiv:2108.05553
Price M, Glass J, Chandrakasan AP (2017) A low-power speech recognizer and voice activity detector using deep neural networks. IEEE J Solid State Circuits 53(1):66–75
Rabiner L (2010) Fundamentals of Speech Recognition Course. Accessed: Dec. 2010. [Online]. Available: https://www.ece.ucsb.edu/Faculty/Rabiner/ece259/speech%20recognition%20course.html
Rabiner L, Juang B-H, Yegnanarayana B (2008) Fundamentals of speech recognition. Pearson Education, London
Rios-Navarro A, Gutierrez-Galan D, Dominguez-Morales JP, Piñero-Fuentes E, Duran-Lopez L, Tapiador-Morales R, Dominguez-Morales MJ (2021) Efficient Memory Organization for DNN Hardware Accelerator Implementation on PSoC. Electronics 10(1):94
Savran A, Tavarone R, Higy B, Badino L, Bartolozzi C (2018) "Energy and computation efficient audio-visual voice activity detection driven by event-cameras." In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 333–340. IEEE
Smit P, Virpioja S, Kurimo M (2021) Advances in subword-based HMM-DNN speech recognition across languages. Comput Speech Lang 66:101158
Sterneck R, Moitra A, Panda P (2021) "Noise Sensitivity-Based Energy Efficient and Robust Adversary Detection in Neural Networks." arXiv preprint arXiv:2101.01543
Teng P, Jia Y (2013) Voice activity detection via noise reducing using non-negative sparse coding. IEEE Signal Proc Lett 20(5):475–478
Wilkinson N, Niesler T (2021) "A hybrid CNN-BiLSTM voice activity detector." In ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 6803–6807 IEEE
Yin S, Ouyang P, Yang J, Lu T, Li X, Liu L, Wei S (2018) "An ultra-high energy-efficient reconfigurable processor for deep neural networks with binary/ternary weights in 28nm CMOS." In 2018 IEEE Symposium on VLSI Circuits, pp. 37–38. IEEE
Yin S, Tang S, Lin X, Ouyang P, Fengbin T, Liu L, Wei S (2018) A high throughput acceleration for hybrid neural networks with efficient resource management on FPGA. IEEE Transac Comput-Aided Des Integra Circuits Syst 38(4):678–691
Yoshimura T, Hayashi T, Takeda K, Watanabe S (2020) "End-to-end automatic speech recognition integrated with CTC-based voice activity detection". In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6999–7003. IEEE
Yu H, Zhu W-P, Champagne B (2020) Speech enhancement using a DNN-augmented colored-noise Kalman filter. Speech Comm 125:142–151
Zhang J, Rangineni K, Ghodsi Z, Garg S (2018) "Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient deep learning accelerators." In Proceedings of the 55th Annual Design Automation Conference, pp. 1–6
Zheng Z, Wang J, Cheng N, Luo J, Xiao J (2020) "Mlnet: An adaptive multiple receptive-field attention neural network for voice activity detection". arXiv preprint arXiv:2008.05650


Funding

This research received no funding.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Brainware University, Barasat, Kolkata, West Bengal, India
Anu Samanta
Mathworks India Private Limited, Bangalore, India
Indranil Hatai
Department of Electronics and Communication Engineering, NIT Durgapur, Durgapur, West Bengal, India
Ashis Kumar Mal


Corresponding author

Correspondence to Anu Samanta.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Samanta, A., Hatai, I. & Mal, A.K. An energy-efficient voice activity detector using reconfigurable Gaussian base normalization deep neural network. Multimed Tools Appl 82, 27861–27882 (2023). https://doi.org/10.1007/s11042-023-14699-1


Received: 25 August 2021
Revised: 15 July 2022
Accepted: 04 February 2023
Published: 23 February 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11042-023-14699-1
