Speech enhancement using long short term memory with trained speech features and adaptive wiener filter

Garg, Anil

doi:10.1007/s11042-022-13302-3


Published: 14 July 2022

Volume 82, pages 3647–3675 (2023)

Multimedia Tools and Applications

Anil Garg¹


Abstract

Speech enhancement is the process of improving the clarity and intelligibility of speech signals that have been degraded by background noise. This research introduces a novel deep-learning-based speech enhancement model that operates in two phases: (i) training and (ii) testing. In the training phase, the noise spectrum and the signal spectrum are estimated from the noisy input signal via Non-negative Matrix Factorization (NMF). Empirical Mean Decomposition (EMD) features are then extracted from the Wiener filter output. From the de-noised signal obtained through EMD, the bark frequency is evaluated and Fractional Delta AMS features are extracted. The key contribution of this study is the use of a Long Short Term Memory (LSTM) model to estimate the tuning factor η of the Wiener filter accurately for each input signal. The LSTM is trained on the extracted features, and the modified Wiener filter it tunes is used to decompose the spectral signal through EMD, whose output is the denoised, enhanced speech signal. The proposed model is compared with existing models in terms of error measures.
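Several building blocks named in the abstract are standard and can be illustrated independently. The following is a minimal NumPy sketch, not the author's implementation, under these assumptions: (1) the speech and noise spectra are separated with semi-supervised NMF, where noise bases are learned from a noise-only segment and held fixed while speech bases are fitted to the noisy spectrogram (the abstract does not specify how NMF separates the two); (2) the tuning factor η enters a parametric Wiener gain of the assumed form G = Ŝ / (Ŝ + η·N̂); (3) the bark frequency uses Traunmüller's Hz-to-Bark approximation. The EMD stage, the Fractional Delta AMS features, and the LSTM that predicts η are not shown. The helper names `nmf`, `estimate_spectra`, `wiener_gain`, and `hz_to_bark` are hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Euclidean-cost NMF with Lee-Seung multiplicative updates: V ~ W @ H.

    If W_fixed is given, its columns are appended after the `rank` free
    bases and kept constant; only the free bases and all activations are
    updated (semi-supervised NMF).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    k_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W_free = rng.random((F, rank)) + EPS
    H = rng.random((rank + k_fixed, T)) + EPS
    for _ in range(n_iter):
        W = W_free if W_fixed is None else np.hstack([W_free, W_fixed])
        H *= (W.T @ V) / (W.T @ W @ H + EPS)              # update all activations
        WH = W @ H
        H_free = H[:rank]
        W_free *= (V @ H_free.T) / (WH @ H_free.T + EPS)  # update free bases only
    return W_free, H


def estimate_spectra(V_noisy, V_noise_only, n_speech=20, n_noise=10, n_iter=200):
    """Hypothetical helper: return (speech, noise) power-spectrum estimates."""
    W_n, _ = nmf(V_noise_only, n_noise, n_iter=n_iter)            # learn noise bases
    W_s, H = nmf(V_noisy, n_speech, n_iter=n_iter, W_fixed=W_n)    # fit speech bases
    S_hat = W_s @ H[:n_speech]
    N_hat = W_n @ H[n_speech:]
    return S_hat, N_hat


def wiener_gain(S_hat, N_hat, eta=1.0):
    """Assumed parametric Wiener gain; larger eta suppresses noise more strongly."""
    return S_hat / (S_hat + eta * N_hat + EPS)


def hz_to_bark(f):
    """Traunmueller (1990) approximation of the Bark scale."""
    f = np.asarray(f, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V_noise_only = rng.random((64, 40))     # stand-in noise-only power spectrogram
    V_noisy = rng.random((64, 120)) + 0.5   # stand-in noisy power spectrogram
    S_hat, N_hat = estimate_spectra(V_noisy, V_noise_only)
    G = wiener_gain(S_hat, N_hat, eta=1.5)
    # G scales the complex STFT, so the enhanced power spectrum is G**2 * |X|**2.
    enhanced_power = (G ** 2) * V_noisy
    print(enhanced_power.shape, hz_to_bark([100, 1000, 4000]).round(2))
```

In the model described above, η would come from the trained LSTM for each input signal rather than being fixed by hand; here it is a plain parameter so that its effect on the gain can be inspected directly.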

Author information

Authors and Affiliations

ECE Department, Maharishi Markandeshwar Engineering College, Maharishi Markandeshwar Deemed To Be University, Mullana, Ambala, Haryana, 134007, India
Anil Garg

Corresponding author

Correspondence to Anil Garg.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Garg, A. Speech enhancement using long short term memory with trained speech features and adaptive wiener filter. Multimed Tools Appl 82, 3647–3675 (2023). https://doi.org/10.1007/s11042-022-13302-3

Received: 23 June 2021
Revised: 28 January 2022
Accepted: 30 May 2022
Published: 14 July 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13302-3
