Abstract
Speech enhancement improves the clarity and intelligibility of speech signals that have been degraded by background noise. This research introduces a novel speech signal enhancement model based on deep learning. The proposed model has two phases: (i) training and (ii) testing. In the training phase, the noise spectrum and signal spectrum are estimated from the noisy input signal via Non-negative Matrix Factorization (NMF). Then, Empirical Mean Decomposition (EMD) features are extracted from the output of the Wiener filter. From the de-noised signal acquired via EMD, the Bark frequency is evaluated and the Fractional Delta AMS features are extracted. The key contribution of this study is the use of a Long Short-Term Memory (LSTM) model to properly estimate the tuning factor η of the Wiener filter for all input signals. The LSTM is trained with the extracted EMD features via a modified Wiener filter that decomposes the spectral signal, and the output of EMD is the denoised, enhanced speech signal. The proposed model is compared with existing models in terms of error measures.
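The following is a minimal sketch of the first two steps described above: NMF-based estimation of the speech and noise spectra, followed by a Wiener-type gain controlled by a tuning factor η. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the form of the modified Wiener filter, so the gain S / (S + ηN) used here is a common parametric form chosen for illustration. The pre-trained bases `W_speech` and `W_noise`, the function names, and the synthetic data are all hypothetical. In the proposed model η would be predicted by the LSTM; here it is simply passed in as a number.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def fit_activations(V, W, n_iter=200, seed=0):
    """Estimate non-negative activations H so that V ~= W @ H, with W fixed.

    Uses the multiplicative update rule for the Euclidean NMF cost.
    V is a non-negative spectrogram of shape (freq_bins, frames).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def estimate_spectra(V, W_speech, W_noise):
    """Split a noisy spectrogram into speech and noise estimates.

    Assumption: speech and noise bases were learned beforehand.
    """
    W = np.hstack([W_speech, W_noise])
    H = fit_activations(V, W)
    k = W_speech.shape[1]
    return W_speech @ H[:k], W_noise @ H[k:]


def wiener_gain(speech_est, noise_est, eta=1.0):
    """Parametric Wiener gain (assumed form); eta scales the noise estimate."""
    return speech_est / (speech_est + eta * noise_est + EPS)


# Synthetic demo: random non-negative bases stand in for trained ones.
rng = np.random.default_rng(1)
W_speech = rng.random((64, 4))
W_noise = rng.random((64, 2))
V = W_speech @ rng.random((4, 50)) + W_noise @ rng.random((2, 50))

speech_est, noise_est = estimate_spectra(V, W_speech, W_noise)
gain = wiener_gain(speech_est, noise_est, eta=2.0)
enhanced = gain * V  # gain applied per time-frequency bin
```

In this form, a larger η suppresses noise more aggressively and also attenuates more of the speech, which is why a per-signal estimate of η matters.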



Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Garg, A. Speech enhancement using long short term memory with trained speech features and adaptive wiener filter. Multimed Tools Appl 82, 3647–3675 (2023). https://doi.org/10.1007/s11042-022-13302-3
DOI: https://doi.org/10.1007/s11042-022-13302-3