Speech enhancement using long short term memory with trained speech features and adaptive wiener filter


Abstract

Speech enhancement is the process of improving the clarity and intelligibility of speech signals that have been degraded by background noise. With the assistance of deep learning, a novel speech signal enhancement model is introduced in this research. The proposed model is divided into two phases: (i) training and (ii) testing. In the training phase, the noise spectrum and signal spectrum are estimated from the noisy input signal via Non-negative Matrix Factorization (NMF). Empirical Mean Decomposition (EMD) features are then extracted from the Wiener-filtered signal; from the de-noised signal obtained through EMD, the Bark frequency is evaluated and the Fractional Delta AMS features are extracted. The key contribution of this study is the use of a Long Short Term Memory (LSTM) model to estimate the tuning factor η of the Wiener filter for every input signal. The LSTM is trained on the extracted EMD features, the modified Wiener filter decomposes the spectral signal, and the output of the EMD stage is the de-noised, enhanced speech signal. A comparative evaluation between the proposed and existing models is carried out in terms of error measures.
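Since the full text is behind the access wall, the following Python sketch only illustrates the kind of pipeline the abstract describes: NMF-based speech/noise spectrum estimates feeding a Wiener filter whose tuning factor η is predicted per frame by an LSTM. All names (stft_mag, EtaLSTM, enhance), the speech/noise component split, and the hyper-parameters are illustrative assumptions, not the authors' implementation; the EMD, Bark-frequency, and Fractional Delta AMS steps are omitted.

    # Minimal, hypothetical sketch of an NMF + LSTM-tuned Wiener filter pipeline.
    # Not the authors' code; names and hyper-parameters are assumptions.
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.decomposition import NMF

    def stft_mag(x, n_fft=512, hop=128):
        """Magnitude spectrogram via a simple framed FFT with a Hann window."""
        win = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
        return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T   # (freq, time)

    class EtaLSTM(nn.Module):
        """LSTM mapping per-frame spectral features to a Wiener tuning factor eta."""
        def __init__(self, n_feats, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)
        def forward(self, feats):                 # feats: (batch, time, n_feats)
            h, _ = self.lstm(feats)
            return torch.sigmoid(self.out(h))     # eta in (0, 1), one per frame

    def enhance(noisy, model, n_components=8):
        """Wiener filtering with NMF spectrum estimates and LSTM-predicted eta."""
        Y = stft_mag(noisy)                                  # noisy magnitude
        nmf = NMF(n_components=n_components, max_iter=200)
        W = nmf.fit_transform(Y)                             # basis spectra
        H = nmf.components_                                  # activations
        # Hypothetical split: first half of the components model speech, rest noise.
        S_hat = W[:, :n_components // 2] @ H[:n_components // 2]
        N_hat = W[:, n_components // 2:] @ H[n_components // 2:]
        feats = torch.tensor(Y.T, dtype=torch.float32).unsqueeze(0)
        eta = model(feats).squeeze().detach().numpy()        # per-frame eta
        gain = S_hat**2 / (S_hat**2 + eta[None, :] * N_hat**2 + 1e-8)
        return gain * Y                                      # enhanced magnitude

Under these assumptions one would call, for example, model = EtaLSTM(n_feats=257) and enhanced_mag = enhance(noisy_waveform, model); in practice η would be learned from paired clean/noisy data, and the enhanced magnitude would be recombined with the noisy phase and inverted back to the time domain.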



Author information

Corresponding author

Correspondence to Anil Garg.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Garg, A. Speech enhancement using long short term memory with trained speech features and adaptive wiener filter. Multimed Tools Appl 82, 3647–3675 (2023). https://doi.org/10.1007/s11042-022-13302-3

