Development of statistical estimators for speech enhancement using multi-objective grey wolf optimizer

  • Special Issue
Evolutionary Intelligence

Abstract

Statistical estimation using the SNR-uncertainty technique is an effective speech enhancement (SE) algorithm. In this method, the gain function plays a crucial role, and its behavior depends on the proper selection of the smoothing and threshold constants. In the literature, these constants have been optimized against a single objective, maximization of speech quality, for one specific noise condition. In practice, however, the noise level varies, so a single set of optimized parameters cannot always deliver consistent performance. This paper addresses the problem in three steps. The first step is a multi-objective optimization that finds the best values of the smoothing and threshold constants at different noise levels, with the objectives of maximizing speech quality and intelligibility and minimizing mean square error. The second step classifies the noisy speech into four SNR levels (0 dB, 5 dB, 10 dB, and 15 dB) using appropriate audio features. The results of these two steps are stored, and in the third step, when an unknown noisy speech signal is to be enhanced, the smoothing and threshold constants chosen for its classified SNR level are selected for the task. Finally, the performance of the proposed method is evaluated on two different speech datasets. Comparative performance and statistical analyses against six other standard SE algorithms demonstrate that the proposed approach outperforms them.
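The three-step selection scheme described above can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation. The names `Candidate`, `dominates`, `pareto_front`, `best_compromise`, `build_table`, and `select_constants` are invented here. The candidate scores are assumed to come from an optimizer such as the multi-objective grey wolf optimizer (not shown), and the SNR classifier of step two is assumed to exist elsewhere and return one of the four levels. The equal-weight compromise rule used to pick a single point from each Pareto front is also an assumption, since the abstract does not say how the final values are chosen.

```python
from dataclasses import dataclass


# Hypothetical record: one (smoothing, threshold) pair scored at one SNR level.
# Quality and intelligibility are maximized; mean square error is minimized.
@dataclass(frozen=True)
class Candidate:
    smoothing: float
    threshold: float
    quality: float          # e.g. a PESQ-like score, higher is better
    intelligibility: float  # e.g. a STOI-like score, higher is better
    mse: float              # mean square error, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on all objectives and strictly better on one."""
    no_worse = (a.quality >= b.quality
                and a.intelligibility >= b.intelligibility
                and a.mse <= b.mse)
    strictly_better = (a.quality > b.quality
                       or a.intelligibility > b.intelligibility
                       or a.mse < b.mse)
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def best_compromise(front: list[Candidate]) -> Candidate:
    """Pick one point by summing min-max normalized objectives (assumed equal weights)."""
    def norm(vals: list[float], higher_is_better: bool) -> list[float]:
        lo, hi = min(vals), max(vals)
        if hi == lo:  # all values equal: every candidate gets full credit
            return [1.0] * len(vals)
        return [(v - lo) / (hi - lo) if higher_is_better else (hi - v) / (hi - lo)
                for v in vals]

    q = norm([c.quality for c in front], True)
    s = norm([c.intelligibility for c in front], True)
    e = norm([c.mse for c in front], False)
    scores = [a + b + d for a, b, d in zip(q, s, e)]
    return front[scores.index(max(scores))]


def build_table(cands_by_level: dict[int, list[Candidate]]) -> dict[int, tuple[float, float]]:
    """Store one chosen (smoothing, threshold) pair per SNR level in dB."""
    table = {}
    for level_db, cands in cands_by_level.items():
        chosen = best_compromise(pareto_front(cands))
        table[level_db] = (chosen.smoothing, chosen.threshold)
    return table


def select_constants(table: dict[int, tuple[float, float]],
                     predicted_db: int) -> tuple[float, float]:
    """Step 3: look up the constants for the level predicted by the step-2 classifier.

    Raises KeyError if that level was never optimized.
    """
    return table[predicted_db]


if __name__ == "__main__":
    # Invented scores, for illustration only (not results from the paper).
    demo = {
        0: [Candidate(0.98, 0.1, 1.9, 0.70, 0.020),
            Candidate(0.95, 0.2, 2.0, 0.68, 0.025),
            Candidate(0.90, 0.3, 1.8, 0.65, 0.030)],  # dominated by both others
        5: [Candidate(0.97, 0.1, 2.3, 0.78, 0.015),
            Candidate(0.92, 0.2, 2.4, 0.80, 0.012)],
    }
    table = build_table(demo)
    print(select_constants(table, 0))  # prints (0.98, 0.1)
```

With these invented scores, the third 0 dB candidate is discarded as dominated, and the first is chosen because it trades a slightly lower quality score for better intelligibility and lower error. Filtering to the Pareto front before applying any compromise rule keeps the stored values restricted to genuine trade-offs among the three objectives.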

Figs. 1 to 12


Author information

Corresponding author

Correspondence to Tusar Kanti Dash.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dash, T.K., Solanki, S.S., Panda, G. et al. Development of statistical estimators for speech enhancement using multi-objective grey wolf optimizer. Evol. Intel. 14, 767–778 (2021). https://doi.org/10.1007/s12065-020-00446-0

  • DOI: https://doi.org/10.1007/s12065-020-00446-0
