Development of statistical estimators for speech enhancement using multi-objective grey wolf optimizer

Dash, Tusar Kanti; Solanki, Sandeep Singh; Panda, Ganapati; Satapathy, Suresh Chandra

doi:10.1007/s12065-020-00446-0

Special Issue
Published: 14 July 2020

Evolutionary Intelligence, volume 14, pages 767–778 (2021)

Tusar Kanti Dash¹˒² (ORCID: orcid.org/0000-0002-7964-7485), Sandeep Singh Solanki¹, Ganapati Panda² & Suresh Chandra Satapathy³

Abstract

Statistical estimation using the SNR uncertainty technique is an effective speech enhancement (SE) approach. In this method, the gain function plays a crucial role, and it depends on the proper selection of the smoothing and threshold constants. In the literature, these constants have been optimized against a single objective, the maximization of speech quality, for a specific noise condition. In practice, however, the noise magnitude varies, and one set of optimized parameters cannot always provide consistent performance. This paper addresses the problem in three steps. The first step is a multi-objective optimization that finds the best values of the smoothing and threshold constants at different noise levels, with the objectives of maximizing speech quality and intelligibility and minimizing mean square error. The second step classifies the noisy speech into four SNR levels (0 dB, 5 dB, 10 dB, and 15 dB) using appropriate audio features. The results of steps one and two are stored, and in the third step, when an unknown noisy speech signal is to be enhanced, the stored smoothing and threshold constants for its classified SNR level are selected. The proposed method is evaluated on two speech datasets. A comparative performance and statistical analysis against six other standard SE algorithms demonstrates that the proposed approach performs better than the others.
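
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not implement the grey wolf optimizer itself. It only shows the Pareto-dominance test that a multi-objective optimizer relies on for the three stated objectives, followed by a per-SNR lookup of stored constants. All names (`Candidate`, `pareto_front`, `nearest_snr_level`, `select_constants`) and all numeric values are hypothetical, and the nearest-level rule is only a placeholder for the feature-based classifier mentioned in the abstract.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    """One evaluated parameter set (all values here are hypothetical)."""
    smoothing: float        # smoothing constant of the gain function
    threshold: float        # threshold constant of the gain function
    quality: float          # speech quality score, higher is better
    intelligibility: float  # intelligibility score, higher is better
    mse: float              # mean square error, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = (a.quality >= b.quality
                and a.intelligibility >= b.intelligibility
                and a.mse <= b.mse)
    strictly_better = (a.quality > b.quality
                       or a.intelligibility > b.intelligibility
                       or a.mse < b.mse)
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Step 1 (illustration): keep only the non-dominated candidates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def nearest_snr_level(estimated_snr_db: float,
                      levels: Sequence[int] = (0, 5, 10, 15)) -> int:
    """Step 2 (placeholder): map an SNR estimate to the closest of the four levels.

    Assumption: the paper classifies with audio features; this nearest-level
    rule only stands in for that classifier.
    """
    return min(levels, key=lambda level: abs(level - estimated_snr_db))


def select_constants(table: Dict[int, Tuple[float, float]],
                     snr_level: int) -> Tuple[float, float]:
    """Step 3: look up the stored (smoothing, threshold) pair for the level."""
    return table[snr_level]


# Hypothetical candidates evaluated at one noise level.
candidates = [
    Candidate(0.9, 0.1, quality=2.5, intelligibility=0.80, mse=0.02),
    Candidate(0.8, 0.2, quality=2.4, intelligibility=0.78, mse=0.03),
    Candidate(0.7, 0.3, quality=2.6, intelligibility=0.75, mse=0.04),
]
front = pareto_front(candidates)  # the second candidate is dominated by the first

# Hypothetical stored constants, one pair per SNR level.
stored = {0: (0.95, 0.30), 5: (0.92, 0.25), 10: (0.90, 0.20), 15: (0.88, 0.15)}
level = nearest_snr_level(7.4)
smoothing, threshold = select_constants(stored, level)
```

A Pareto front generally contains several trade-off solutions, so one of them still has to be chosen per noise level before it can be stored. The abstract does not say how that choice is made, so the sketch leaves it out.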

Author information

Authors and Affiliations

Electronics and Communication Engineering, Birla Institute of Technology, Mesra, India
Tusar Kanti Dash & Sandeep Singh Solanki
Electronics & Telecommunication Engineering, C V Raman Global University, Bhubaneswar, India
Tusar Kanti Dash & Ganapati Panda
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Suresh Chandra Satapathy

Corresponding author

Correspondence to Tusar Kanti Dash.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dash, T.K., Solanki, S.S., Panda, G. et al. Development of statistical estimators for speech enhancement using multi-objective grey wolf optimizer. Evol. Intel. 14, 767–778 (2021). https://doi.org/10.1007/s12065-020-00446-0

Received: 19 March 2020
Revised: 07 June 2020
Accepted: 27 June 2020
Published: 14 July 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s12065-020-00446-0
