Abstract
Over the last few decades, speech signal enhancement has been a widely studied research topic. Numerous algorithms have been proposed to improve the perceptibility and quality of speech signals. These algorithms are typically formulated to recover the clean signal from signals corrupted by noise. The short-time Fourier transform and the wavelet transform are widely used to process speech signals. This paper attempts to overcome the common drawbacks of speech enhancement algorithms. Because frequency-domain processing removes noise effectively, the proposed method enhances speech in the short-time Fourier domain. Additionally, this paper introduces a decomposition model, named diminished empirical mean curve decomposition, to adaptively tune the Wiener filtering process and achieve effective speech enhancement. The performance of the proposed method is compared with that of conventional methods, and the proposed method is observed to be superior.
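For orientation, the Wiener filtering step mentioned above can be illustrated with a minimal sketch of a generic per-bin Wiener gain applied to one short-time Fourier frame. This is not the paper's method: the adaptive tuning through diminished empirical mean curve decomposition is not shown, and the function names, the `gain_floor` parameter, and the simple a priori SNR estimate are all assumptions made for illustration.

```python
def wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Generic per-bin Wiener gain G = xi / (1 + xi).

    noisy_power: |Y(k)|^2 for each frequency bin of one frame.
    noise_power: estimated noise power for each bin (assumed known here).
    gain_floor:  hypothetical lower bound on the gain (illustrative value).
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_n = max(p_n, 1e-12)            # guard against division by zero
        # Simple a priori SNR estimate: posterior SNR minus one, clipped at 0.
        xi = max(p_y / p_n - 1.0, 0.0)
        gain = xi / (1.0 + xi)
        gains.append(max(gain, gain_floor))
    return gains


def enhance_frame(noisy_spectrum, noise_power, gain_floor=0.05):
    """Scale each complex STFT bin of one frame by its Wiener gain."""
    power = [abs(c) ** 2 for c in noisy_spectrum]
    gains = wiener_gain(power, noise_power, gain_floor)
    return [g * c for g, c in zip(gains, noisy_spectrum)]


if __name__ == "__main__":
    # One toy frame: a strong bin and two bins at or below the noise level.
    frame = [2 + 0j, 1 + 0j, 0.5 + 0j]
    noise = [1.0, 1.0, 1.0]
    print(enhance_frame(frame, noise))
```

In a complete system the frames would come from a short-time Fourier transform of the noisy signal and the enhanced frames would be inverted by overlap-add; both steps are omitted here to keep the sketch self-contained.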
About this article
Cite this article
Garg, A., Sahu, O.P. Enhancement of speech signal using diminished empirical mean curve decomposition-based adaptive Wiener filtering. Pattern Anal Applic 23, 179–198 (2020). https://doi.org/10.1007/s10044-018-00768-x
DOI: https://doi.org/10.1007/s10044-018-00768-x