Abstract
This paper presents a new approach to robust speech enhancement based on improved ensemble empirical mode decomposition (EMD) with optimized log-spectral amplitude noise estimation. The noisy signal is adaptively decomposed into a sum of oscillating components, the intrinsic mode functions (IMFs). Each component is then enhanced separately, and the resulting less-corrupted IMFs are passed to the Hurst exponent method, which constructs an estimate of the clean signal. The framework relies on improved minima-controlled recursive averaging for adaptive noise estimation and on the optimally modified log-spectral amplitude estimator to enhance the noisy EMD components. Objective evaluations of quality and intelligibility show that the proposed method performs significantly better than the baseline techniques, including the most recently developed EMD-based speech enhancement methods.
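The final stage described above, in which enhanced components are screened by a Hurst exponent and recombined, can be illustrated with a minimal sketch. It is not the authors' implementation: the aggregated-variance estimator is a simple stand-in for whatever Hurst estimator the paper uses, and the function names `hurst_aggvar` and `select_and_sum`, the threshold value, the `keep_above` flag, and the toy signals are all assumptions made for illustration. The abstract does not state which side of a threshold is retained, so the rule is left as a parameter.

```python
import numpy as np

def hurst_aggvar(x, block_sizes=(4, 8, 16, 32, 64)):
    """Aggregated-variance Hurst estimate (illustrative stand-in estimator).

    The variance of block means of size m scales roughly as m**(2H - 2),
    so H is recovered from the slope of a log-log line fit.
    """
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        means = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(means.var() + 1e-300))  # guard against log(0)
    slope = np.polyfit(log_m, log_var, 1)[0]  # slope is approximately 2H - 2
    return 1.0 + slope / 2.0

def select_and_sum(components, h_threshold, keep_above=True):
    """Sum the components on one side of a Hurst threshold.

    Both the threshold and the direction are hypothetical parameters.
    """
    components = np.asarray(components, dtype=float)
    h = np.array([hurst_aggvar(c) for c in components])
    keep = h > h_threshold if keep_above else h <= h_threshold
    return components[keep].sum(axis=0), h, keep

# Toy stand-ins for enhanced components (not real IMFs).
rng = np.random.default_rng(0)
n = 8192
noise_like = rng.standard_normal(n)                     # white noise, H near 0.5
slow_wave = np.sin(2 * np.pi * np.arange(n) / 2000.0)   # smooth, H near 1
estimate, h, keep = select_and_sum([noise_like, slow_wave], h_threshold=0.7)
print(np.round(h, 2), keep)
```

In this toy setting the white-noise component falls below the threshold and the slowly varying component above it, so only the latter contributes to the reconstructed signal. With real IMFs, the estimator, the threshold, and the retention rule would have to follow the paper.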
Data availability statement
The datasets and code generated and/or analyzed during the current study are available from Asma Bouchair (first author) on reasonable request.
Funding
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under reference number RGPIN-2018-05221 and by the Ministry of Higher Education and Scientific Research of Algeria.
Author information
Contributions
A new empirical mode decomposition method for speech enhancement is proposed. The method handles nonlinear distortions caused by noise and reduces noise separately in each frequency range, which makes it useful for a wide range of noise types. Objective assessments of quality and intelligibility show that the proposed method performs much better than state-of-the-art techniques, including the most recently developed EMD- and deep learning-based speech enhancement methods. This work has not been published previously, and it is not under consideration for publication elsewhere.
Ethics declarations
Conflicts of interest
Not applicable.
Availability of data and material
The data, namely the TIMIT corpus and the NOISEX-92 dataset, are publicly available.
Code availability
The code will be made available on GitHub and upon request.
About this article
Cite this article
Bouchair, A., Selouani, S.A., Amrouche, A. et al. Improved Empirical Mode Decomposition Using Optimal Recursive Averaging Noise Estimation for Speech Enhancement. Circuits Syst Signal Process 41, 196–223 (2022). https://doi.org/10.1007/s00034-021-01767-w
DOI: https://doi.org/10.1007/s00034-021-01767-w