Abstract
Speech signals captured by distant microphones are often corrupted by acoustic interference such as noise and reverberation, which degrades the quality of the captured speech. The acquired signals therefore need to be processed to separate the underlying sources and remove the reverberation. We propose a novel speech separation and dereverberation method that combines Locally Weighted Projection Regression (LWPR)-based Principal Component Analysis (PCA) with Deep Neural Network (DNN)-based Weighted Prediction Error (WPE). The method preprocesses the mixed reverberant signal before applying Blind Source Separation (BSS) and Blind Dereverberation (BD): a fast Fourier transform (FFT) converts the time-domain signal to the frequency domain, and whitening generates the transformation matrices. LWPR-PCA then performs the BSS, and DNN-WPE performs the BD. We compare the proposed method experimentally with the existing RPCA-SNMF, CBF, BA-CNMF, AFMNMF, and ISC-LPKF approaches. The results indicate that the proposed method effectively separates the original signals from the mixed reverberant signals.
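The preprocessing stage named in the abstract (FFT followed by whitening) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame length, the Hann window, the per-frequency-bin whitening, and the function names `whiten` and `preprocess` are all assumptions made for illustration. The abstract states only that an FFT and whitening are used to obtain a frequency-domain signal and the transformation matrices.

```python
import numpy as np

def whiten(X, eps=1e-12):
    """PCA-whiten a (channels, observations) array.

    Returns the whitened data and the whitening (transformation) matrix W,
    chosen so that the covariance of W @ X is close to the identity.
    """
    Xc = X - X.mean(axis=1, keepdims=True)           # remove per-channel mean
    cov = (Xc @ Xc.conj().T) / Xc.shape[1]           # Hermitian covariance
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigendecomposition
    scale = 1.0 / np.sqrt(np.maximum(eigvals, 0.0) + eps)
    W = np.diag(scale) @ eigvecs.conj().T            # W = D^{-1/2} E^H
    return W @ Xc, W

def preprocess(mixture, frame_len=256):
    """Hypothetical preprocessing: frame-wise FFT, then whitening per bin.

    mixture: real array of shape (channels, samples).
    Returns whitened spectra of shape (channels, frames, bins) and one
    whitening matrix per frequency bin, shape (bins, channels, channels).
    """
    n_ch, n_samp = mixture.shape
    n_frames = n_samp // frame_len
    frames = mixture[:, :n_frames * frame_len].reshape(n_ch, n_frames, frame_len)
    spec = np.fft.rfft(frames * np.hanning(frame_len), axis=-1)
    n_bins = spec.shape[-1]
    whitened = np.empty_like(spec)
    W_all = np.empty((n_bins, n_ch, n_ch), dtype=complex)
    for k in range(n_bins):
        whitened[:, :, k], W_all[k] = whiten(spec[:, :, k])
    return whitened, W_all

# Usage with a synthetic two-channel instantaneous mixture (illustrative only).
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 16384))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
Z, W = preprocess(mixing @ sources)
```

The whitened spectra `Z` and the per-bin matrices `W` would be the inputs to a later separation stage. The sketch uses a synthetic instantaneous mixture, not the reverberant mixtures studied in the paper, and it does not model the LWPR-PCA or DNN-WPE steps.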














Availability of data and material
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Sheeja, J.J.C., Sankaragomathi, B. Speech dereverberation and source separation using DNN-WPE and LWPR-PCA. Neural Comput & Applic 35, 7339–7356 (2023). https://doi.org/10.1007/s00521-022-07884-0
DOI: https://doi.org/10.1007/s00521-022-07884-0