Speech dereverberation and source separation using DNN-WPE and LWPR-PCA

  • Original Article

Neural Computing and Applications

Abstract

Speech signals captured by distant microphones are often corrupted by acoustic interference such as noise and reverberation, which degrades speech quality. The acquired signals therefore need to be processed to blindly separate the sources and remove the reverberation. We propose a speech separation and dereverberation method that combines Locally Weighted Projection Regression (LWPR)-based Principal Component Analysis (PCA) with Deep Neural Network (DNN)-based Weighted Prediction Error (WPE). The method preprocesses the mixed reverberant signal before Blind Source Separation (BSS) and Blind Dereverberation (BD) are applied: the fast Fourier transform (FFT) converts the time-domain signal into the frequency domain, and whitening generates the transformation matrices. LWPR-PCA performs BSS, and DNN-WPE performs BD. The proposed method is experimentally compared with the existing RPCA-SNMF, CBF, BA-CNMF, AFMNMF, and ISC-LPKF approaches, and the results show that it effectively separates the original signals from the mixed reverberant signals.
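The abstract lists the processing stages without their equations. The NumPy sketch below illustrates three of them under stated assumptions: an FFT-based multichannel STFT, per-frequency PCA whitening that yields one transformation matrix per bin, and classic iterative WPE dereverberation in the style of the variance-normalized delayed linear prediction of Nakatani et al. (2010). It is not the authors' implementation. The function names (`stft`, `whiten`, `delayed_stack`, `wpe_bin`, `wpe`), the frame length, hop, number of taps, prediction delay, and iteration count are illustrative assumptions; the LWPR-PCA separation step is not reproduced; and chaining WPE before whitening in the usage example is chosen only for illustration. In a DNN-WPE variant (cf. Kinoshita et al. 2017), the per-frame power `lam` would be predicted by a neural network instead of being re-estimated from the current output.

```python
import numpy as np

# Hypothetical parameter defaults throughout; none are taken from the paper.


def stft(x, n_fft=512, hop=128):
    """FFT-based multichannel STFT: x (channels, samples) -> (channels, freqs, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    frames = np.stack([x[:, i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=-1)
    return np.fft.rfft(frames, axis=1)  # one-sided spectrum of every frame


def whiten(X, eps=1e-10):
    """PCA whitening per frequency bin; returns whitened STFT Z and matrices V[f]."""
    C, F, T = X.shape
    Z = np.empty_like(X)
    V = np.empty((F, C, C), dtype=X.dtype)
    for f in range(F):
        Xf = X[:, f, :]
        R = Xf @ Xf.conj().T / T             # spatial covariance (C x C)
        d, E = np.linalg.eigh(R)             # principal components of R
        V[f] = np.diag(1.0 / np.sqrt(np.maximum(d, 0.0) + eps)) @ E.conj().T
        Z[:, f, :] = V[f] @ Xf               # decorrelated, unit-variance channels
    return Z, V


def delayed_stack(Y, taps, delay):
    """Stack y[t-delay], ..., y[t-delay-taps+1] for every frame t (zeros before start)."""
    D, T = Y.shape
    Yt = np.zeros((taps * D, T), dtype=Y.dtype)
    for k in range(taps):
        shift = delay + k
        if shift < T:
            Yt[k * D:(k + 1) * D, shift:] = Y[:, :T - shift]
    return Yt


def wpe_bin(Y, taps=10, delay=3, iterations=3, eps=1e-10):
    """Iterative WPE for one frequency bin. Y: (channels, frames)."""
    Yt = delayed_stack(Y, taps, delay)
    X = Y.copy()
    for _ in range(iterations):
        # Per-frame power of the current estimate; a DNN-WPE variant would predict this.
        lam = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)
        Yw = Yt / lam                        # variance-normalized delayed frames
        R = Yw @ Yt.conj().T                 # weighted correlation matrix
        P = Yw @ Y.conj().T                  # weighted cross-correlation
        R += 1e-8 * np.trace(R).real / R.shape[0] * np.eye(R.shape[0])  # diagonal loading
        G = np.linalg.solve(R, P)            # prediction filter (taps*D x D)
        X = Y - G.conj().T @ Yt              # subtract predicted late reverberation
    return X


def wpe(X, taps=10, delay=3, iterations=3):
    """Apply wpe_bin independently to every frequency bin of X (channels, freqs, frames)."""
    out = np.empty_like(X)
    for f in range(X.shape[1]):
        out[:, f, :] = wpe_bin(X[:, f, :], taps, delay, iterations)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sources = rng.standard_normal((2, 16000))
    # Toy 2x2 instantaneous mixture plus a decaying noise tail as crude "reverberation".
    rir = np.exp(-np.arange(2000) / 400.0) * rng.standard_normal(2000) * 0.1
    rir[0] = 1.0
    mix = np.array([[1.0, 0.6], [0.4, 1.0]]) @ sources
    mix = np.stack([np.convolve(ch, rir)[:16000] for ch in mix])
    X = stft(mix)
    Z, V = whiten(wpe(X))
    print(X.shape, Z.shape)  # (2, 257, 122) (2, 257, 122)
```

Each frequency bin is processed independently because both whitening and WPE here rely on per-bin multichannel statistics; the small diagonal-loading term only guards the linear solve against ill-conditioning.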

Availability of data and material

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References

  1. Herzog A, Habets EA (2020) Direction and reverberation preserving noise reduction of ambisonics signals. IEEE/ACM Trans Audio Speech Lang Process 28:2461–2475

  2. Xiao Y, Lu W, Yan Q, Zhang H (2021) Blind separation of coherent multipath signals with impulsive interference and Gaussian noise in time-frequency domain. Signal Process 178:107750

  3. Gultepe E, Makrehchi M (2018) Improving clustering performance using independent component analysis and unsupervised feature learning. HCIS 8(1):1–19

  4. Sunohara M, Haruta C, Ono N (2017) Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components. In: 2017 IEEE International conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 216–220

  5. Grezes F, Ni Z, Trinh VA, Mandel M (2020) Enhancement of spatial clustering-based time-frequency masks using LSTM neural networks. arXiv preprint arXiv:2012.01576

  6. Parchami M, Zhu WP, Champagne B (2017) Model-based estimation of late reverberant spectral variance using modified weighted prediction error method. Speech Commun 92:100–113

  7. Boeddeker C, Nakatani T, Kinoshita K, Haeb-Umbach R (2020) Jointly optimal dereverberation and beamforming. In: ICASSP 2020–2020 IEEE International conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 216–220

  8. Greenewald K, Hero AO (2015) Robust Kronecker product PCA for spatio-temporal covariance estimation. IEEE Trans Signal Process 63(23):6368–6378

  9. Khosravy M, Gupta N, Dey N, Crespo RG (2022) Underwater IoT network by blind MIMO OFDM transceiver based on probabilistic Stone’s blind source separation. ACM Trans Sensor Netw (TOSN) 18(3):1–27

  10. Li C, Zhu L, Luo Z, Zhang Z, Yang Y (2022) Effective methods and performance analysis on data transmission security with blind source separation in space-based AIS. China Commun 19(4):154–165

  11. Ma B, Zhang T (2019) An analysis approach for multivariate vibration signals integrate HIWO/BBO optimized blind source separation with NA-MEMD. IEEE Access 7:87233–87245

  12. Jia Y, Xu P (2020) Convolutive blind source separation for communication signals based on the sliding Z-transform. IEEE Access 8:41213–41219

  13. Zhang Z, Gao H, Ma J, Wang S, Sun H (2021) Blind source separation based on quantum slime mould algorithm in impulse noise. Math Problems Eng 2021:1–17

  14. Wu B, Li K, Huang Z, Siniscalchi SM, Yang M, Lee CH (2017) A unified deep modeling approach to simultaneous speech dereverberation and recognition for the REVERB challenge. In: 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA), IEEE, pp 36–40

  15. Lee I, Kim T, Lee TW (2007) Fast fixed-point independent vector analysis algorithms for convolutive blind source separation. Signal Process 87(8):1859–1871

  16. Do HD, Tran ST, Chau DT (2020) Speech source separation using variational autoencoder and bandpass filter. IEEE Access 8:156219–156231

  17. Nakatani T, Boeddeker C, Kinoshita K, Ikeshita R, Delcroix M, Haeb-Umbach R (2020) Jointly optimal denoising, dereverberation, and source separation. IEEE/ACM Trans Audio Speech Lang Process 28:2267–2282

  18. Ullah R, Islam MS, Hossain MI, Wahab FE, Ye Z (2020) Single channel speech dereverberation and separation using RPCA and SNMF. Appl Acoust 167:107406

  19. Song S, Cheng L, Luan S, Yao D, Li J, Yan Y (2021) An integrated multi-channel approach for joint noise reduction and dereverberation. Appl Acoust 171:107526

  20. He R, Long Y, Li Y, Liang J (2020) Mask-based blind source separation and MVDR beamforming in ASR. Int J Speech Technol 23(1):133–140

  21. Tan K, Xu Y, Zhang SX, Yu M, Yu D (2020) Audio-visual speech separation and dereverberation with a two-stage multimodal network. IEEE J Select Topics Signal Process 14(3):542–553

  22. Khan JB, Jan T, Khalil RA, Altalbe A (2020) Hybrid source prior based independent vector analysis for blind separation of speech signals. IEEE Access 8:132871–132881

  23. Nugraha AA, Sekiguchi K, Fontaine M, Bando Y, Yoshii K (2020) Flow-based independent vector analysis for blind source separation. IEEE Signal Process Lett 27:2173–2177

  24. Togami M (2020) Joint training of deep neural networks for multi-channel dereverberation and speech source separation. In: ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 3032–3036

  25. Nakatani T, Takahashi R, Ochiai T, Kinoshita K, Ikeshita R, Delcroix M, Araki S (2020) DNN-supported mask-based convolutional beamforming for simultaneous denoising, dereverberation, and source separation. In: ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6399–6403, IEEE

  26. Sekiguchi K, Bando Y, Nugraha AA, Fontaine M, Yoshii K (2021) Autoregressive fast multichannel nonnegative matrix factorization for joint blind source separation and dereverberation. In: ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 511–515, IEEE

  27. Sheeja JJ, Sankaragomathi B (2022) CNN-QTLBO: an optimal blind source separation and blind dereverberation scheme using lightweight CNN-QTLBO and PCDP-LDA for speech mixtures. Signal Image Video Process 16:1323–1331

  28. Bulut AE, Koishida K (2020) Low-latency single channel speech dereverberation using U-net convolutional neural networks. In: Interspeech, pp 2442–2446

  29. Tsai TH, Liu PY, Chiou YH (2022) Hardware design for Blind source separation using a fast time-frequency mask technique. Integration 82:67–77

  30. Kumar M, Jayanthi VE (2020) Blind source separation using kurtosis, negentropy and maximum likelihood functions. Int J Speech Technol 23(1):13–21

  31. Huang L, Zhao L, Zhou Y, Zhu F, Liu L, Shao L (2020) An investigation into the stochasticity of batch whitening. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6439–6448

  32. Klanke S, Vijayakumar S, Schaal S (2008) A library for locally weighted projection regression. J Mach Learn Res

  33. Vijayakumar S, D'Souza A, Schaal S (2005) LWPR: A scalable method for incremental online learning in high dimensions

  34. Ahsan M, Mashuri M, Kuswanto H, Prastyo DD (2018) Intrusion detection system using multivariate control chart Hotelling’s T2 based on PCA. Int J Adv Sci Eng Inf Technol 8(5):1905–1911

  35. Amor LB, Lahyani I, Jmaiel M (2017) PCA-based multivariate anomaly detection in mobile healthcare applications. In: 2017 IEEE/ACM 21st International symposium on distributed simulation and real time applications (DS-RT), IEEE, pp 1–8

  36. Scheibler R (2020) Generalized minimal distortion principle for blind source separation. arXiv preprint arXiv:2009.05288

  37. Lv Z, Zhang BB, Wu XP, Zhang C, Zhou BY (2017) A permutation algorithm based on dynamic time warping in speech frequency-domain blind source separation. Speech Commun 92:132–141

  38. Nakatani T, Yoshioka T, Kinoshita K, Miyoshi M, Juang BH (2010) Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Trans Audio Speech Lang Process 18(7):1717–1731

  39. Kinoshita K, Delcroix M, Kwon H, Mori T, Nakatani T (2017) Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. In: Interspeech, pp 384–388

  40. TIMIT corpus. https://www.kaggle.com/nltkdata/timitcorpus

  41. Mowlaee P, Saeidi R, Christensen MG, Martin R (2012) Subjective and objective quality assessment of single-channel speech separation algorithms. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 69–72

  42. Looney D, Gaubitch ND (2020) Joint estimation of acoustic parameters from single-microphone speech observations. In: ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 431–435

  43. Dietzen T, Doclo S, Moonen M, van Waterschoot T (2020) Integrated sidelobe cancellation and linear prediction Kalman filter for joint multi-microphone speech dereverberation, interfering speech cancellation, and noise reduction. IEEE/ACM Trans Audio Speech Lang Process 28:740–754

  44. Ibarrola FJ, Di Persia LE, Spies RD (2018) A Bayesian approach to convolutive nonnegative matrix factorization for blind speech dereverberation. Signal Process 151:89–98

  45. Series B (2014) Method for the subjective assessment of intermediate quality level of audio systems. Int Telecommun Union Radiocommun Assembly

  46. Min X, Zhai G, Zhou J, Farias MC, Bovik AC (2020) Study of subjective and objective quality assessment of audio-visual signals. IEEE Trans Image Process 29:6054–6068

  47. Huber R, Kollmeier B (2006) PEMO-Q—A new method for objective audio quality assessment using a model of auditory perception. IEEE Trans Audio Speech Lang Process 14(6):1902–1911

  48. Su J, Jin Z, Finkelstein A (2020) HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks. arXiv preprint arXiv:2006.05694

  49. Emiya V, Vincent E, Harlander N, Hohmann V (2011) Subjective and objective quality assessment of audio source separation. IEEE Trans Audio Speech Lang Process 19(7):2046–2057

  50. Prodeus A, Kotvytskyi I (2017) On reliability of log-spectral distortion measure in speech quality estimation. In: 2017 IEEE 4th International conference actual problems of unmanned aerial vehicles developments (APUAVD), pp 121–124, IEEE

  51. Ernst O, Chazan SE, Gannot S, Goldberger J (2018) Speech dereverberation using fully convolutional networks. In: 2018 26th European Signal Processing Conference (EUSIPCO), pp 390–394, IEEE

  52. Nathwani K, Hegde RM (2015) Joint source separation and dereverberation using constrained spectral divergence optimization. Signal Process 106:266–281

  53. Fu Y, Wu J, Hu Y, Xing M, Xie L (2021) DESNet: A multi-channel network for simultaneous speech dereverberation, enhancement and separation. In: 2021 IEEE spoken language technology workshop (SLT), IEEE, pp 857–864

  54. Sivasankaran S, Vincent E, Illina I (2017) A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions. Comput Speech Lang 46:444–460
Author information

Corresponding author

Correspondence to Jasmine J. C. Sheeja.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sheeja, J.J.C., Sankaragomathi, B. Speech dereverberation and source separation using DNN-WPE and LWPR-PCA. Neural Comput & Applic 35, 7339–7356 (2023). https://doi.org/10.1007/s00521-022-07884-0

  • DOI: https://doi.org/10.1007/s00521-022-07884-0

Keywords