A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation

Routray, Sidheswar; Mao, Qirong

doi:10.1007/s00521-022-06968-1

A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation

Original Article
Published: 19 February 2022

Volume 34, pages 9831–9845, (2022)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

420 Accesses
5 Citations
4 Altmetric
Explore all metrics

Abstract

Generally, the recorded speech signal is corrupted by both room reverberation and background noise leading to a reduced speech quality and intelligibility. In order to deal with the distortions caused by the joint effect of noise and reverberation, we propose a context-aware-based deep neural network (DNN) approach for simultaneous speech denoising and dereverberation. The proposed system consists of two stages such as denoising stage and the dereverberation stage. In the denoising stage, the additive noise is suppressed by estimating a phase-sensitive mask using DNN. Then, the noise-free reverberant speech is processed through the dereverberation stage. In the dereverberation stage, a reverberation-time-aware DNN-based model is used to perform dereverberation by adopting two reverberation time-dependent parameters such as frameshift size and acoustic context size to get the benefits of the characteristics of the superposition and frame-wise temporal correlations in different reverberation circumstances. Finally, we integrate both the modules and employ the integrated module for joint training using a multi objective loss function to further optimize both the denoising and dereverberation stages. Experimental results show that the proposed approach has shown significant performance improvements over prevalent benchmark dereverberation algorithms on IEEE corpus, REVERB challenge, and TIMIT corpus datasets under several reverberation circumstances.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Automatic speech recognition: a survey

Article 10 November 2020

A comprehensive survey on automatic speech recognition using neural networks

Article 15 August 2023

A Deep Learning Framework for Audio Deepfake Detection

Article 08 November 2021

References

Doire CSJ, Brookes M, Naylor PA, Hicks CM, Betts D, Dmour MA, Holdt-Jensen S (2017) Single-channel online enhancement of speech corrupted by reverberation and noise. IEEE/ACM Trans Audio Speech Lang Process 25(3):572–587
Article Google Scholar
Williamson DS, Wang D (2017) Time-frequency masking in the complex domain for speech dereverberation and denoising. IEEE/ACM Trans Audio Speech Lang Process 25(7):1492–1501
Article Google Scholar
Nakatani T, Ikeshita R., Kinoshita K, Sawada H, Araki S (2021) Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation. In: ICASSP 2021 - 2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6129–6133, https://doi.org/10.1109/ICASSP39728.2021.9414264
Nakatani T, Boeddeker C, Kinoshita K, Ikeshita R, Delcroix M, Haeb-Umbach R (2020) Jointly optimal denoising, dereverberation, and source separation. IEEE/ACM Trans Audio Speech Lang Process 28:2267–2282. https://doi.org/10.1109/TASLP.2020.3013118
Article Google Scholar
Baby D, Bourlard H (2021) Speech dereverberation using variational autoencoders. In: ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5784–5788, https://doi.org/10.1109/ICASSP39728.2021.9414736
Wu M, Wang D (2006) A two-stage algorithm for one-microphone reverberant speech enhancement. IEEE Trans Audio Speech Lang Process 14(3):774–784
Article Google Scholar
Parchami M, Amindavar H, Zhu W (2019) Speech reverberation suppression for time-varying environments using weighted prediction error method with time-varying autoregressive model. Speech Commun 109:1–14. https://doi.org/10.1016/j.specom.2019.03.002
Article Google Scholar
Delcroix M, Yoshioka T, Ogawa A, Kubo Y, Fujimoto M, Ito N, Kinoshita K, Espi M, Hori T, Nakatani T, Nakamura A (2014) Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the reverb challenge, In: Proceedings of the REVERB challenge workshop, vol 1, pp 1–8
Schwartz B, Gannot S, Habets EAP (2015) Online speech dereverberation using Kalman filter and EM algorithm. IEEE/ACM Trans Audio Speech Lang Process 23(2):394–406
Article Google Scholar
Cohen A, Stemmer G, Ingalsuo S, Markovich-Golan S (2017) Combined weighted prediction error and minimum variance distortionless response for dereverberation. In: IEEE international conference on acoustics, speech and signal processing, pp 446–450
Weninger F, Geiger J, Wollmer M, Schuller B, Rigoll G (2014) Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments. Comput Speech Lang 28(4):888–902
Article Google Scholar
Han K, Wang Y, Wang D, Woods WS, Merks I, Zhang T (2015) Learning spectral mapping for speech dereverberation and denoising. IEEE/ACM Trans Audio Speech Lang Process 23(6):982–992
Article Google Scholar
Xiao X, Zhao S, Nguyen DHH, Zhong X, Jones DL, Chng ES, Li H (2016) Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation. EURASIP J Adv Signal Process 2016(1):4
Article Google Scholar
Wu B, Li K, Yang M, Lee C-H (2017) A reverberation-time aware approach to speech dereverberation based on deep neural networks. IEEE/ACM Trans Audio Speech Lang Process 25(1):102–111
Article Google Scholar
Zhao Y, Wang Z-Q, Wang DL (2017) A two-stage algorithm for noisy and reverberant speech enhancement. In: Proceedings of ICASSP, pp 5580–5584
Raikar A, Basu S, Hegde RM (2018) Single channel joint speech dereverberation and denoising using deep priors. In: 2018 IEEE global conference on signal and information processing (GlobalSIP). IEEE, pp 216–220
Wang Z-Q, Wang D (2020) Deep learning based target cancellation for speech dereverberation. IEEE/ACM Trans Audio Speech Lang Process 28:941–950. https://doi.org/10.1109/TASLP.2020.2975902
Article Google Scholar
Hussain T, Siniscalchi SM, Wang H-LS, Tsao Y, Salerno VM, Liao W-H (2020) Ensemble hierarchical extreme learning machine for speech dereverberation. IEEE Trans Cognit Dev Syst 12(4):744–758. https://doi.org/10.1109/TCDS.2019.2953620
Article Google Scholar
Chen H, Zhang P (2021) A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation. Neural Netw 141:238–248. https://doi.org/10.1016/j.neunet.2021.04.023
Article Google Scholar
Albuquerque RQ, Mello CAB (2021) Automatic no-reference speech quality assessment with convolutional neural networks. Neural Comput Appl 33(16):9993–10003
Article Google Scholar
Routray S, Mao Q (2022) Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network. Comput Speech Lang 71:101270. https://doi.org/10.1016/j.csl.2021.101270
Article Google Scholar
Kanda N et al. (2019) Guided source separation meets a strong asr backend: Hitachi/Paderborn university joint investigation for dinner party ASR. In: Proceedings of the Interspeech, pp 1248–1252
Haeb-Umbach R et al (2019) Speech processing for digital home assistants. IEEE Signal Process Mag 36(6):111–124
Article Google Scholar
Togami M (2015) Multichannel online speech dereverberation under noisy environments. In: Proceedings of the 23rd European conference on signal processing, pp 1078–1082
Braun S, Habets EAP (2018) Linear prediction based online dereverberation and noise reduction using alternating Kalman filters. IEEE/ACM Trans Audio Speech Lang Process 26(6):1119–1129
Article Google Scholar
Dietzen T, Doclo S, Moonen M, van Waterschoot T (2018) Joint multi-microphone speech dereverberation and noise reduction using integrated sidelobe cancellation and linear prediction. In: Proceedings of the 6th international workshop on acoustic signal enhancement, pp 221–225
Mohammadiha N, Smaragdis P, Doclo S (2015) Joint acoustic and spectral modeling for speech dereverberation using non-negative representations. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4410–4414. IEEE
Wang Y, Narayanan A, Wang DL (2014) On training targets for supervised speech separation. IEEE/ACM Trans Audio Speech Lang Process 22(12):1849–1858
Article Google Scholar
Wang D, Chen J (2018) Supervised speech separation based on deep learning: an overview. IEEE/ACM Trans Audio Speech Lang Process 26(10):1702–1726
Article Google Scholar
Shao Y, Srinivasan S, Wang DL (2008) Robust speaker identification using auditory features and computational auditory scene analysis. In: Proceedings of ICASSP, pp 1589–1592
Hermansky H, Morgan N (1994) RASTA processing of speech. IEEE Trans Speech Audio Proc 2:578–589
Article Google Scholar
Rothauser EH et al (1969) IEEE recommended practice for speech quality measurements. IEEE Trans Audio Electroacoust 17:225–246
Article Google Scholar
Habets E (2010) Room impulse response generator (http://home.tiscali.nl/ehabets/rir generator.html)
Allen JB, Berkley DA (1979) Image method for efficiently simulating small room acoustics. J Acoust Soc Am 65:943–950
Article Google Scholar
Varga A, Steeneken HJ (1993) Assessment for automatic speech recognition: Ii. noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun 12(3):247–251
Article Google Scholar
Kinoshita K, Delcroix M, Gannot S, Habets E, Haeb-Umbach R, Kellermann W, Leutnant V, Maas R, Nakatani T, Raj B, Sehr A, Yoshioka T (2016) A summary of the reverb challenge: state-of-the-art and remaining challenges in reverberant speech processing research. EURASIP J Adv Signal Process 7:1–19
Google Scholar
Robinson T, Fransen J, Pye D, Foote J, Renals S (1995) WSJCAMO: a british english speech corpus for large vocabulary continuous speech recognition. In: International conference on acoustics, speech, and signal processing (ICASSP), pp 81–84
Lincoln M, McCowan I, Vepa J, Maganti HK (2005) The multichannel wall street journal audio visual corpus (MC-WSJ-AV): specification and initial experiments. In: IEEE workshop on automatic speech recognition and understanding, pp 357–362
Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS (1993) Darpa timit acoustic-phonetic continous speech corpus cd-rom. nist speech disc 1-1.1, NASA STI/Recon technical report n, vol 93
Hu G (2019) 100 nonspeech sounds 2006 [oneline], Technical Report. Available online: http://web.cse.ohiostate.edu/pnl/corpus/HuNonspeech/HuCorpus.html (accessed on 22 February 2019), Tech. Rep
Rix A W, Beerends JG, Hollier MP, Hekstra AP (2001) Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, vol 2, pp 749–752
Taal CH, Hendriks RC, Heusdens R, Jensen J (2011) An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans Audio Speech Lang Process 19(7):2125–2136
Article Google Scholar
Hu Y, Loizou PC (2008) Evaluation of objective quality measures for speech enhancement. IEEE Trans Audio Speech Lang Process 16(1):229–238
Article Google Scholar
Nakatani Tomohiro, Yoshioka Takuya, Kinoshita Keisuke, Miyoshi Masato, Juang Biing-Hwang (2010) Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Trans Audio Speech Lang Process 18(7):1717–1731
Article Google Scholar
Mack Wolfgang, Chakrabarty Soumitro, Stoter Fabian-Robert, Braun Sebastian, Edler Bernd, Habets Emanuel (2018) Single-channel dereverberation using direct mmse optimization and bidirectional lstm networks. Proc Interspeech 2018:1314–1318
Article Google Scholar
Rethage D, Pons J, Serra X (2018) A wavenet for speech denoising. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5069–5073
Han K, Wang Y, Wang DL, Woods WS, Merks I, Zhang T (2015) Learning spectral mapping for speech dereverberation and denoising. IEEE/ACM Trans Audio Speech Lang Process 23(6):982–992
Article Google Scholar
Fan C, Tao J, Liu B, Yi J, Wen Z (2020) Joint Training for simultaneous speech denoising and dereverberation with deep embedding representations, INTERSPEECH
Nakatani T et al (2020) DNN-supported mask-based convolutional beamforming for simultaneous denoising, dereverberation, and source separation. In: ICASSP 2020–2020 ieee international conference on acoustics, speech and signal processing (ICASSP), Barcelona, Spain, pp 6399–6403
Jeub M, Schafer M, Vary P (2009) A binaural room impulse response database for the evaluation of dereverberation algorithms. In: Proceedings of the international conference on digital signal processing, pp 1–5
Zhao Y, Wang D, Xu B, Zhang T (2020) Monaural speech dereverberation using temporal convolutional networks with self attention. IEEE/ACM Trans Audio Speech Lang Process 28:1598–1607
Article Google Scholar

Download references

Acknowledgements

This work was supported in part by the Key Projects of the National Natural Science Foundation of China under Grant U1836220, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant 2019K222, the Qing Lan Talent Program of Jiangsu Province, the Jiangsu Province Key Research and Development Plan under Grant BE2020036.

Author information

Authors and Affiliations

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 212013, People’s Republic of China
Sidheswar Routray & Qirong Mao
Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, Zhenjiang, 212013, People’s Republic of China
Qirong Mao

Authors

Sidheswar Routray
View author publications
You can also search for this author in PubMed Google Scholar
Qirong Mao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Qirong Mao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Routray, S., Mao, Q. A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation. Neural Comput & Applic 34, 9831–9845 (2022). https://doi.org/10.1007/s00521-022-06968-1

Download citation

Received: 10 April 2021
Accepted: 13 January 2022
Published: 19 February 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00521-022-06968-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation

Abstract

Access this article

Similar content being viewed by others

Automatic speech recognition: a survey

A comprehensive survey on automatic speech recognition using neural networks

A Deep Learning Framework for Audio Deepfake Detection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation

Abstract

Access this article

Similar content being viewed by others

Automatic speech recognition: a survey

A comprehensive survey on automatic speech recognition using neural networks

A Deep Learning Framework for Audio Deepfake Detection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation