Phase and reverberation aware DNN for distant-talking speech enhancement

Oo, Zeyan; Wang, Longbiao; Phapatanaburi, Khomdet; Iwahashi, Masahiro; Nakagawa, Seiichi; Dang, Jianwu

doi:10.1007/s11042-018-5686-1

Published: 20 February 2018

Volume 77, pages 18865–18880, (2018)

Multimedia Tools and Applications

Zeyan Oo¹,
Longbiao Wang²,
Khomdet Phapatanaburi¹,
Masahiro Iwahashi¹,
Seiichi Nakagawa³ &
…
Jianwu Dang²,⁴

517 Accesses
10 Citations

Abstract

Enhancing reverberant speech with deep neural networks (DNNs) is an interesting yet challenging topic. The performance of speech enhancement degrades significantly when test and training conditions are mismatched. In this paper, we propose a Static Reverberation Aware Training (SRAT)-based dereverberation method in which the reverberation estimate is obtained by averaging over the frames into which the speech is divided. This method significantly reduces the input dimensions of the DNN and enables the DNN to learn the relations between clean and reverberant speech more efficiently. Most speech enhancement approaches ignore phase information due to its complicated structure. Because phase correlates closely with the speech signal, we exploited this relationship to achieve better performance with the DNN. Phase information was combined with magnitude information and used as the input to the DNN; we denote this method as the phase-aware DNN. Finally, both phase information and the reverberation estimate were added to the reverberant speech features to achieve better speech enhancement performance in distant-talking conditions. Features of the reverberant speech, phase, and reverberation were used during both the training and testing stages, because the DNN could use both reverberation and phase information to better generalize the speech signal. The proposed method was evaluated using the REVERB CHALLENGE 2014 database. The results show significant improvements in both reconstructed speech quality (PESQ: Perceptual Evaluation of Speech Quality) and the influence of reverberation (SRMR: Speech-to-Reverberation Modulation Energy Ratio). Compared with the conventional DNN-based approach, the proposed method improved SRMR from 4.84 to 5.92 and PESQ from 2.34 to 2.70, indicating that it can efficiently enhance speech severely corrupted by reverberation.
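
The abstract does not give the feature layout, so the following is only a minimal sketch of how a phase- and reverberation-aware input vector might be assembled, not the authors' implementation. Everything named here is an assumption made for illustration: the helper names `stft` and `build_input`, the frame length and hop, the cosine/sine encoding of phase, the context window of two frames on each side, and the hypothetical per-frame `reverb_estimate` array (the abstract does not say how that estimate is computed). The one idea taken from the abstract is that the reverberation estimate is averaged over frames into a single static vector, which adds one spectrum's worth of dimensions to the input rather than one per context frame.

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Short-time Fourier transform with a Hann window (assumed settings)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def build_input(reverberant, reverb_estimate, context=2):
    """Concatenate magnitude context, phase features, and a static reverberation vector."""
    spec = stft(reverberant)
    log_mag = np.log(np.abs(spec) + 1e-8)
    n_frames = log_mag.shape[0]

    # Phase features: cos/sin of the STFT phase (an assumed encoding).
    phase = np.angle(spec)
    phase_feat = np.concatenate([np.cos(phase), np.sin(phase)], axis=1)

    # Static reverberation feature: average the hypothetical per-frame
    # estimate over time, then repeat the same vector for every frame.
    static_reverb = reverb_estimate.mean(axis=0)
    static_tiled = np.tile(static_reverb, (n_frames, 1))

    # Context window over log-magnitude frames, edge-padded at the ends.
    padded = np.pad(log_mag, ((context, context), (0, 0)), mode="edge")
    ctx = np.concatenate([padded[k:k + n_frames]
                          for k in range(2 * context + 1)], axis=1)

    return np.concatenate([ctx, phase_feat, static_tiled], axis=1)

# Placeholder data only: random noise stands in for real speech and for
# whatever reverberation estimator is actually used.
rng = np.random.default_rng(0)
reverberant = rng.standard_normal(16000)
reverb_estimate = rng.standard_normal((40, 257))
features = build_input(reverberant, reverb_estimate)
```

With these assumed settings each frame yields 257 frequency bins, so the input is 5 × 257 context dimensions, 2 × 257 phase dimensions, and 257 static reverberation dimensions. A per-frame reverberation feature with the same context window would instead add 5 × 257 dimensions, which is one way to read the abstract's claim about reduced input size.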

Acknowledgements

The research was partially supported by the National Natural Science Foundation of China (No. 61771333 and No. U1736219) and JSPS KAKENHI Grants (No. 16K12461 and No. 16K00297).

Author information

Authors and Affiliations

Nagaoka University of Technology, Nagaoka, Japan
Zeyan Oo, Khomdet Phapatanaburi & Masahiro Iwahashi
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
Longbiao Wang & Jianwu Dang
Toyohashi University of Technology, Toyohashi, Japan
Seiichi Nakagawa
Intelligent Spoken Language Technology (Tianjin) Co. Ltd., Tianjin, China
Jianwu Dang

Corresponding author

Correspondence to Longbiao Wang.

About this article

Cite this article

Oo, Z., Wang, L., Phapatanaburi, K. et al. Phase and reverberation aware DNN for distant-talking speech enhancement. Multimed Tools Appl 77, 18865–18880 (2018). https://doi.org/10.1007/s11042-018-5686-1

Received: 23 October 2016
Revised: 13 August 2017
Accepted: 18 January 2018
Published: 20 February 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11042-018-5686-1
