Skip to main content
Log in

Phase and reverberation aware DNN for distant-talking speech enhancement

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Enhancing reverberant speech with Deep Neural Networks (DNNs) is an interesting yet challenging topic. The performance of speech enhancement degrades significantly when test and training conditions are mismatched. In this paper we propose a Static Reverberation Aware Training (SRAT)-based dereverberation through which the reverberation estimate is obtained by averaging over broken down frame. This method significantly reduces the input dimensions of the and enables the DNN to learn the relations between clean and reverberant speech more efficiently. Most speech enhancement approaches ignore phase information due to its complicated structure. As phase correlates closely to speech signal we exploited this relationship to achieve better performance using DNN. Phase information was augmented with magnitude information and used as the input for DNN. We denote this method as phase aware DNN. Finally, both phase information and reverberation were added to reverberant speech to achieve better speech enhancement performance in a distant-talking condition. Features of the reverberant speech, phase and reverberation were used during the training and testing stages. This is because the DNN could use both reverberation and phase information to better generalize the speech signal. The proposed method was evaluated using the REVERB CHALLENGE 2014 database. Results are significantly improved results with respect to both reconstructed speech quality (PESQ: Perceptual Evaluation of Speech Quality) and influence of reverberation (SRMR: Speech to Reverberation Modulation Energy Ratio). As compared to the conventional DNN-based approach, this proposed one improved SRMR from 4.84 to 5.92 and PESQ from 2.34 to 2.70, indicating that our proposed method could efficiently enhance speech severely corrupted by reverberation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

References

  1. Benesty J, Makino S, Chen J (2005) Speech enhancement. Springer, New York

    Google Scholar 

  2. Boll S (1984) Suppression of acoustics noise in speech using spectral subtraction. IEEE Trans on Acoustics, Speech, Signal Processing 32:1109–1121

    Article  Google Scholar 

  3. Ephraim Y, Malah D (1985) Speech enhancement using a minimum mean square error short-time spectral amplitude estimator. IEEE Trans on Acoustics, Speech and Signal Processing 33(2):443–445

    Article  Google Scholar 

  4. Ephraim Y et al (1995) A signal subspace approach for speech enhancement. IEEE Trans on Speech and Audio Processing 3(4):251–266

    Article  Google Scholar 

  5. Hegde RM et al (2007) Significance of the Modified Group Delay Feature in Speech Recognition. IEEE Trans on Audio, Speech, and Language Processing 15(1):190–202

    Article  Google Scholar 

  6. Hinton GE et al (2006) A fast learning algorithm for deep belief Networks. Neural Comput 18:1527–1554

    Article  MathSciNet  MATH  Google Scholar 

  7. Kanagasundaram A, Dean D, Sridharan S (2012) JFA based speaker recognition using delta-phase and MFCC features. In: Proc. of SST, pp. 9-12

  8. Kinoshita K, Nakatani T (2011) Speech dereverberation using linear prediction. NTT Technical Review 9(7):1–7

  9. Kinoshita K, Delcroix M, Nakatani T, Miyoshi M (2009) Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple–Step Linear Prediction. IEEE trans Audio, Speech and Language Processing 17(4):534–545

    Article  Google Scholar 

  10. Kinoshita K et al (2013) The reverb challenge: a common evaluation framework for dereverberation and recognition of reverberant speech. Proc. of IEEE Workshop on Application of Signal Processing to Audio Acoustics

  11. Lu X, Tsao Y, Matsuda S, Hori C (2013) Speech enhancement based on deep denoising autoencoder. In: Proc. of Interspeech, pp. 436-440

  12. Miao Y et al (2015) Distant aware training for robust speech recognition. Proc. of Interspeech, pp. 761-765

  13. Nakagawa S et al (2012) Speaker Identification and Verification by Combining MFCC and Phase Information. IEEE Trans on Audio, Speech and Language Processing 20(4):1085–1095

    Article  Google Scholar 

  14. Nakatani T, Yoshioka T, Kinoshita K, Miyoshi M, Juang BH (2008) Blind speech dereverberation with multi- channel linear prediction based on short time fourier representation. Proc. of ICASSP, Las Vegas, pp 85–88

    Google Scholar 

  15. Oo Z, Wang L, Masahiro I (2015) Investigation of DNN based Distant-Talking Speech Enhancement. Proc of 109th Spoken Language Research Workshop of IEICE 115(346):37–42

    Google Scholar 

  16. Robinson T, Fransen J, Pye D, Foote J, Renals S (1995) WSJCAM0: a british english speech corpus for large vocabulary continuous speech recognition. Proc. of ICASSP, Detroit, pp 81–84

    Google Scholar 

  17. Seltzer ML, Wang Y (2013) An investigation of deep neural networks for noise robust speech recognition. Proc. of ICASSP, Vancouver, pp 7398–7402

    Google Scholar 

  18. Tchorz J, Kollmeier B (2003) SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Trans on Speech Audio Process 11(3):184–192

    Article  Google Scholar 

  19. Ueda Y, Wang L, Kai A, Ren B (2015) Environmental dependent denoising autoencoder for distant talking speech recognition. Eurasip Journal on Advances in Signal Processing 2015(92):1–11

    Google Scholar 

  20. Wan EA, Nelson T (1998) Handbook of neural network for speech processing. Artech House, Boston

    Google Scholar 

  21. Wang L et al (2010) Speaker recognition by combining MFCC and phase information in noisy conditions. IEICE Trans Inf Syst E93-D 9:2397–2406

  22. Wang L et al (2015) Relative phase information for detection human speech and spoofed speech. In: Proc. of Interspeech, pp. 2092-2096

  23. Xiao X et al (2014) The NTU–ADSC system for reverberation challenge 2014. Proc of Reverb Workshop

  24. Xiao X et al (2016) Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation. EURASIP Journal of Advances in Signal Processing 2016(4):1–18

    Google Scholar 

  25. Xu Y, Du J, Dai L, Lee C (2014) An Experimental Study on Speech on Deep Neural Networks. IEEE Signal Processing Letter 21(1):65–68

    Article  Google Scholar 

  26. Xu Y, Du J, Dai L, Lee C (2014) Dynamic noise aware training for speech enhancement based on deep neural networks. Proc. of Interspeech, pp. 2670–2674

  27. Xu Y, Du J, Dai L, Lee C (2015) “A Regression Approach to Speech Enhancement Based on Deep Neural Networks” IEEE Trans on Audio. Speech and Language Processing 23:7–19

    Google Scholar 

Download references

Acknowledgements

The research was supported partially by the National Natural Science Foundation of China (No. 61771333 and No. U1736219) and JSPS KAKENHI Grant (No. 16K12461 and No. 16K00297).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Longbiao Wang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Oo, Z., Wang, L., Phapatanaburi, K. et al. Phase and reverberation aware DNN for distant-talking speech enhancement. Multimed Tools Appl 77, 18865–18880 (2018). https://doi.org/10.1007/s11042-018-5686-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-018-5686-1

Keywords

Navigation