Deep neural network based speech enhancement using mono channel mask

Ingale, Pallavi P.; Nalbalwar, Sanjay L.

doi:10.1007/s10772-019-09627-4

Published: 29 August 2019

Volume 22, pages 841–850, (2019)

International Journal of Speech Technology

Abstract

Recovering enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method that uses a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. A modified sub-harmonic summation algorithm is then applied to the initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to the DNN, which identifies the correct spectral structure in the frames associated with the target speech; this structure is then used to develop the mono channel mask. The speech signal is reconstructed using the mono channel mask, which avoids unnecessary interference from noisy time–frequency (T–F) units. Objective evaluations using perceptual evaluation of speech quality (PESQ) and normalized source-to-distortion ratio indicate that the proposed method outperforms state-of-the-art speech enhancement methods. The PESQ values obtained show that the proposed method improves speech quality in noisy conditions, and the experimental results demonstrate the effectiveness of the mono channel mask in speech enhancement.
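
The masking idea in the abstract can be illustrated with a small sketch. This is not the authors' mono channel mask: it is the generic oracle ("ideal") binary mask, which assumes the speech and noise energies in each T–F unit are known separately, whereas the paper estimates its mask with a cochleagram, sub-harmonic summation, and a DNN. The function names `binary_mask` and `apply_mask`, the 0 dB threshold, and the toy energy values are all assumptions made for illustration.

```python
import math


def binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Return a 0/1 mask that is 1 where the local SNR exceeds threshold_db.

    Both inputs are lists of rows (frequency channels) of per-frame energies.
    This is an oracle mask: it needs the clean speech and noise separately.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            # Local SNR in dB; the small floor avoids log(0) and division by zero.
            snr_db = 10.0 * math.log10((s + 1e-12) / (n + 1e-12))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Keep mixture T-F units where the mask is 1 and zero out the rest."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture)
    ]


# Toy energies (hypothetical): rows are frequency channels, columns are frames.
speech = [[4.0, 0.1, 3.0], [0.2, 5.0, 0.1]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

mask = binary_mask(speech, noise)
enhanced = apply_mask(mixture, mask)

print(mask)      # [[1, 0, 1], [0, 1, 0]]
print(enhanced)  # [[5.0, 0.0, 4.0], [0.0, 6.0, 0.0]]
```

Units dominated by noise are discarded entirely, which is the sense in which a binary mask avoids interference from noisy T–F units. A real system would apply the mask to a cochleagram or spectrogram and resynthesize a waveform from the retained units.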

Author information

Authors and Affiliations

Dr. Babasaheb Ambedkar Technological University, Lonere, India
Pallavi P. Ingale & Sanjay L. Nalbalwar

Corresponding author

Correspondence to Pallavi P. Ingale.

About this article

Cite this article

Ingale, P.P., Nalbalwar, S.L. Deep neural network based speech enhancement using mono channel mask. Int J Speech Technol 22, 841–850 (2019). https://doi.org/10.1007/s10772-019-09627-4

Received: 25 January 2019
Accepted: 23 August 2019
Published: 29 August 2019
Issue Date: September 2019
DOI: https://doi.org/10.1007/s10772-019-09627-4
