Abstract
Owing to its powerful feature extraction ability, deep learning has become a prominent approach to speech separation. In this paper, we present a novel Deep Neural Network (DNN) architecture for monaural speech separation. Taking into account the masking property of the human auditory system, a perceptual modified Wiener filtering masking function is applied in the proposed DNN architecture to make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the perceptual modified Wiener filtering mask and the DNN. Evaluation experiments on the TIMIT database with 20 noise types at various signal-to-noise ratios (SNRs) demonstrate the superiority of the proposed method over the reference DNN-based separation methods, for noise types both seen and unseen during training.
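The abstract combines two ideas: a Wiener-type time-frequency mask modified by a perceptual (masking-threshold) criterion, and a network trained jointly with that mask so that the loss is measured on the masked output rather than on the mask itself. The NumPy sketch below illustrates both under stated assumptions. The Wiener gain S/(S+N) is standard; the perceptual rule in `perceptual_wiener_mask` (raising the gain to sqrt(T/N) wherever the Wiener gain would push residual noise below the masking threshold T) is a hypothetical simplification, not the formula used in the paper, and the threshold in the demo is a made-up stand-in rather than the output of a psychoacoustic model. The mask layer follows the general idea of Huang et al. (2015); all function names are illustrative.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent bins


def wiener_mask(speech_psd, noise_psd):
    """Classic Wiener gain H = S / (S + N) for each time-frequency bin."""
    return speech_psd / (speech_psd + noise_psd + EPS)


def perceptual_wiener_mask(speech_psd, noise_psd, masking_threshold):
    """Hypothetical perceptual modification (illustrative, not the paper's rule).

    With gain G, the residual noise power in a bin is roughly G**2 * N,
    which stays at or below the masking threshold T whenever G <= sqrt(T / N).
    Where the Wiener gain would suppress more than that, the gain is raised
    to sqrt(T / N) (capped at 1), so noise that would be masked anyway is
    not removed at the cost of extra speech distortion.
    """
    h_wiener = wiener_mask(speech_psd, noise_psd)
    g_masked = np.minimum(1.0, np.sqrt(masking_threshold / (noise_psd + EPS)))
    return np.maximum(h_wiener, g_masked)


def masked_estimate(mixture_mag, speech_hat, noise_hat, threshold_hat=None):
    """Mask-layer sketch: the network's speech/noise magnitude estimates are
    turned into a mask, which is then applied to the mixture magnitude."""
    speech_psd, noise_psd = speech_hat ** 2, noise_hat ** 2
    if threshold_hat is None:
        mask = wiener_mask(speech_psd, noise_psd)
    else:
        mask = perceptual_wiener_mask(speech_psd, noise_psd, threshold_hat)
    return mask * mixture_mag


def mse_loss(estimate, clean_mag):
    """Training objective computed on the masked output, not on the mask."""
    return float(np.mean((estimate - clean_mag) ** 2))


rng = np.random.default_rng(0)
freq_bins, frames = 257, 10
clean = rng.random((freq_bins, frames))
noise = rng.random((freq_bins, frames))
mixture = clean + noise  # crude magnitude-additivity assumption for the demo

# Oracle stand-ins for network outputs; a real model would predict these.
# The threshold (in power units) is hypothetical, not a psychoacoustic model.
threshold = 0.5 * noise ** 2
est_wiener = masked_estimate(mixture, clean, noise)
est_perceptual = masked_estimate(mixture, clean, noise, threshold_hat=threshold)
print(mse_loss(est_wiener, clean), mse_loss(est_perceptual, clean))
```

Only the forward computation is shown. In an actual joint optimization, the same operations would be expressed in an automatic-differentiation framework so that the gradient of the loss flows through the mask into the network weights.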
Acknowledgments
This work is supported by NSF of China (Grant No. 61471394, 61402519) and NSF of Jiangsu Province (Grant No. BK20140071, BK20140074).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Han, W., Zhang, X., Yang, J., Sun, M., Min, G. (2016). Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_46
DOI: https://doi.org/10.1007/978-3-319-48896-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science, Computer Science (R0)