Abstract
Single-channel speech separation (SCSS) plays an important role in speech processing. It is an underdetermined problem, since several source signals must be recovered from a single channel, which makes it difficult to solve. To perform SCSS more effectively, we propose a new cost function and a joint constraint algorithm based on it that separates the two sources in a mixed speech signal simultaneously and accurately. The joint constraint algorithm not only penalizes the residual sum of squares but also exploits the joint relationship between the outputs to train a dual-output deep neural network (DNN). Under these joint constraints, the training accuracy of the separation model can be further increased. We evaluate the proposed algorithm on the GRID corpus. The experimental results show that the new algorithm obtains better speech intelligibility than the basic cost function. It also performs better in terms of source-to-distortion ratio, signal-to-interference ratio, source-to-artifact ratio and perceptual evaluation of speech quality.
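The abstract does not give the form of the cost function, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes the joint relationship is expressed as a penalty on the mismatch between the sum of the two estimated sources and the sum of the two reference sources; the function name `joint_constraint_loss`, the weight `lam` and the toy feature vectors are all hypothetical.

```python
import numpy as np


def joint_constraint_loss(est1, est2, ref1, ref2, lam=0.5):
    """Hypothetical dual-output loss: per-source squared error plus a joint term."""
    # Residual sum of squares for each of the two network outputs.
    rss = np.sum((est1 - ref1) ** 2) + np.sum((est2 - ref2) ** 2)
    # Assumed joint term (not taken from the paper): the two estimates
    # together should add up to the sum of the two references.
    joint = np.sum(((est1 + est2) - (ref1 + ref2)) ** 2)
    return rss + lam * joint


# Toy feature vectors standing in for the spectral features of two speakers.
ref1 = np.zeros(4)
ref2 = np.ones(4)

# Errors that cancel across the two outputs leave the joint term at zero.
print(joint_constraint_loss(ref1 + 1, ref2 - 1, ref1, ref2))  # 8.0
# Errors in the same direction are penalized again by the joint term.
print(joint_constraint_loss(ref1 + 1, ref2 + 1, ref1, ref2))  # 16.0
```

In this sketch the two outputs are coupled through the joint term, so the same per-source error costs more when both estimates drift in the same direction than when the residuals cancel in the sum. The plain residual sum of squares alone cannot make that distinction.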
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61901227, 61671252) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB510049).
About this article
Cite this article
Sun, L., Zhu, G. & Li, P. Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation. SIViP 14, 1387–1395 (2020). https://doi.org/10.1007/s11760-020-01676-6
DOI: https://doi.org/10.1007/s11760-020-01676-6