
Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation

Original Paper · Signal, Image and Video Processing

Abstract

Single-channel speech separation (SCSS) plays an important role in speech processing. It is an underdetermined problem, since several signals must be recovered from a single channel, which makes it difficult to solve. To achieve SCSS more effectively, we propose a new cost function, and a joint constraint algorithm based on this function is used to separate mixed speech signals, aiming to separate the two sources simultaneously and accurately. The joint constraint algorithm not only penalizes the residual sum of squares but also exploits the joint relationship between the outputs to train the dual-output deep neural network (DNN). With these joint constraints, the training accuracy of the separation model can be further increased. We evaluate the performance of the proposed algorithm on the GRID corpus. The experimental results show that the new algorithm achieves better speech intelligibility than the basic cost function, and it also performs better in terms of source-to-distortion ratio (SDR), signal-to-interference ratio (SIR), source-to-artifact ratio (SAR) and perceptual evaluation of speech quality (PESQ).
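The paper's exact joint-constraint terms are not reproduced here, but the idea lends itself to a short sketch. Below is a minimal PyTorch illustration, assuming the cost combines the per-source residual sum of squares with terms that tie the two outputs together: a consistency term encouraging the two estimates to add up to the mixture, and a discriminative term penalizing each output's similarity to the other source's target. The network shape, the specific constraint terms and the weights `alpha` and `beta` are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualOutputDNN(nn.Module):
    """Feed-forward DNN with a shared trunk and two output heads,
    one magnitude-spectrum estimate per source (illustrative shape)."""
    def __init__(self, n_bins: int = 129, hidden: int = 1024):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head1 = nn.Linear(hidden, n_bins)  # estimate for speaker 1
        self.head2 = nn.Linear(hidden, n_bins)  # estimate for speaker 2

    def forward(self, mix_mag: torch.Tensor):
        h = self.shared(mix_mag)
        return self.head1(h), self.head2(h)

def joint_constraint_loss(y1, y2, t1, t2, mix, alpha=0.05, beta=0.1):
    """Hypothetical joint-constraint cost (the paper's actual terms may differ):
    per-source MSE (the residual sum of squares), minus a small discriminative
    term that pushes each output away from the *other* target, plus a
    consistency term that asks the two outputs to sum to the mixture."""
    recon = F.mse_loss(y1, t1) + F.mse_loss(y2, t2)
    discrim = F.mse_loss(y1, t2) + F.mse_loss(y2, t1)
    consistency = F.mse_loss(y1 + y2, mix)
    return recon - alpha * discrim + beta * consistency

# Toy usage on random "spectra": one time frame per row.
model = DualOutputDNN()
mix = torch.rand(8, 129)                          # mixture magnitude frames
t1, t2 = torch.rand(8, 129), torch.rand(8, 129)   # clean-source targets
y1, y2 = model(mix)
loss = joint_constraint_loss(y1, y2, t1, t2, mix)
loss.backward()
```

Discriminative terms of this kind have been used before in joint training for monaural separation; with a small `alpha` they sharpen the contrast between the two heads without destabilizing training.

Likewise, the reported metrics can be computed with standard tools. The sketch below assumes the `mir_eval` package for the BSS_eval metrics (SDR/SIR/SAR) and the `pesq` package for PESQ; the sine waves are stand-ins for real GRID utterances and their separated estimates, and the 16 kHz sampling rate is an assumption.

```python
import numpy as np
from mir_eval.separation import bss_eval_sources
from pesq import pesq

fs = 16000                                  # assumed sampling rate
t = np.arange(2 * fs) / fs
s1 = np.sin(2 * np.pi * 220 * t)            # stand-in clean source 1
s2 = np.sin(2 * np.pi * 330 * t)            # stand-in clean source 2
e1 = s1 + 0.05 * np.random.randn(t.size)    # stand-in separated estimate 1
e2 = s2 + 0.05 * np.random.randn(t.size)    # stand-in separated estimate 2

# BSS_eval: rows are sources, columns are samples.
sdr, sir, sar, _ = bss_eval_sources(np.stack([s1, s2]), np.stack([e1, e2]))
print("SDR:", sdr, "SIR:", sir, "SAR:", sar)

# Wideband PESQ between a clean reference and its estimate.
print("PESQ:", pesq(fs, s1, e1, "wb"))
```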



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61901227, 61671252) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB510049).

Author information


Correspondence to Linhui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sun, L., Zhu, G. & Li, P. Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation. SIViP 14, 1387–1395 (2020). https://doi.org/10.1007/s11760-020-01676-6

