Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation

Sun, Linhui; Zhu, Ge; Li, Pingan

doi:10.1007/s11760-020-01676-6


Original Paper
Published: 12 April 2020

Volume 14, pages 1387–1395 (2020)

Signal, Image and Video Processing


Abstract

Single-channel speech separation (SCSS) plays an important role in speech processing. It is an underdetermined problem, because several signals must be recovered from a single channel, which makes it difficult to solve. To achieve SCSS more effectively, we propose a new cost function. A joint constraint algorithm based on this function is then used to separate mixed speech signals, with the aim of separating both sources accurately at the same time. The joint constraint algorithm not only penalizes the residual sum of squares but also exploits the joint relationship between the outputs to train the dual-output deep neural network (DNN). With these joint constraints, the training accuracy of the separation model can be further increased. We evaluate the performance of the proposed algorithm on the GRID corpus. The experimental results show that the new algorithm obtains better speech intelligibility than the basic cost function. It also achieves better performance in terms of source-to-distortion ratio (SDR), signal-to-interference ratio (SIR), source-to-artifact ratio (SAR) and perceptual evaluation of speech quality (PESQ).
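The abstract does not state the cost function itself, so the following is only a minimal sketch of what a joint-constraint cost for a dual-output network could look like, not the authors' formulation. It assumes that the network predicts features for both speakers, that these features are additive (the two targets sum to the mixture, as for time-domain or complex-spectral features; for magnitude spectra this holds only approximately), and that the joint term penalizes how far the two outputs together deviate from the mixture. The function name `joint_constraint_loss` and the weight `alpha` are hypothetical, and NumPy is used only for the array arithmetic.

```python
import numpy as np


def joint_constraint_loss(est1, est2, ref1, ref2, mix, alpha=0.5):
    """Hypothetical joint-constraint cost for a dual-output separator.

    est1, est2: network outputs for source 1 and source 2 (frames x bins).
    ref1, ref2: target features for the two sources (same shape).
    mix: mixture features; assumed additive, i.e. mix == ref1 + ref2.
    alpha: assumed weight of the joint term (not taken from the paper).
    """
    est1, est2, ref1, ref2, mix = (
        np.asarray(a, dtype=float) for a in (est1, est2, ref1, ref2, mix)
    )
    # Residual sum of squares for each output, as in a basic dual-output cost.
    rss = np.sum((est1 - ref1) ** 2) + np.sum((est2 - ref2) ** 2)
    # Assumed joint term: the two outputs should jointly reconstruct the
    # mixture, so the cost depends on both outputs at once.
    joint = np.sum((est1 + est2 - mix) ** 2)
    return rss + alpha * joint


# Usage: perfect estimates make both terms vanish.
rng = np.random.default_rng(0)
s1 = rng.standard_normal((4, 8))
s2 = rng.standard_normal((4, 8))
print(joint_constraint_loss(s1, s2, s1, s2, s1 + s2))
```

The joint term couples the two outputs: it is unaffected when errors in the two outputs cancel and grows when they do not, while the residual terms still penalize each output individually. A training implementation would compute the same expression in an autodiff framework so that it can be backpropagated through the network.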

Fig. 4

Fig. 5

Fig. 6

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61901227, 61671252) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB510049).

Author information

Authors and Affiliations

College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210003, Jiangsu, China
Linhui Sun, Ge Zhu & Pingan Li


Corresponding author

Correspondence to Linhui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, L., Zhu, G. & Li, P. Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation. SIViP 14, 1387–1395 (2020). https://doi.org/10.1007/s11760-020-01676-6


Received: 09 October 2019
Revised: 05 February 2020
Accepted: 18 March 2020
Published: 12 April 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11760-020-01676-6

Keywords
