Underdetermined blind speech source separation based on deep nearest neighbor clustering algorithm

Multimedia Tools and Applications

Abstract

In this paper, we propose a new autoencoder network architecture with a clustering mechanism for underdetermined blind speech source separation, i.e., the case in which the number of mixtures is smaller than the number of sources. The autoencoder network projects the mixtures into an embedding space to obtain their embedding vectors. The network model additionally incorporates the clustering mechanism and a nearest neighbor clustering algorithm to estimate the cluster centers of the embedding vectors. Then, based on the embedding vectors, a hard assignment method and a probability assignment method are proposed to assign the mixtures to their corresponding clusters and recover the sources. The experimental results demonstrate that the proposed method outperforms the baseline algorithms.
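
To make the difference between the two assignment strategies concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not give the formulas, so the squared Euclidean distance, the softmax form of the probability assignment, the `temperature` parameter, the function names, and the toy data are all assumptions chosen for illustration. The cluster centers are taken as given here; in the paper they are estimated by the network's clustering mechanism.

```python
import numpy as np


def squared_distances(embeddings, centers):
    """Pairwise squared Euclidean distances, shape (N, K)."""
    diff = embeddings[:, None, :] - centers[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def hard_assignment(embeddings, centers):
    """One-hot masks: each embedding vector goes to its nearest center."""
    d2 = squared_distances(embeddings, centers)
    nearest = np.argmin(d2, axis=1)
    return np.eye(centers.shape[0])[nearest]


def probability_assignment(embeddings, centers, temperature=1.0):
    """Soft masks: softmax over negative squared distances (assumed form)."""
    d2 = squared_distances(embeddings, centers)
    logits = -d2 / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


# Hypothetical toy data: four 2-D embedding vectors and three cluster centers.
embeddings = np.array([[0.0, 0.1], [1.0, 0.9], [0.1, 2.0], [0.4, 0.4]])
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]])

hard = hard_assignment(embeddings, centers)         # rows are one-hot
soft = probability_assignment(embeddings, centers)  # rows sum to 1
```

Each row of `hard` selects exactly one cluster, while each row of `soft` spreads a unit weight over all clusters. In a mask-based separation pipeline, each column would typically weight the mixture representation to give one source estimate; whether the paper applies the assignments in exactly this way cannot be confirmed from the abstract alone.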



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61866024).

Author information

Corresponding author

Correspondence to Ye Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Niu, M., Zhang, Y. Underdetermined blind speech source separation based on deep nearest neighbor clustering algorithm. Multimed Tools Appl 82, 1171–1183 (2023). https://doi.org/10.1007/s11042-022-13009-5

  • DOI: https://doi.org/10.1007/s11042-022-13009-5
