Underdetermined blind speech source separation based on deep nearest neighbor clustering algorithm

Niu, Mengdie; Zhang, Ye

doi:10.1007/s11042-022-13009-5

Published: 14 June 2022

Multimedia Tools and Applications, Volume 82, pages 1171–1183 (2023)

Abstract

In this paper, we propose a new autoencoder network architecture with a clustering mechanism for underdetermined blind speech source separation, in which the number of mixtures is smaller than the number of sources. The autoencoder network projects the mixtures into an embedding space to obtain their embedding vectors. The model also incorporates a clustering mechanism and a nearest neighbor clustering algorithm to estimate the cluster centers of the embedding vectors. Based on these embedding vectors, two methods, hard assignment and probability assignment, are then proposed to assign the mixtures to their corresponding clusters and thereby recover the sources. Experimental results show that the proposed method outperforms the baseline algorithms.
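The abstract describes the clustering and assignment stages only at a high level. The sketch below illustrates the difference between hard and probability assignment under several assumptions that this excerpt does not state: each embedding vector corresponds to one time-frequency bin of a single mixture spectrogram; cluster centers are estimated with plain k-means (nearest-center assignment, deterministic farthest-point initialisation) as a stand-in for the authors' nearest neighbor clustering algorithm; hard assignment gives each bin entirely to its nearest center; and probability assignment uses a softmax over negative squared distances with a hypothetical sharpness parameter `beta`. The embeddings are hand-made toy values rather than autoencoder output, and every function name is hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Sequence[float], centers: List[Vector]) -> int:
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))


def estimate_centers(emb: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Assumed stand-in for the paper's clustering step: k-means with
    deterministic farthest-point initialisation."""
    centers = [list(emb[0])]
    while len(centers) < k:
        # Add the embedding farthest from every center chosen so far.
        far = max(emb, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in emb:
            groups[nearest(v, centers)].append(v)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


def hard_masks(emb: List[Vector], centers: List[Vector]) -> List[Vector]:
    """Hard assignment: bin t belongs only to its nearest center (1.0, else 0.0)."""
    masks = [[0.0] * len(emb) for _ in centers]
    for t, v in enumerate(emb):
        masks[nearest(v, centers)][t] = 1.0
    return masks


def soft_masks(emb: List[Vector], centers: List[Vector], beta: float = 1.0) -> List[Vector]:
    """Probability assignment: softmax of -beta * squared distance over centers.
    `beta` is a hypothetical sharpness parameter."""
    masks = [[0.0] * len(emb) for _ in centers]
    for t, v in enumerate(emb):
        logits = [-beta * sq_dist(v, c) for c in centers]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(z - top) for z in logits]
        total = sum(weights)
        for j, w in enumerate(weights):
            masks[j][t] = w / total
    return masks


def apply_masks(mixture: Sequence[float], masks: List[Vector]) -> List[Vector]:
    """One source estimate per cluster: mask times mixture magnitude, bin by bin."""
    return [[m * x for m, x in zip(mask, mixture)] for mask in masks]


if __name__ == "__main__":
    # Toy data: 6 time-frequency bins of ONE mixture and 3 sources (underdetermined).
    # The embeddings are hand-made, not produced by an autoencoder.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
    mix = [2.0, 1.0, 3.0, 1.5, 0.5, 4.0]
    centers = estimate_centers(emb, k=3)
    print(apply_masks(mix, hard_masks(emb, centers)))
    print(apply_masks(mix, soft_masks(emb, centers, beta=5.0)))
```

Hard assignment yields binary masks that partition the bins among the estimated sources, while probability assignment lets a bin contribute to several sources in proportion to its weights, which may be gentler where sources overlap. Increasing `beta` drives the soft masks toward the hard ones.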

Similar content being viewed by others

Autoencoders and their applications in machine learning: a survey (article, open access, 03 February 2024)

Chinese dialect speech recognition: a comprehensive survey (article, open access, 31 January 2024)

Defending Adversarial Attacks Against ASV Systems Using Spectral Masking (article, 22 April 2024)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61866024).

Author information

Authors and Affiliations

Department of Electronic Information Engineering, Nanchang University, Nanchang, China
Mengdie Niu & Ye Zhang

Corresponding author

Correspondence to Ye Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Niu, M., Zhang, Y. Underdetermined blind speech source separation based on deep nearest neighbor clustering algorithm. Multimed Tools Appl 82, 1171–1183 (2023). https://doi.org/10.1007/s11042-022-13009-5

Received: 26 July 2020
Revised: 19 March 2022
Accepted: 28 March 2022
Published: 14 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13009-5
