Blind separation of underdetermined Convolutive speech mixtures by time–frequency masking with the reduction of musical noise of separated signals

Zohrevandi, Mahbanou; Setayeshi, Saeed; Rabiee, Azam; Reshadi, Midia

doi:10.1007/s11042-020-10398-3

Published: 12 January 2021

Volume 80, pages 12601–12618 (2021)

Multimedia Tools and Applications

Mahbanou Zohrevandi¹,
Saeed Setayeshi ORCID: orcid.org/0000-0002-1415-222X²,
Azam Rabiee³ &
…
Midia Reshadi¹

Abstract

This paper addresses the blind separation of underdetermined convolutive speech mixtures in a multi-speaker environment. We present a method based on mask prediction in the time-frequency (TF) domain. First, exploiting the sparsity of speech signals in the TF domain, we estimate the speakers’ masks by clustering the relative absolute and Hermitian angle features extracted from the frequency components of the mixtures. Speech separation algorithms that rely on the sparsity and disjoint orthogonality of speech signals in the TF domain perform poorly when more than one source is active in a TF unit. Hence, in this paper, the cluster centers are estimated mostly from the TF units that probably have only one active source. The correlations between the estimated masks of adjacent frequency bins are leveraged to solve the permutation problem. To increase accuracy, we set the masks to zero at TF units without any active source. Moreover, in clustering, we employ a weighting function to emphasize the parts of the masks that probably contain just one active source. Finally, to decrease the musical noise of the separated signals and improve their quality, sparse filters in the time domain are used to re-estimate the separated signals. The performance of the proposed method is evaluated on a number of simulated and real speech signals. The simulated experiments were performed using a public dataset and the Roomsim simulator. Comparing the proposed method with some conventional algorithms, we observed that it separates the sources more accurately.
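
The feature extraction and clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact feature definitions, the weighting function, or the clustering algorithm, so the all-ones reference vector, the plain 1-D k-means, the binary masks, and the toy mixing vectors `mix_a` and `mix_b` are all assumptions made for illustration. The sketch computes the Hermitian angle between each two-microphone mixture vector and the reference vector at a single frequency bin, clusters the angles, and turns the cluster labels into one mask per source.

```python
import math

def hermitian_angle(u, v):
    """Hermitian angle in [0, pi/2] between two complex vectors."""
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(abs(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(abs(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to 1 so rounding error cannot push acos out of its domain.
    return math.acos(min(1.0, abs(inner) / (norm_u * norm_v)))

def kmeans_1d(values, centers, iterations=20):
    """Plain 1-D k-means (an assumed stand-in for the paper's clustering)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
                  for x in values]
        # Move every non-empty center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels

def binary_masks(labels, n_sources):
    """One 0/1 mask per source over the frames of a single frequency bin."""
    return [[1 if lab == j else 0 for lab in labels] for j in range(n_sources)]

# Hypothetical mixing vectors of two sources at one frequency bin.
mix_a = [1.0, 0.5]
mix_b = [1.0, -0.8j]
# Each frame holds exactly one active source with an arbitrary complex gain.
gains_and_sources = [(2 + 1j, mix_a), (0.5j, mix_b), (-1.0, mix_a), (3.0, mix_b)]
frames = [[gain * h for h in mix] for gain, mix in gains_and_sources]

reference = [1.0, 1.0]  # assumed reference vector
angles = [hermitian_angle(x, reference) for x in frames]
centers, labels = kmeans_1d(angles, [min(angles), max(angles)])
masks = binary_masks(labels, 2)
print(masks)
```

The Hermitian angle is unchanged when a vector is multiplied by any nonzero complex scalar, so frames dominated by the same source give the same angle regardless of that source's amplitude and phase in the frame. This is why single-source TF units form tight clusters, while units with several active sources fall between the clusters and are better excluded or down-weighted when estimating the centers.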

Author information

Authors and Affiliations

Department of Computer engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Mahbanou Zohrevandi & Midia Reshadi
Department of Medical Radiation, Amirkabir University of Technology, Tehran, Iran
Saeed Setayeshi
Department of Computer Science, Dolatabad Branch, Islamic Azad University, Isfahan, Iran
Azam Rabiee

Corresponding author

Correspondence to Saeed Setayeshi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zohrevandi, M., Setayeshi, S., Rabiee, A. et al. Blind separation of underdetermined Convolutive speech mixtures by time–frequency masking with the reduction of musical noise of separated signals. Multimed Tools Appl 80, 12601–12618 (2021). https://doi.org/10.1007/s11042-020-10398-3

Received: 12 November 2019
Revised: 27 August 2020
Accepted: 22 December 2020
Published: 12 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10398-3
