Abstract
The main focus of this paper is the blind separation of underdetermined convolutive speech mixtures in a multi-speaker environment. We present a method based on mask prediction in the time-frequency (TF) domain. First, relying on the sparsity of speech signals in the TF domain, we estimate the speakers' masks by clustering the relative absolute and Hermitian angle features extracted from the frequency components of the mixtures. Speech separation algorithms based on the sparsity and disjoint orthogonality of speech signals in the TF domain are not efficient when more than one source is active. Hence, in this paper, the cluster centers are estimated mostly from the TF units that probably contain only one active source. The correlations between the estimated masks of adjacent frequency bins are leveraged to solve the permutation problem. To increase accuracy, we set the masks to zero at TF units without any active source. Moreover, in clustering, we employ a weighting function that emphasizes the parts of the masks that probably contain just one active source. Finally, to reduce the musical noise in the separated signals and improve their quality, time-domain sparse filters are used to re-estimate the separated signals. The performance of the proposed method is evaluated on a number of simulated and real speech signals; the simulated experiments were performed using a public dataset and the Roomsim simulator. Comparing the proposed method with some conventional algorithms, we observed that our separation method is more accurate than the other approaches.
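The two features named in the abstract can be illustrated with a short sketch. It assumes that at each TF unit (f, t) the M mixture coefficients form a complex vector x(f, t) ≈ a_k(f) s_k(f, t) when a single source k dominates. Under that assumption, both the Hermitian angle arccos(|u^H v| / (‖u‖ ‖v‖)) and the normalized magnitudes |x_m| / ‖x‖ are unaffected by the unknown complex source coefficient s_k(f, t). Reading "relative absolute" as normalized channel magnitudes is itself an assumption. The helper names are hypothetical, and so are the energy floor and the neighbouring-frame test for single-source units. They illustrate the idea and are not the criteria used in the paper.

```python
import math


def inner(u, v):
    """Hermitian inner product u^H v of two complex vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(a) ** 2 for a in u))


def hermitian_angle(u, v):
    """Hermitian angle in [0, pi/2]; unchanged if u or v is scaled by any complex number."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return math.pi / 2  # treat a zero vector as maximally dissimilar
    c = abs(inner(u, v)) / (nu * nv)
    return math.acos(min(1.0, c))  # clip rounding error above 1


def relative_abs(x):
    """Channel magnitudes divided by the vector norm (assumed reading of 'relative absolute')."""
    n = norm(x)
    return [abs(a) / n for a in x] if n > 0 else [0.0] * len(x)


def is_active(x, floor):
    """Hypothetical silence test: the TF unit's energy must exceed a chosen floor."""
    return norm(x) ** 2 > floor


def is_single_source(x_t, x_next, max_angle=0.1):
    """Hypothetical heuristic: if one source dominates frames t and t+1 at this bin,
    both mixture vectors are multiples of the same mixing vector a_k(f),
    so their Hermitian angle is close to zero."""
    return hermitian_angle(x_t, x_next) < max_angle
```

In practice, a test like `is_single_source` would be evaluated for every bin and frame, and the resulting flags turned into clustering weights.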
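The weighted clustering and the zeroing of inactive TF units can be sketched as a weighted k-means on real feature vectors, such as normalized channel magnitudes |x_m| / ‖x‖. This is an assumption-laden illustration. It uses plain Euclidean k-means, hard (binary) masks, a deterministic farthest-first initialization, and caller-supplied weights (for example, large for units judged single-source and small otherwise). The abstract does not specify the paper's actual weighting function or mask type. The function names are hypothetical.

```python
def _dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_kmeans(points, weights, k, iters=50):
    """Weighted k-means on real feature vectors (one per TF unit of a frequency bin).

    weights: non-negative weight per point; large weights pull the centers toward
    TF units believed to hold a single active source (hypothetical weighting).
    Returns (centers, labels).
    """
    # Deterministic farthest-first initialization, starting from the heaviest point.
    centers = [list(points[max(range(len(points)), key=lambda i: weights[i])])]
    while len(centers) < k:
        far = max(range(len(points)),
                  key=lambda i: min(_dist2(points[i], c) for c in centers))
        centers.append(list(points[far]))
    labels = [0] * len(points)
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Update step: weighted mean of each cluster's members.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            wsum = sum(weights[i] for i in members)
            if wsum > 0:  # keep the old center if the cluster carries no weight
                centers[c] = [sum(weights[i] * points[i][d] for i in members) / wsum
                              for d in range(dim)]
    return centers, labels


def binary_masks(labels, active, k):
    """One binary mask per cluster; TF units flagged inactive are zero in every mask."""
    return [[1.0 if (lab == c and act) else 0.0 for lab, act in zip(labels, active)]
            for c in range(k)]
```

Because low-weight points still get labels but barely move the centers, units that probably contain overlapping sources are masked without distorting the cluster estimates. That is the motivation the abstract gives for the weighting.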
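The permutation step can be illustrated with a greedy bin-by-bin alignment. The masks at frequency bin f are reordered so that their summed correlation with the already aligned masks of bin f - 1 is maximal. This is a minimal sketch with hypothetical function names, and the paper may use a more robust variant. Exhaustive search over permutations is only practical for a handful of speakers. A purely sequential sweep can also propagate a single wrong decision to all higher bins.

```python
from itertools import permutations


def _corr(a, b):
    """Pearson correlation of two equal-length mask sequences (0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def align_permutations(masks):
    """masks[f][k] is the time sequence of cluster k's mask at frequency bin f.

    Each bin's clusters are reordered so that the summed correlation with the
    previous (already aligned) bin is maximal; bin 0 fixes the reference order.
    """
    aligned = [list(masks[0])]
    for f in range(1, len(masks)):
        prev, cur = aligned[-1], masks[f]
        n = len(cur)
        best = max(permutations(range(n)),
                   key=lambda p: sum(_corr(prev[k], cur[p[k]]) for k in range(n)))
        aligned.append([cur[best[k]] for k in range(n)])
    return aligned
```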
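The abstract only names time-domain sparse filters as the remedy for musical noise. One formulation, assumed here rather than taken from the paper, fits FIR filters h_m to the mixtures. The filtered sum Σ_m x_m * h_m is made to match the TF-masked estimate y of a source, while an L1 penalty keeps the filters sparse. The sketch solves this problem with plain ISTA (iterative soft thresholding) in place of a split Bregman solver. The function names, the penalty weight `lam`, and the iteration count are illustrative assumptions.

```python
import math


def convolve(x, h):
    """Full linear convolution of two real sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def soft(v, t):
    """Soft thresholding: the proximal operator of t * |v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_filters(mixtures, target, length, lam=1e-3, iters=2000):
    """ISTA for  min_h 0.5 * ||sum_m x_m * h_m - y||^2 + lam * sum_m ||h_m||_1.

    mixtures : M time-domain mixture channels x_m (lists of floats)
    target   : time-domain estimate y of one source, e.g. the masked signal
               after the inverse STFT; must have len(x_m) + length - 1 samples
    length   : number of taps of each FIR filter h_m
    """
    h = [[0.0] * length for _ in mixtures]
    sums = [sum(abs(v) for v in x) for x in mixtures]
    lip = max(sums) * sum(sums)  # bound on ||X||_2^2 from ||X||_1 * ||X||_inf
    if lip == 0:
        return h  # all-zero mixtures: nothing to fit
    step = 1.0 / lip
    for _ in range(iters):
        # Residual r = sum_m x_m * h_m - y.
        r = [-v for v in target]
        for x, hm in zip(mixtures, h):
            for n, v in enumerate(convolve(x, hm)):
                r[n] += v
        # Gradient step on every filter, then the L1 proximal (shrinkage) step.
        for m, x in enumerate(mixtures):
            grad = [sum(xi * r[i + j] for i, xi in enumerate(x)) for j in range(length)]
            h[m] = [soft(hj - step * g, step * lam) for hj, g in zip(h[m], grad)]
    return h
```

The re-estimated source is then the sum over m of `convolve(x_m, h_m)`. The filters are time-invariant and act on the whole mixtures rather than on individual TF units. The intuition is that this output therefore cannot contain the isolated spectral peaks that binary masking leaves behind and that are heard as musical noise.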
About this article
Cite this article
Zohrevandi, M., Setayeshi, S., Rabiee, A. et al. Blind separation of underdetermined Convolutive speech mixtures by time–frequency masking with the reduction of musical noise of separated signals. Multimed Tools Appl 80, 12601–12618 (2021). https://doi.org/10.1007/s11042-020-10398-3
DOI: https://doi.org/10.1007/s11042-020-10398-3