Blind separation of underdetermined Convolutive speech mixtures by time–frequency masking with the reduction of musical noise of separated signals

Multimedia Tools and Applications

Abstract

This paper focuses on the blind separation of underdetermined convolutive speech mixtures in a multi-speaker environment. We present a method based on mask prediction in the time-frequency (TF) domain. First, relying on the sparsity of speech in the TF domain, we estimate the speakers' masks by clustering relative-absolute and Hermitian-angle features extracted from the frequency components of the mixtures. Separation algorithms that rely on the sparsity and disjoint orthogonality of speech signals in the TF domain perform poorly in TF units where more than one source is active. Hence, in this paper, the cluster centers are estimated mainly from the TF units that are likely to contain only one active source. The correlations between the estimated masks of adjacent frequency bins are exploited to solve the permutation problem. To increase accuracy, we set the masks to zero at TF units where no source is active. Moreover, the clustering employs a weighting function that emphasizes the parts of the masks that are likely to contain just one active source. Finally, to reduce the musical noise of the separated signals and improve their quality, the separated signals are re-estimated with sparse time-domain filters. The performance of the proposed method is evaluated on a number of simulated and real speech signals; the simulated experiments use a public dataset and the Roomsim simulator. Compared with several conventional algorithms, the proposed method proved more accurate than the other approaches.
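
The first ingredients named in the abstract, phase-invariant features and the selection of TF units that probably contain a single source, can be illustrated with the minimal NumPy sketch below. It assumes the mixture STFT is stored as a complex array `X` of shape `(M, F, T)` (microphones, frequency bins, frames). The Hermitian angle uses the standard definition cos θ_H = |u^H v| / (‖u‖ ‖v‖), which does not change when a vector is multiplied by any nonzero complex scalar; as a result, all mixture vectors dominated by one source (x ≈ a_j s_j) lie at roughly the same angle to the mixing vector a_j. The single-source detector, a rank-one check on a windowed local spatial covariance, and its `half_width` and `threshold` values are hypothetical choices for illustration. They are not necessarily the detector used in the paper.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent TF units


def hermitian_angle(u, v):
    """Hermitian angle between complex vectors stored along axis 0.

    cos(theta_H) = |u^H v| / (||u|| ||v||); unchanged if u or v is scaled by
    any nonzero complex number, so the phase of the source signal drops out."""
    inner = np.abs(np.sum(np.conj(u) * v, axis=0))
    denom = np.linalg.norm(u, axis=0) * np.linalg.norm(v, axis=0) + EPS
    return np.arccos(np.clip(inner / denom, 0.0, 1.0))


def relative_absolute(X):
    """|x_m| / ||x|| for every TF unit; X has shape (M, F, T)."""
    return np.abs(X) / (np.linalg.norm(X, axis=0, keepdims=True) + EPS)


def single_source_units(X, half_width=2, threshold=0.95):
    """Hypothetical single-source detector (not necessarily the paper's).

    Sums x x^H over 2*half_width+1 neighbouring frames in the same bin. If one
    source dominates, this local covariance is close to rank one, so its
    largest eigenvalue carries almost all of the trace."""
    _, _, T = X.shape
    outer = np.einsum("mft,nft->ftmn", X, np.conj(X))           # (F, T, M, M)
    csum = np.cumsum(np.pad(outer, ((0, 0), (1, 0), (0, 0), (0, 0))), axis=1)
    lo = np.clip(np.arange(T) - half_width, 0, T)
    hi = np.clip(np.arange(T) + half_width + 1, 0, T)
    R = csum[:, hi] - csum[:, lo]                                 # windowed sums
    eig = np.linalg.eigvalsh(R)                                   # ascending, real
    ratio = eig[..., -1] / (eig.sum(axis=-1) + EPS)
    return ratio >= threshold                                     # (F, T) bool
```

For a mixture produced by a single source, every unit should be flagged. When two sources with distinct mixing vectors overlap in every unit, most units should not be flagged. The flags can then serve as clustering weights.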
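
The following sketch illustrates weighted clustering in a single frequency bin, with masks zeroed where no source is active. Each TF unit is assigned to the cluster center with the smallest Hermitian angle. Each center is then updated as the principal eigenvector of the weighted covariance of its normalized members, which is the phase-invariant analogue of a k-means mean. Several details are assumptions for illustration: the weighting (1.0 for units flagged as single-source, 0.1 otherwise), the -40 dB activity floor, and the farthest-point initialization. Units below the floor receive all-zero masks. The names `cos_hermitian` and `cluster_bin` are hypothetical.

```python
import numpy as np

EPS = 1e-12


def cos_hermitian(X, centers):
    """|c^H x| / (||c|| ||x||) for each row c of centers (K, M), column x of X (M, T)."""
    inner = np.abs(np.conj(centers) @ X)                           # (K, T)
    norms = np.linalg.norm(centers, axis=1)[:, None] * np.linalg.norm(X, axis=0)[None, :]
    return inner / (norms + EPS)


def cluster_bin(Xf, k, single, floor_db=-40.0, n_iter=30):
    """Weighted Hermitian-angle clustering of one frequency bin.

    Xf: (M, T) complex mixture STFT in this bin; single: (T,) bool flags for
    units believed to contain one active source. Returns binary masks (k, T)."""
    T = Xf.shape[1]
    energy = np.sum(np.abs(Xf) ** 2, axis=0)
    active = energy > energy.max() * 10.0 ** (floor_db / 10.0)
    w = np.where(single, 1.0, 0.1) * active           # hypothetical weighting
    # Farthest-point initialisation: strongest weighted unit first, then the
    # unit least similar (largest Hermitian angle) to the centers chosen so far.
    idx = [int(np.argmax(energy * w))]
    for _ in range(1, k):
        sim = cos_hermitian(Xf, Xf[:, idx].T).max(axis=0)
        sim[w == 0] = np.inf
        idx.append(int(np.argmin(sim)))
    centers = Xf[:, idx].T / np.linalg.norm(Xf[:, idx], axis=0)[:, None]
    for _ in range(n_iter):
        labels = np.argmax(cos_hermitian(Xf, centers), axis=0)   # smallest angle
        for j in range(k):
            sel = (labels == j) & (w > 0)
            if not sel.any():
                continue
            Y = Xf[:, sel] / np.linalg.norm(Xf[:, sel], axis=0)
            R = (w[sel] * Y) @ Y.conj().T             # weighted covariance (M, M)
            centers[j] = np.linalg.eigh(R)[1][:, -1]  # principal eigenvector
    labels = np.argmax(cos_hermitian(Xf, centers), axis=0)
    masks = np.zeros((k, T))
    masks[labels, np.arange(T)] = 1.0
    masks[:, ~active] = 0.0                           # no active source: zero mask
    return masks
```

Because the Hermitian angle ignores the complex source value in each unit, the centers converge toward the (normalized) mixing vectors when most weighted units are single-source. The row order of the returned masks is arbitrary in each bin, which is the permutation problem mentioned in the abstract.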
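
Because each frequency bin is clustered independently, the source order of the masks can change from one bin to the next. The sketch below assumes that masks are stored as an array of shape `(F, K, T)`. It aligns each bin to the previous bin by choosing the row permutation that maximizes the summed Pearson correlation with the already-aligned masks of that adjacent bin. The function names are hypothetical, and the procedure is only an illustration of correlation-based alignment, not the paper's exact rule.

```python
import itertools
import numpy as np


def corr_matrix(A, B):
    """Pearson correlation between every row of A (K, T) and every row of B (K, T)."""
    A0 = A - A.mean(axis=1, keepdims=True)
    B0 = B - B.mean(axis=1, keepdims=True)
    den = np.linalg.norm(A0, axis=1)[:, None] * np.linalg.norm(B0, axis=1)[None, :]
    return (A0 @ B0.T) / (den + 1e-12)


def align_permutations(masks):
    """Reorder sources bin by bin so each bin best matches the previous, aligned bin.

    masks: (F, K, T) array of per-bin masks. Returns an aligned copy."""
    F, K, _ = masks.shape
    aligned = masks.copy()
    for f in range(1, F):
        C = corr_matrix(aligned[f - 1], masks[f])   # C[i, j]: prev source i vs row j
        best = max(itertools.permutations(range(K)),
                   key=lambda p: sum(C[i, p[i]] for i in range(K)))
        aligned[f] = masks[f][list(best)]           # row i now matches prev source i
    return aligned
```

The exhaustive search over permutations is practical only for a handful of sources. For larger K, applying `scipy.optimize.linear_sum_assignment` to the negated correlation matrix is a common alternative. Aligning strictly bin by bin can also propagate a single wrong decision to all higher bins, so practical systems often compare against several neighbouring bins.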
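
The musical-noise reduction step can be sketched under the following assumptions. The masked estimate `s_hat` of one source is treated as a target. Sparse FIR filters h_i are then fitted so that Σ_i h_i * x_i approximates that target, by minimizing ½‖A h − ŝ‖² + λ‖h‖₁, where the matrix A stacks delayed copies of the time-domain mixtures. The output is a linear filtering of the mixtures rather than a masked spectrogram, so it does not contain the isolated spectral peaks that are heard as musical noise. The solver here is plain ISTA (iterative soft thresholding), chosen for brevity; it is a swapped-in optimizer, not necessarily the one used in the paper. The values of `L`, `lam`, and `n_iter` are illustrative.

```python
import numpy as np


def delay_matrix(x, L):
    """(N, L) matrix whose column d is x delayed by d samples (zero-padded)."""
    N = len(x)
    D = np.zeros((N, L))
    for d in range(L):
        D[d:, d] = x[:N - d]
    return D


def sparse_filter_reestimate(mixtures, s_hat, L=64, lam=1e-3, n_iter=500):
    """Fit sparse FIR filters h_i so that sum_i h_i * x_i approximates s_hat.

    Minimises 0.5 * ||A h - s_hat||^2 + lam * ||h||_1 with ISTA (a stand-in
    solver). mixtures: (M, N) real time-domain mixtures; s_hat: (N,) masked
    estimate of one source. Returns the re-estimated signal and filters (M, L)."""
    A = np.hstack([delay_matrix(x, L) for x in mixtures])    # (N, M*L)
    step = 1.0 / np.linalg.norm(A, 2) ** 2                   # 1 / Lipschitz constant
    h = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = h - step * (A.T @ (A @ h - s_hat))               # gradient step
        h = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return A @ h, h.reshape(len(mixtures), L)
```

The ℓ1 penalty keeps most filter taps at zero. This limits how closely the linear re-estimate can reproduce the non-linear artifacts of the masked signal, while still capturing the short dominant paths of the source.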

Figs. 1–9 (images not included)

Author information

Correspondence to Saeed Setayeshi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zohrevandi, M., Setayeshi, S., Rabiee, A. et al. Blind separation of underdetermined Convolutive speech mixtures by time–frequency masking with the reduction of musical noise of separated signals. Multimed Tools Appl 80, 12601–12618 (2021). https://doi.org/10.1007/s11042-020-10398-3

  • DOI: https://doi.org/10.1007/s11042-020-10398-3
