Abstract
In this paper, we consider the problem of separating speech source signals from underdetermined convolutive mixtures using a capsule network (CapsNet). The objective of this paper is twofold: (1) to improve underdetermined convolutive blind source separation in terms of signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR) and signal-to-artifact ratio (SAR); and (2) to reduce the computational burden of the algorithm so that it is suitable for applications such as speech recognition systems. The time–frequency points of the observed mixture signals are input to the first layer of the CapsNet. In the first layer, single-source active points (SSPs) are identified using the ratio of mixtures; these SSPs are the lower-level capsules of our system. In the second layer, cluster centers are found using a dynamic routing algorithm, and these clusters are used to construct a binary mask. Finally, the algorithm solves the permutation problem by computing the correlation between the amplitudes of adjacent frequency bins. We test our algorithm on live-recorded mixtures obtained in a real environment and on synthetically convolved mixtures. The test results show the effectiveness of the proposed method compared with existing algorithms in terms of computational load, SDR and SIR.
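The processing chain summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes two microphones and works on a single frequency bin. SSPs are flagged by a local-variance test on the ratio x2/x1 over adjacent frames, in the spirit of ratio-of-mixtures detection. Each SSP becomes a unit 3-D "capsule" vector, and the capsules are clustered by an untrained routing-by-agreement loop. A CapsNet as described by Sabour et al. uses learned transformation matrices, which are omitted here, so the logits are seeded from mutually dissimilar capsules instead. Binary masks assign every time–frequency point to its closest cluster center. Permutation alignment brute-forces the source orderings and keeps the one whose amplitude envelopes correlate best with the neighboring bin. All function names, the feature mapping and the parameters `half_win`, `tol`, `iters` and `beta` are hypothetical choices.

```python
import cmath
import itertools
import math

# Hypothetical sketch: two microphones, ONE frequency bin, K sources.

def ratio_ssp(x1, x2, half_win=1, tol=1e-3):
    """Frames whose ratio x2/x1 is nearly constant over adjacent frames (SSPs)."""
    ssp = []
    for t in range(half_win, len(x1) - half_win):
        win = range(t - half_win, t + half_win + 1)
        if any(abs(x1[k]) < 1e-12 for k in win):
            continue  # ratio undefined at this point
        r = [x2[k] / x1[k] for k in win]
        mean = sum(r) / len(r)
        var = sum(abs(v - mean) ** 2 for v in r) / len(r)
        if var <= tol * abs(mean) ** 2:  # low relative variance -> one source
            ssp.append(t)
    return ssp

def feature(a, b):
    """Unit real 3-vector of a 2-mic TF point with the source's own amplitude
    and phase removed, so only the mixing direction remains. Assumes a != 0."""
    n = math.hypot(abs(a), abs(b))
    rel = b * a.conjugate() / abs(a)  # mic-2 value with mic-1 phase removed
    return (abs(a) / n, rel.real / n, rel.imag / n)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def squash(s):
    """CapsNet squashing: keeps direction, maps length into [0, 1)."""
    n2 = dot(s, s)
    scale = n2 / (1.0 + n2) / math.sqrt(n2) if n2 > 0 else 0.0
    return tuple(scale * x for x in s)

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    z = sum(e)
    return [x / z for x in e]

def route(caps, k, iters=5, beta=5.0):
    """Untrained routing-by-agreement over SSP capsules; returns k unit centres."""
    seeds = [caps[0]]
    while len(seeds) < k:  # farthest-point seeding breaks the symmetry
        seeds.append(min(caps, key=lambda u: max(dot(u, s) for s in seeds)))
    b = [[beta * dot(u, s) for s in seeds] for u in caps]  # routing logits b_ij
    dim = len(caps[0])
    for _ in range(iters):
        c = [softmax(row) for row in b]  # coupling coefficients c_ij
        s = [tuple(sum(c[i][j] * u[d] for i, u in enumerate(caps)) for d in range(dim))
             for j in range(k)]
        v = [squash(sj) for sj in s]  # higher-level capsule outputs v_j
        for i, u in enumerate(caps):
            for j in range(k):
                b[i][j] += beta * dot(u, v[j])  # agreement update
    return [tuple(x / math.sqrt(dot(vj, vj)) for x in vj) for vj in v]

def binary_masks(x1, x2, centres):
    """One binary mask per centre: each TF point goes to the most similar centre."""
    labels = []
    for a, b in zip(x1, x2):
        f = feature(a, b)
        labels.append(max(range(len(centres)), key=lambda j: dot(f, centres[j])))
    return [[1 if lab == j else 0 for lab in labels] for j in range(len(centres))]

def corr(p, q):
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return num / den if den > 0 else 0.0

def align_permutation(prev_env, cur_env):
    """Ordering of cur_env whose amplitude envelopes best match the previous bin."""
    k = len(cur_env)
    return max(itertools.permutations(range(k)),
               key=lambda perm: sum(corr(prev_env[j], cur_env[perm[j]]) for j in range(k)))

def demo():
    # Hypothetical bin: 3 sources, 2 mics, each source active in its own 10 frames.
    mix = [(1.0, 0.3), (0.6, 0.8j), (0.3, -1.0)]
    x1, x2 = [], []
    for t in range(30):
        s = (1 + 0.1 * t) * cmath.exp(0.7j * t)
        x1.append(mix[t // 10][0] * s)
        x2.append(mix[t // 10][1] * s)
    ssp = ratio_ssp(x1, x2)
    centres = route([feature(x1[t], x2[t]) for t in ssp], k=3)
    return ssp, binary_masks(x1, x2, centres)
```

On this toy bin, `demo()` flags frames 1–8, 11–18 and 21–28 as SSPs. Windows that straddle a change of active source are rejected. The three resulting masks each select exactly one block of ten frames. Trying every ordering in `align_permutation` grows as K!, so it only suits a small number of sources. A greedy or assignment-based search would be the usual substitute for larger K.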
Ethics declarations
Conflict of interest
Author M. Kumar declares that he has no conflict of interest. Author V. E. Jayanthi declares that she has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Kumar, M., Jayanthi, V.E. Underdetermined blind source separation using CapsNet. Soft Comput 24, 9011–9019 (2020). https://doi.org/10.1007/s00500-019-04430-4
DOI: https://doi.org/10.1007/s00500-019-04430-4