Abstract
In this paper, we present a multichannel nonnegative matrix factorization (MNMF) system for source separation. We propose a novel signal model based on spatial covariance matrices (SCMs), in which the mixing filter encodes the spatial information and the source variances are modeled with an NMF structure. Moreover, the proposed model is initialized with the estimated source directions of arrival (DoA) in order to mitigate its strong sensitivity to parameter initialization. The proposed system has been evaluated on music source separation using a multichannel classical chamber music dataset. The results show that real-time operation can be reached in the tested scenarios by combining multi-core architectures with parallel and high-performance techniques.
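The structure of such a model can be illustrated with a minimal sketch. It assumes a two-microphone array, rank-1 SCMs built from far-field steering vectors, and small hand-picked NMF factors; the function names, delays, and numeric values are hypothetical and are not the parameterization used in the paper.

```python
import cmath
import math


def steering_scm(freq_hz, delay_s):
    """Rank-1 SCM a a^H for an assumed 2-microphone far-field steering vector."""
    a = [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay_s)]
    return [[a[i] * a[j].conjugate() for j in range(2)] for i in range(2)]


def nmf_variance(W, H):
    """Source variances v[f][t] = sum_k W[f][k] * H[k][t] (NMF structure)."""
    n_freq, n_comp, n_frames = len(W), len(H), len(H[0])
    return [[sum(W[f][k] * H[k][t] for k in range(n_comp))
             for t in range(n_frames)] for f in range(n_freq)]


def model_scm(freqs_hz, delays_s, Ws, Hs):
    """Mixture SCM X[f][t] = sum_s v_s[f][t] * SCM_s[f]."""
    variances = [nmf_variance(W, H) for W, H in zip(Ws, Hs)]
    n_freq, n_frames = len(freqs_hz), len(Hs[0][0])
    X = [[[[0j, 0j], [0j, 0j]] for _ in range(n_frames)] for _ in range(n_freq)]
    for s, delay in enumerate(delays_s):
        for f, freq in enumerate(freqs_hz):
            scm = steering_scm(freq, delay)
            for t in range(n_frames):
                for i in range(2):
                    for j in range(2):
                        X[f][t][i][j] += variances[s][f][t] * scm[i][j]
    return X, variances


# Hypothetical toy setup: 2 sources, 2 frequency bins, 2 NMF components, 3 frames.
freqs_hz = [500.0, 1000.0]
delays_s = [0.0, 2.0e-4]  # assumed inter-microphone delay of each source
Ws = [[[1.0, 0.0], [0.5, 0.2]],            # source 0: spectral bases (F x K)
      [[0.3, 0.1], [0.0, 1.0]]]            # source 1
Hs = [[[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]],  # source 0: activations (K x T)
      [[0.5, 0.5, 0.0], [1.0, 0.0, 1.0]]]  # source 1
X, variances = model_scm(freqs_hz, delays_s, Ws, Hs)
```

In this sketch the spatial information lives only in the per-source SCMs, which depend on the assumed delays (and hence on the DoA), while the time-frequency energy of each source is carried by the nonnegative factors.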
Notes
RT\(_{60}\) is the time required for reflections of a direct sound to decay by 60 dB below the level of the direct sound.
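As a hedged illustration of this definition, the sketch below estimates RT\(_{60}\) from an impulse response using Schroeder backward integration and a linear fit of the decay curve. This is a common estimation technique, not a procedure taken from the paper; the synthetic exponential response, the sample rate, and the fit range are assumptions chosen for the example.

```python
import math


def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB via backward integration of the squared response."""
    remaining = 0.0
    tail = [0.0] * len(impulse_response)
    for n in range(len(impulse_response) - 1, -1, -1):
        remaining += impulse_response[n] ** 2
        tail[n] = remaining
    total = tail[0]
    return [10.0 * math.log10(e / total) for e in tail]


def estimate_rt60(impulse_response, sample_rate_hz, fit_range_db=(-5.0, -25.0)):
    """Fit a line to the decay curve over fit_range_db and extrapolate to -60 dB."""
    decay_db = schroeder_decay_db(impulse_response)
    upper, lower = fit_range_db
    points = [(n / sample_rate_hz, d)
              for n, d in enumerate(decay_db) if lower <= d <= upper]
    mean_t = sum(t for t, _ in points) / len(points)
    mean_d = sum(d for _, d in points) / len(points)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope


# Synthetic response: an exponential envelope whose level drops 60 dB in 0.4 s.
sample_rate_hz = 8000
true_rt60_s = 0.4
tau_s = true_rt60_s / (3.0 * math.log(10.0))  # amplitude time constant
response = [math.exp(-(n / sample_rate_hz) / tau_s) for n in range(sample_rate_hz)]
rt60_s = estimate_rt60(response, sample_rate_hz)
```

Fitting over a limited range and extrapolating to 60 dB is the usual workaround when the noise floor prevents observing the full decay.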
Acknowledgements
This work was supported by the Regional Ministry of the Principality of Asturias under grant FC-GRUPIN-IDI/2018/000226, by the Ministry of Economy, Knowledge and University of the Government of the “Junta de Andalucía” under project P18-RT-1994, by the “Programa Operativo FEDER Andalucía 2014-2020” under the project with reference 1257914, and by the Pre-doctoral Fellowship Program of the “Ministerio de Ciencia, Innovación y Universidades” of Spain under reference BES-2016-078512.
Cite this article
Muñoz-Montoro, A.J., Carabias-Orti, J.J., Cortina, R. et al. Parallel multichannel blind source separation using a spatial covariance model and nonnegative matrix factorization. J Supercomput 77, 12143–12156 (2021). https://doi.org/10.1007/s11227-021-03771-y
DOI: https://doi.org/10.1007/s11227-021-03771-y