Parallel multichannel blind source separation using a spatial covariance model and nonnegative matrix factorization

The Journal of Supercomputing

Abstract

In this paper, we present a multichannel nonnegative matrix factorization (MNMF) system for source separation. We propose a novel signal model based on spatial covariance matrices (SCMs), in which the mixing filter encodes the spatial information and the source variances are modeled with an NMF structure. Moreover, the proposed model is initialized with the estimated direction of arrival (DoA) of each source in order to mitigate its strong sensitivity to parameter initialization. The proposed system has been evaluated on music source separation using a multichannel classical chamber music dataset. The results show that real-time operation can be reached in the tested scenarios by combining multi-core architectures with parallel and high-performance techniques.
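To make the structure of such a model concrete, the following is a minimal NumPy sketch of a generic SCM-plus-NMF mixture model, not the authors' implementation. It assumes the common MNMF form in which the modeled mixture covariance at frequency f and frame t is the sum over sources of an NMF-structured variance times a per-source, per-frequency spatial covariance matrix. The function names (`model_covariance`, `random_scm`), the array layouts, and the random parameters are all hypothetical choices made for illustration; the abstract does not specify the exact parameterization or the DoA-based initialization.

```python
import numpy as np


def model_covariance(W, H, R):
    """Build modeled mixture covariances from NMF variances and SCMs.

    Assumed (hypothetical) layouts:
      W: (S, F, K) nonnegative spectral bases per source
      H: (S, K, T) nonnegative time activations per source
      R: (S, F, M, M) Hermitian PSD spatial covariance matrices
    Returns X_hat with shape (F, T, M, M).
    """
    # NMF-structured source variances: V[s, f, t] = sum_k W[s, f, k] * H[s, k, t]
    V = np.einsum("sfk,skt->sft", W, H)
    # Mixture model: X_hat[f, t] = sum_s V[s, f, t] * R[s, f]
    return np.einsum("sft,sfmn->ftmn", V, R)


def random_scm(rng, S, F, M):
    """Draw random Hermitian PSD matrices as A @ A^H (illustration only)."""
    A = rng.standard_normal((S, F, M, M)) + 1j * rng.standard_normal((S, F, M, M))
    return A @ np.conj(np.swapaxes(A, -1, -2))


# Toy dimensions: sources, frequency bins, frames, NMF components, microphones
S, F, T, K, M = 2, 5, 7, 3, 2
rng = np.random.default_rng(0)
W = rng.random((S, F, K))
H = rng.random((S, K, T))
R = random_scm(rng, S, F, M)

X_hat = model_covariance(W, H, R)
print(X_hat.shape)
```

Each time-frequency bin of the model is independent of the others in this formulation, which is one reason models of this kind lend themselves to the multi-core parallelization the abstract mentions.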

Notes

  1. RT\(_{60}\) is the time required for the reflections of a direct sound to decay to 60 dB below the level of the direct sound.

  2. https://www.openblas.net

  3. http://www.fftw.org

Acknowledgements

This work was supported by the Regional Ministry of the Principality of Asturias under grant FC-GRUPIN-IDI/2018/000226, by the Ministry of Economy, Knowledge and University of the Government of the “Junta de Andalucía” under project P18-RT-1994, by the “Programa Operativo FEDER Andalucía 2014-2020” under project with reference 1257914, and by Pre-doctoral Fellowship Program from the “Ministerio de Ciencia, Innovación y Universidades” of Spain under the reference BES-2016-078512.

Author information

Corresponding author

Correspondence to A. J. Muñoz-Montoro.

About this article

Cite this article

Muñoz-Montoro, A.J., Carabias-Orti, J.J., Cortina, R. et al. Parallel multichannel blind source separation using a spatial covariance model and nonnegative matrix factorization. J Supercomput 77, 12143–12156 (2021). https://doi.org/10.1007/s11227-021-03771-y

  • DOI: https://doi.org/10.1007/s11227-021-03771-y