
A Novel Singing Voice Separation Method Based on a Learnable Decomposition Technique


Abstract

In this paper, a new monaural singing voice separation algorithm is presented. This area of signal processing provides important information for many applications, including voice recognition, information retrieval, and singer identification. The proposed approach is based on a sparse and low-rank decomposition model applied to the spectrogram of the singing voice signal. The vocal and non-vocal parts of the signal are modeled as sparse and low-rank components, respectively. An alternating optimization algorithm decomposes the singing voice frames using sparse representation over the vocal and non-vocal dictionaries. In addition, a novel voice activity detector based on the energy of the sparse coefficients is introduced to learn atoms related to the non-vocal data in the training step. In the test phase, the learned non-vocal atoms of the instrumental part are updated according to the non-vocal components captured from the test signal using a domain adaptation technique. The proposed dictionary learning process incorporates two coherence measures, atom–data coherence and mutual coherence, to achieve low reconstruction error during learning along with proper separation in the test step. Simulation results based on different evaluation measures show that the proposed method yields significantly better results than earlier methods in this context as well as traditional procedures.
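To make the decomposition idea concrete, the following minimal sketch, which is not the paper's implementation, codes the mixture spectrogram over a concatenation of pre-learned vocal and non-vocal dictionaries and reconstructs each source from its own sub-dictionary before applying a soft time–frequency mask. The dictionaries D_vocal and D_music, the ISTA solver, and all parameter values are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch only (not the authors' algorithm): sparse coding of the
# mixture spectrogram over concatenated vocal/non-vocal dictionaries, followed
# by a soft mask built from the two partial reconstructions.
import numpy as np

def ista_sparse_code(X, D, lam=0.1, n_iter=100):
    """Approximately solve min_C 0.5*||X - D C||_F^2 + lam*||C||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    C = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ C - X)
        C = C - grad / L
        C = np.sign(C) * np.maximum(np.abs(C) - lam / L, 0.0)   # soft threshold
    return C

def separate(V_mix, D_vocal, D_music, lam=0.1):
    """V_mix: magnitude spectrogram (freq x frames) of the mixed signal."""
    D = np.hstack([D_vocal, D_music])        # concatenated dictionary
    C = ista_sparse_code(V_mix, D, lam)
    k = D_vocal.shape[1]
    V_voc = np.maximum(D_vocal @ C[:k], 0)   # vocal estimate
    V_mus = np.maximum(D_music @ C[k:], 0)   # accompaniment estimate
    mask = V_voc / (V_voc + V_mus + 1e-12)   # soft time-frequency mask
    return mask * V_mix, (1 - mask) * V_mix
```

The paper additionally updates the non-vocal atoms from the test signal via domain adaptation and constrains dictionary learning with atom–data and mutual coherence measures; those steps are not shown in this sketch.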

Notes

  1. MIR-1K dataset: https://sites.google.com/site/unvoicedsoundseparation/mir-1k.
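
The footnote above points to the MIR-1K dataset commonly used for evaluating singing voice separation. As a hedged sketch, the loader below assumes the standard MIR-1K layout, stereo 16 kHz WAV files with the accompaniment in the left channel and the singing voice in the right channel; the file path and function name are placeholders, not artifacts of the paper.

```python
# Hedged sketch: load an MIR-1K clip and build a mixture plus ground-truth
# sources for evaluation. Assumes stereo 16 kHz WAVs with accompaniment in the
# left channel and singing voice in the right channel; the path is a placeholder.
import numpy as np
from scipy.io import wavfile

def load_mir1k_clip(path, voice_to_music_db=0.0):
    sr, data = wavfile.read(path)                 # data: (n_samples, 2) int16
    data = data.astype(np.float64) / 32768.0      # scale to [-1, 1)
    music, voice = data[:, 0], data[:, 1]         # left = accompaniment, right = voice
    gain = 10.0 ** (voice_to_music_db / 20.0)     # voice-to-music mixing ratio in dB
    mixture = music + gain * voice
    return sr, mixture, voice, music

# Example (placeholder path):
# sr, mix, voice, music = load_mir1k_clip("MIR-1K/Wavfile/example_clip.wav")
```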

Acknowledgements

The author wishes to thank Professor P. Loizou for making the source code of the fwSegSNR and PESQ objective quality measures publicly available. The author also thanks Christian D. Sigg for publishing the MATLAB implementations of the LARC algorithm.

Author information

Corresponding author

Correspondence to Samira Mavaddati.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mavaddati, S. A Novel Singing Voice Separation Method Based on a Learnable Decomposition Technique. Circuits Syst Signal Process 39, 3652–3681 (2020). https://doi.org/10.1007/s00034-019-01338-0
