Abstract
This paper presents a new monaural singing voice separation algorithm. This area of signal processing provides important information for applications such as voice recognition, music information retrieval, and singer identification. The proposed approach is based on a sparse and low-rank decomposition model applied to the spectrogram of the singing voice signal: the vocal and non-vocal parts are modeled as the sparse and low-rank components, respectively. An alternating optimization algorithm decomposes the singing voice frames using the sparse representation technique over vocal and non-vocal dictionaries. In addition, a novel voice activity detector based on the energy of the sparse coefficients is introduced to learn atoms related to the non-vocal data in the training step. In the test phase, the learned non-vocal atoms of the instrumental part are updated according to the non-vocal components captured from the test signal using a domain adaptation technique. The proposed dictionary learning process incorporates two coherence measures, atom–data coherence and mutual coherence, to achieve low reconstruction error in learning along with proper separation in the test step. Simulation results using several evaluation measures show that the proposed method yields significantly better results than earlier methods in this context as well as the traditional procedures.
Acknowledgements
The author wishes to thank Professor P. Loizou for making the source code of the fwSegSNR and PESQ objective quality measures publicly available, and Christian D. Sigg for publishing the MATLAB implementation of the LARC algorithm.
Cite this article
Mavaddati, S. A Novel Singing Voice Separation Method Based on a Learnable Decomposition Technique. Circuits Syst Signal Process 39, 3652–3681 (2020). https://doi.org/10.1007/s00034-019-01338-0