Music auto-tagging based on the unified latent semantic modeling

Multimedia Tools and Applications

Abstract

We propose a music auto-tagging approach based on latent space modeling of both music context and music content. First, we apply latent semantic analysis to music tags using Sparse Nonnegative Matrix Factorization. Then, the semantics of the music content are learned by decomposing the content over a pre-trained dictionary, and an adaptive dictionary learning algorithm is proposed. Finally, the two latent spaces are associated through a subspace mapping algorithm. Experimental results show that the proposed approach outperforms state-of-the-art auto-tagging systems on the CAL500 dataset in 5-fold cross-validation experiments.
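The abstract names three stages: a sparse NMF latent space for tags, a sparse-coding latent space for audio content over an adapted dictionary, and a mapping between the two. The sketch below is a hypothetical, minimal illustration of that pipeline in Python/NumPy, not the authors' implementation, since the abstract does not specify the exact algorithms. Assumptions: the sparse NMF is a simple L1-penalized multiplicative-update variant; content codes come from ISTA (lasso); the "adaptive dictionary learning" is stood in for by Method of Optimal Directions (MOD) updates starting from a random "pre-trained" dictionary; and the subspace mapping is a ridge-regression linear map. The function names (`sparse_nmf`, `sparse_code`, `mod_update`, `fit_and_tag`), parameter values, and synthetic data are all illustrative.

```python
import numpy as np


def sparse_nmf(V, k, lam=0.1, iters=300, seed=0):
    """Sparse NMF sketch (assumed variant): V ~ W @ H, W, H >= 0, L1 penalty on H.

    Multiplicative updates for 0.5*||V - W H||_F^2 + lam * sum(H). After each
    step the columns of W are rescaled to unit norm (H absorbs the scale), a
    common heuristic that keeps the L1 penalty from being undone by scaling.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 1e-3
    H = rng.random((k, V.shape[1])) + 1e-3
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


def sparse_code(X, D, lam=0.05, iters=200):
    """Lasso codes A = argmin 0.5*||X - D A||_F^2 + lam*||A||_1, solved by ISTA."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(iters):
        G = A - (D.T @ (D @ A - X)) / L  # gradient step on the quadratic term
        A = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)  # soft-thresholding
    return A


def mod_update(X, A, ridge=1e-6):
    """One MOD dictionary step: D = X A^T (A A^T + ridge*I)^-1, columns unit-normed."""
    k = A.shape[0]
    D = np.linalg.solve(A @ A.T + ridge * np.eye(k), A @ X.T).T
    return D / (np.linalg.norm(D, axis=0) + 1e-9)


def fit_and_tag(X_train, V_train, X_test, D0, k_tags=4, n_adapt=5, beta=1e-2):
    """Hypothetical pipeline: tag latent space, content latent space, linear map."""
    # 1) Tag (context) latent space via sparse NMF: V_train ~ W_tag @ H_tag.
    W_tag, H_tag = sparse_nmf(V_train, k_tags)
    # 2) Content latent space: start from the "pre-trained" dictionary D0 and
    #    adapt it to the training features with a few sparse-code/MOD rounds.
    D = D0 / np.linalg.norm(D0, axis=0)
    for _ in range(n_adapt):
        D = mod_update(X_train, sparse_code(X_train, D))
    A = sparse_code(X_train, D)
    # 3) Subspace mapping (ridge regression): H_tag ~ M @ A.
    M = np.linalg.solve(A @ A.T + beta * np.eye(A.shape[0]), A @ H_tag.T).T
    # 4) Tag new songs: code them, map into the tag space, clip at zero, and
    #    reconstruct one nonnegative score per tag through the tag basis.
    H_test = np.maximum(M @ sparse_code(X_test, D), 0.0)
    return W_tag @ H_test


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.random((3, 60))  # synthetic hidden "style" factors for 60 songs
    X = rng.normal(size=(20, 3)) @ Z + 0.05 * rng.normal(size=(20, 60))  # features
    V = (rng.random((12, 3)) @ Z > 0.8).astype(float)  # synthetic binary tag matrix
    D0 = rng.normal(size=(20, 8))  # random stand-in for a pre-trained dictionary
    scores = fit_and_tag(X[:, :50], V[:, :50], X[:, 50:], D0)
    print(scores.shape)  # (12, 10): one score per tag for each of 10 test songs
    print(np.argsort(-scores[:, 0])[:3])  # top-3 tag indices for the first test song
```

Because tag scores are reconstructed through the tag basis `W_tag`, each column ranks all tags for one test song; a CAL500-style evaluation could then keep the top-ranked tags per song and repeat the fit over the 5 cross-validation folds.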

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 60902065 and 61401227, and by the Beijing Natural Science Foundation (No. 4152053).

Author information

Corresponding author

Correspondence to Xi Shao.

About this article

Cite this article

Shao, X., Cheng, Z. & Kankanhalli, M.S. Music auto-tagging based on the unified latent semantic modeling. Multimed Tools Appl 78, 161–176 (2019). https://doi.org/10.1007/s11042-018-5632-2

  • DOI: https://doi.org/10.1007/s11042-018-5632-2
