Abstract
We propose a music auto-tagging approach based on latent space modeling of both music context and music content. First, we apply latent semantic analysis to music tags using Sparse Nonnegative Matrix Factorization. Then, the music content semantics are learned by decomposing the music content over a pre-trained dictionary, and an adaptive dictionary learning algorithm is proposed. Finally, the two latent spaces are associated through a subspace mapping algorithm. Experimental results show that the proposed approach outperforms state-of-the-art auto-tagging systems on the CAL500 dataset in 5-fold cross-validation experiments.
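The first step above, latent semantic analysis of tags with a sparse nonnegative factorization, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' algorithm: the function name `sparse_nmf`, the toy song-by-tag matrix `V`, the penalty weight `lam`, and the use of plain multiplicative updates with an L1 penalty on `H`.

```python
import numpy as np

def sparse_nmf(V, k, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor V (songs x tags) as W @ H with W, H >= 0 and an L1 penalty on H.

    Illustrative only: a simple multiplicative-update variant, not the
    formulation used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update H; the lam term in the denominator shrinks entries toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # Update W with the standard multiplicative rule.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        # Rescale so each column of W has (approximately) unit norm,
        # leaving the product W @ H unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Hypothetical toy data: 6 songs x 5 tags, with two groups of co-occurring tags.
V = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

W, H = sparse_nmf(V, k=2)
err = np.linalg.norm(V - W @ H)
print("reconstruction error:", err)
```

In this sketch each row of `H` can be read as one latent tag topic and each row of `W` as a song's weights over those topics; how the paper defines and uses its latent tag space may differ.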






Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grants No. 60902065 and No. 61401227, and by the Beijing Natural Science Foundation (No. 4152053).
Cite this article
Shao, X., Cheng, Z. & Kankanhalli, M.S. Music auto-tagging based on the unified latent semantic modeling. Multimed Tools Appl 78, 161–176 (2019). https://doi.org/10.1007/s11042-018-5632-2
DOI: https://doi.org/10.1007/s11042-018-5632-2