Music auto-tagging based on the unified latent semantic modeling

Shao, Xi; Cheng, Zhiyong; Kankanhalli, Mohan S.

doi:10.1007/s11042-018-5632-2

Published: 20 January 2018

Volume 78, pages 161–176, (2019)

Multimedia Tools and Applications

Xi Shao¹,², Zhiyong Cheng² & Mohan S. Kankanhalli²

Abstract

We propose a music auto-tagging approach based on latent space modeling of both music context and music content. First, we introduce latent semantic analysis of music tags using Sparse Nonnegative Matrix Factorization. Then, the semantics of the music content are learned by decomposing the content over a pre-trained dictionary, and we propose an adaptive dictionary learning algorithm. Finally, the two latent spaces are associated through a subspace mapping algorithm. Experimental results show that the proposed approach outperforms state-of-the-art auto-tagging systems on the CAL500 dataset in 5-fold cross-validation experiments.
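
As a rough illustration of the first step only (latent semantic analysis of tags), the sketch below factors a small song-by-tag matrix with an L1-penalized nonnegative matrix factorization using multiplicative updates. It is not the authors' formulation: the function name sparse_nmf, the toy matrix, the penalty weight l1 and the update rule are all assumptions chosen for brevity, and the dictionary learning and subspace mapping steps are not covered.

```python
import numpy as np

def sparse_nmf(V, rank, l1=0.1, iters=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (songs x tags) as W @ H.

    Hypothetical helper: minimizes 0.5 * ||V - W H||^2 + l1 * sum(W)
    with multiplicative updates, so the song codes W are pushed toward sparsity.
    """
    rng = np.random.default_rng(seed)
    n_songs, n_tags = V.shape
    W = rng.random((n_songs, rank)) + eps   # per-song latent codes
    H = rng.random((rank, n_tags)) + eps    # latent "tag topics"
    for _ in range(iters):
        # Update the codes; the l1 term in the denominator shrinks them.
        W *= (V @ H.T) / (W @ H @ H.T + l1 + eps)
        # Update the topics (no penalty on H in this sketch).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Rescale each topic to unit L2 norm and move the scale into W,
        # which leaves the product W @ H unchanged.
        norms = np.linalg.norm(H, axis=1, keepdims=True) + eps
        H /= norms
        W *= norms.T
    return W, H

# Toy data (assumed): 6 songs x 5 tags, two groups of co-occurring tags.
V = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

W, H = sparse_nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

Each row of H can be read as a group of tags that tend to co-occur, and each row of W as a song's weights over those groups. How this tag space is then linked to the audio content is the subject of the later steps in the abstract and is not shown here.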

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants No. 60902065 and No. 61401227, and by the Beijing Natural Science Foundation (No. 4152053).

Author information

Authors and Affiliations

College of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, No.172, 66 Xinmofan Road, Nanjing, Jiangsu Province, 210003, China
Xi Shao
School of Computing, National University of Singapore, Singapore, 119613, Singapore
Xi Shao, Zhiyong Cheng & Mohan S. Kankanhalli

Corresponding author

Correspondence to Xi Shao.

About this article

Cite this article

Shao, X., Cheng, Z. & Kankanhalli, M.S. Music auto-tagging based on the unified latent semantic modeling. Multimed Tools Appl 78, 161–176 (2019). https://doi.org/10.1007/s11042-018-5632-2

Received: 15 July 2017
Revised: 07 December 2017
Accepted: 08 January 2018
Published: 20 January 2018
Issue Date: January 2019
DOI: https://doi.org/10.1007/s11042-018-5632-2
