Abstract
In this paper, we discuss various features of music objects in two kinds of domains. Among these features, Mel-frequency cepstral coefficients (MFCCs) are discussed in more detail and modeled with Gaussian mixture models (GMMs). We also investigate how to measure the similarity between GMMs. We then employ a multimedia graph as a cross-modal method to associate the MFCCs of music objects with their genre tags. By applying a link analysis algorithm to the graph, we assign appropriate genre tags to target music objects. Finally, we report experiments on the performance, effectiveness, and parameter settings of our approach.
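To make the graph-based labeling step concrete, the following is a minimal sketch and not the method of the paper. The abstract does not name the link analysis algorithm, so the sketch assumes random walk with restart, one common choice for graphs that mix objects and tags. The node names, the edges (song-to-song edges stand in for acoustically similar songs, for example songs whose GMMs are close), and the restart probability of 0.15 are all hypothetical.

```python
def build_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def random_walk_with_restart(adjacency, start, restart=0.15, iterations=100):
    """Approximate the visiting probabilities of a walk that jumps back to `start`.

    Assumes every node has at least one neighbour, which holds for the
    undirected toy graph below.
    """
    scores = {node: 0.0 for node in adjacency}
    scores[start] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in adjacency}
        for node, neighbours in adjacency.items():
            # Spread the non-restart share of this node's score evenly.
            share = (1.0 - restart) * scores[node] / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        # The restart share always returns to the target node.
        updated[start] += restart
        scores = updated
    return scores


# Hypothetical toy graph: labeled songs link to genre tags, and the
# unlabeled target links only to its assumed acoustic neighbours.
EDGES = [
    ("song:a", "tag:rock"),
    ("song:b", "tag:rock"),
    ("song:c", "tag:jazz"),
    ("song:b", "song:c"),
    ("song:target", "song:a"),
    ("song:target", "song:b"),
]

graph = build_graph(EDGES)
scores = random_walk_with_restart(graph, "song:target")
tag_scores = {n: s for n, s in scores.items() if n.startswith("tag:")}
best_tag = max(tag_scores, key=tag_scores.get)
print(best_tag, tag_scores)
```

Tag nodes that are reachable from the target through many short paths accumulate more probability, so the highest-scoring tag serves as the suggested label. A real system would need weighted edges and a principled way to choose neighbours, neither of which this sketch attempts.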
Additional information
This research was supported by Fu Jen Catholic University with Project No. 409731044039, and sponsored by the National Science Council under Contract No. NSC-97-2221-E-030-013.
About this article
Cite this article
Hsu, JL., Li, YF. A cross-modal method of labeling music tags. Multimed Tools Appl 58, 521–541 (2012). https://doi.org/10.1007/s11042-011-0729-x
DOI: https://doi.org/10.1007/s11042-011-0729-x