Abstract
Music tends to have a distinct structure consisting of repetition and variation of components such as verse and chorus. Understanding this structure and its patterns has become increasingly important for music information retrieval (MIR). Many methods for music segmentation and structure analysis have been proposed, each with its own advantages and disadvantages. Because music varies significantly in timbre, articulation, and tempo, the task remains challenging. In this paper, we propose a novel method for music segmentation and structure analysis. We first extract the timbre feature from the acoustic music signal and construct a self-similarity matrix that shows the similarities among the features within the music clip. Next, we determine candidate boundaries for music segmentation by tracking the standard deviation in the matrix. We then perform two-stage categorization: (i) categorization of the segments in a music clip on the basis of the timbre feature and (ii) categorization of segments in the same category on the basis of successive chromagram features. In this way, each music clip is represented by a sequence of states, where each state represents a category defined by the two-stage categorization. We demonstrate the performance of the proposed method through experiments.
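The first two steps of the abstract (a self-similarity matrix over per-frame features, then candidate boundaries from the standard deviation in that matrix) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the window, or the peak-picking rule, so everything below is an assumption made for illustration: cosine similarity, a local window along each row of the matrix, a fixed threshold, and toy 3-dimensional vectors standing in for real timbre features. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from statistics import pstdev


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def self_similarity_matrix(frames):
    """Entry [i][j] is the similarity between frame i and frame j."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]


def local_std_curve(ssm, half_window):
    """For each frame i, the standard deviation of row i restricted to a
    window around the diagonal. Inside a homogeneous segment the values
    are all similar (low deviation); near a boundary the window mixes
    high and low similarities (high deviation). Assumed interpretation
    of "tracking the standard deviation in the matrix"."""
    n = len(ssm)
    curve = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        curve.append(pstdev(ssm[i][lo:hi]))
    return curve


def candidate_boundaries(curve, threshold):
    """Local maxima of the curve at or above a threshold. On a flat
    plateau, only the last index of the plateau is reported."""
    peaks = []
    for i in range(1, len(curve) - 1):
        if (curve[i] >= threshold
                and curve[i] >= curve[i - 1]
                and curve[i] > curve[i + 1]):
            peaks.append(i)
    return peaks


# Toy clip with an A-B-A layout, 10 frames per segment. Real input would
# be timbre features (for example MFCC vectors) computed per frame.
timbre_a = [1.0, 0.0, 0.0]
timbre_b = [0.0, 1.0, 0.0]
frames = [timbre_a] * 10 + [timbre_b] * 10 + [timbre_a] * 10

ssm = self_similarity_matrix(frames)
curve = local_std_curve(ssm, half_window=3)
boundaries = candidate_boundaries(curve, threshold=0.3)
print(boundaries)  # [10, 20] on this toy input
```

On the toy input, the matrix also shows that the first and third segments are mutually similar (for example, `ssm[0][25]` is 1.0), which is the kind of repetition a later categorization stage could group into one category.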
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2012627) and by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the C-ITRC (Convergence Information Technology Research Center) support program (NIPA-2013-H0301-13-3006) supervised by the NIPA (National IT Industry Promotion Agency).
Cite this article
Jun, S., Rho, S. & Hwang, E. Music structure analysis using self-similarity matrix and two-stage categorization. Multimed Tools Appl 74, 287–302 (2015). https://doi.org/10.1007/s11042-013-1761-9
DOI: https://doi.org/10.1007/s11042-013-1761-9