ABSTRACT
To identify repeated patterns and contrasting sections in music, it is common to use self-similarity matrices (SSMs) to visualize and estimate structure. We introduce a novel application for SSMs derived from audio recordings: using them to learn about the potential reasoning behind a listener's annotation. We use SSMs generated from musically motivated audio features at various timescales to represent contributions to a structural annotation. Since a listener's attention can shift among musical features (e.g., rhythm, timbre, and harmony) throughout a piece, we further break down the SSMs into section-wise components and use quadratic programming (QP) to minimize the distance between a linear sum of these components and the annotated description. We posit that the optimal section-wise weights on the feature components may indicate the features to which a listener attended when annotating a piece, and thus may help us to understand why two listeners disagreed about a piece's structure. We discuss some examples that substantiate the claim that feature relevance varies throughout a piece, use our method to investigate differences between listeners' interpretations, and lastly propose some variations on the method.
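The idea of fitting a weighted sum of section-wise SSM components to an annotation can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the six-frame toy piece, the scalar "timbre" and "harmony" features, the similarity measure `1 - |v_i - v_j|`, the choice to define a section-wise component as the rows of a feature SSM that fall inside that section, the non-negativity constraint on the weights, and the use of projected gradient descent in place of a dedicated QP solver. All function names are hypothetical.

```python
# Toy sketch: section-wise feature weights via a non-negative least-squares QP.
# The data, the masking scheme, and the solver are illustrative assumptions.

def ssm(values):
    """Self-similarity matrix for a scalar feature in [0, 1]: 1 - |v_i - v_j|."""
    return [[1.0 - abs(a - b) for b in values] for a in values]

def row_masked(matrix, rows):
    """Keep only the rows belonging to one section; zero out all other rows."""
    return [row[:] if i in rows else [0.0] * len(row)
            for i, row in enumerate(matrix)]

def flatten(matrix):
    return [v for row in matrix for v in row]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_nonneg_qp(components, target, iterations=5000):
    """Minimise ||sum_k x_k * components[k] - target||^2 subject to x >= 0.

    Uses projected gradient descent as a stand-in for a real QP solver.
    """
    cols = [flatten(c) for c in components]
    t = flatten(target)
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]   # quadratic term
    lin = [dot(ci, t) for ci in cols]                      # linear term
    # 1/trace(gram) never exceeds 1/(largest eigenvalue), so the step is safe.
    step = 1.0 / sum(gram[k][k] for k in range(len(cols)))
    x = [0.0] * len(cols)
    for _ in range(iterations):
        grad = [dot(gram[k], x) - lin[k] for k in range(len(cols))]
        x = [max(0.0, xk - step * gk) for xk, gk in zip(x, grad)]  # project
    return x

# Hypothetical listener annotation: three sections labelled A, B, C.
labels = ["A", "A", "B", "B", "C", "C"]
annotation = [[1.0 if a == b else 0.0 for b in labels] for a in labels]

# Hypothetical per-frame features: timbre changes after section A,
# harmony changes only at the start of section C.
features = {"timbre": [0, 0, 1, 1, 1, 1], "harmony": [0, 0, 0, 0, 1, 1]}
sections = {"A": {0, 1}, "B": {2, 3}, "C": {4, 5}}

names = [(f, s) for f in features for s in sections]
components = [row_masked(ssm(features[f]), sections[s]) for f, s in names]
weights = dict(zip(names, solve_nonneg_qp(components, annotation)))

for (f, s), w in weights.items():
    print(f"{f:8s} section {s}: {w:.3f}")
```

In this toy case the timbre component alone reproduces the annotation in section A, the harmony component alone reproduces it in section C, and neither does in section B, where the two receive equal weights of about one third. Under the interpretation proposed in the abstract, such a weight pattern would suggest a shift of attention from timbre to harmony over the piece, although a real analysis would rest on audio-derived SSMs rather than hand-made ones.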
Index Terms
- Using quadratic programming to estimate feature relevance in structural analyses of music