Abstract
In this paper, we investigate the derivation of musical structures directly from signal analysis, with the aim of generating visual and audio summaries. From the audio signal, we first derive features: either static features (MFCC, chromagram) or the proposed dynamic features. Two approaches are then studied to automatically derive the structure of a piece of music. The sequence approach considers the audio signal as a repetition of sequences of events. Sequences are derived from the similarity matrix of the features by a proposed algorithm based on a 2D structuring filter and pattern matching. The state approach considers the audio signal as a succession of states. Since human segmentation and grouping perform better upon subsequent hearings, this natural strategy is followed here through a proposed multi-pass approach that combines time segmentation and unsupervised learning methods. Both the sequence and state representations are then used to create an audio summary using various techniques.
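As a rough illustration of the similarity matrix on which the sequence approach relies, the sketch below builds a frame-by-frame similarity matrix from toy feature vectors and measures how strongly the sequence repeats at a given lag. It is not the chapter's algorithm: the cosine measure, the function names, and the three-dimensional toy features are all assumptions made for illustration, and the 2D structuring filter and pattern matching steps are not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for silent (all-zero) frames
    return dot / (norm_u * norm_v)


def similarity_matrix(frames):
    """Square matrix whose entry (i, j) compares frame i with frame j."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]


def diagonal_mean(matrix, lag):
    """Mean similarity along the diagonal offset by `lag` frames.

    A high value suggests that the sequence repeats after `lag` frames.
    """
    n = len(matrix)
    values = [matrix[i][i + lag] for i in range(n - lag)]
    return sum(values) / len(values)


# Hypothetical toy features: the pattern A, B, C played twice.
A, B, C = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
frames = [A, B, C, A, B, C]

S = similarity_matrix(frames)
repeat_strength = diagonal_mean(S, 3)    # lag equal to the pattern length
neighbour_strength = diagonal_mean(S, 1)  # adjacent frames differ here
```

In this toy case the repeated pattern appears as a line of high values parallel to the main diagonal at lag 3, which is the kind of structure a sequence-detection step would search for in real features such as MFCCs or chromagrams.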
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peeters, G. (2004). Deriving Musical Structures from Signal Analysis for Music Audio Summary Generation: “Sequence” and “State” Approach. In: Wiil, U.K. (ed.) Computer Music Modeling and Retrieval. CMMR 2003. Lecture Notes in Computer Science, vol 2771. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39900-1_14
DOI: https://doi.org/10.1007/978-3-540-39900-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20922-5
Online ISBN: 978-3-540-39900-1