
A Multi-objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2021)

Abstract

The goal of automatic music segmentation is to calculate boundaries between musical parts or sections that are perceived as semantic entities. Such sections are often characterized by specific musical properties such as instrumentation, dynamics, tempo, or rhythm. Recent data-driven approaches often phrase music segmentation as a binary classification problem, where musical cues for identifying boundaries are learned implicitly. Complementary to such methods, we present in this paper an approach for identifying relevant audio features that explain the presence of musical boundaries. In particular, we describe a multi-objective evolutionary feature selection strategy, which simultaneously optimizes two objectives. In a first setting, we reduce the number of features while maximizing an F-measure. In a second setting, we jointly maximize precision and recall values. Furthermore, we present extensive experiments based on six different feature sets covering different musical aspects. We show that feature selection allows for reducing the overall dimensionality while increasing the segmentation quality compared to full feature sets, with timbre-related features performing best.
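The two optimization settings described in the abstract can be sketched as a bi-objective search over binary feature masks. The following is a minimal illustrative sketch, not the paper's actual method: the `evaluate` function is a toy stand-in for the real segmentation pipeline (it simply pretends that a few hypothetical dimensions are the relevant ones), and the simple mutate-and-archive loop is a basic Pareto-archive scheme rather than the hypervolume-based evolutionary algorithm used in the study.

```python
import random

random.seed(0)

N_FEATURES = 13  # e.g., dimensions of a hypothetical 13-dim MFCC vector


def evaluate(mask):
    """Toy stand-in for the real segmentation pipeline: pretend a few
    dimensions carry boundary information and measure how well the mask
    recovers them. Returns the objective pair (n_selected, f_measure)."""
    informative = {1, 4, 7}  # hypothetical 'relevant' dimensions
    selected = {i for i, bit in enumerate(mask) if bit}
    tp = len(selected & informative)
    if tp == 0:
        return (len(selected), 0.0)
    precision = tp / len(selected)
    recall = tp / len(informative)
    return (len(selected), 2 * precision * recall / (precision + recall))


def dominates(a, b):
    """Pareto dominance for (minimize features, maximize F-measure):
    no more features, no worse F, strictly better somewhere."""
    return a != b and a[0] <= b[0] and a[1] >= b[1]


def mutate(mask):
    p = 1.0 / len(mask)  # flip each bit with probability 1/n
    return [bit ^ (random.random() < p) for bit in mask]


# (1+1)-style loop maintaining an archive of non-dominated feature subsets
archive = []
parent = [random.randint(0, 1) for _ in range(N_FEATURES)]
for _ in range(2000):
    child = mutate(parent)
    obj = evaluate(child)
    if not any(dominates(evaluate(m), obj) for m in archive):
        archive = [m for m in archive if not dominates(obj, evaluate(m))]
        archive.append(child)
        parent = child

# the approximated Pareto front: trade-offs between subset size and quality
front = sorted({evaluate(m) for m in archive})
print(front)
```

The second setting of the paper (jointly maximizing precision and recall) follows the same pattern with the objective pair swapped to `(precision, recall)`, both maximized.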

This work was funded by the German Research Foundation (DFG), project 336599081 “Evolutionary optimisation for interpretable music segmentation and music categorisation based on discretised semantic metafeatures”. The experiments were carried out on the Linux HPC cluster at TU Dortmund (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as project 271512359. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS.


Notes

  1. The terminology within the scope of this paper is as follows: feature selection keeps individual feature dimensions (e.g., the 2nd MFCC) from feature vectors (e.g., a 13-dimensional MFCC vector), which in turn belong exclusively to feature groups such as timbre. A feature set selected for music segmentation is then constructed from various dimensions of various features that, in the current setup, all belong to the same group; combining features from different groups remains promising future work.
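The distinction drawn in the footnote (selecting individual feature dimensions rather than whole feature vectors) can be illustrated with a binary mask over a single frame's feature vector; the MFCC values below are made up for the example.

```python
# A 13-dimensional MFCC vector for one audio frame (hypothetical values).
mfcc = [12.3, -4.1, 0.7, 2.2, -1.5, 0.9, 3.3, -0.2, 1.1, 0.4, -2.8, 0.6, 1.9]

# Feature selection operates on individual dimensions, not whole vectors:
# this mask keeps only the 2nd, 5th, and 8th MFCC coefficients.
mask = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

selected = [value for value, keep in zip(mfcc, mask) if keep]
print(selected)  # [-4.1, -1.5, -0.2]
```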


Author information

Corresponding author: Igor Vatolkin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Vatolkin, I., Koch, M., Müller, M. (2021). A Multi-objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2021. Lecture Notes in Computer Science, vol 12693. Springer, Cham. https://doi.org/10.1007/978-3-030-72914-1_22

  • DOI: https://doi.org/10.1007/978-3-030-72914-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72913-4

  • Online ISBN: 978-3-030-72914-1

  • eBook Packages: Computer Science (R0)
