
A Multi-objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2021)

Abstract

The goal of automatic music segmentation is to calculate boundaries between musical parts or sections that are perceived as semantic entities. Such sections are often characterized by specific musical properties such as instrumentation, dynamics, tempo, or rhythm. Recent data-driven approaches often phrase music segmentation as a binary classification problem, where musical cues for identifying boundaries are learned implicitly. Complementary to such methods, we present in this paper an approach for identifying relevant audio features that explain the presence of musical boundaries. In particular, we describe a multi-objective evolutionary feature selection strategy, which simultaneously optimizes two objectives. In a first setting, we reduce the number of features while maximizing an F-measure. In a second setting, we jointly maximize precision and recall values. Furthermore, we present extensive experiments based on six different feature sets covering different musical aspects. We show that feature selection allows for reducing the overall dimensionality while increasing the segmentation quality compared to full feature sets, with timbre-related features performing best.
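The two optimization settings described in the abstract can be sketched as a bi-objective search over binary feature masks. The following is a minimal illustrative sketch, not the paper's actual method: the `evaluate` function is a toy stand-in for the real segmentation pipeline (it simply pretends that a few hypothetical dimensions are the relevant ones), and the simple mutate-and-archive loop is a basic Pareto-archive scheme rather than the hypervolume-based evolutionary algorithm used in the study.

```python
import random

random.seed(0)

N_FEATURES = 13  # e.g., dimensions of a hypothetical 13-dim MFCC vector


def evaluate(mask):
    """Toy stand-in for the real segmentation pipeline: pretend a few
    dimensions carry boundary information and measure how well the mask
    recovers them. Returns the objective pair (n_selected, f_measure)."""
    informative = {1, 4, 7}  # hypothetical 'relevant' dimensions
    selected = {i for i, bit in enumerate(mask) if bit}
    tp = len(selected & informative)
    if tp == 0:
        return (len(selected), 0.0)
    precision = tp / len(selected)
    recall = tp / len(informative)
    return (len(selected), 2 * precision * recall / (precision + recall))


def dominates(a, b):
    """Pareto dominance for (minimize features, maximize F-measure):
    no more features, no worse F, strictly better somewhere."""
    return a != b and a[0] <= b[0] and a[1] >= b[1]


def mutate(mask):
    p = 1.0 / len(mask)  # flip each bit with probability 1/n
    return [bit ^ (random.random() < p) for bit in mask]


# (1+1)-style loop maintaining an archive of non-dominated feature subsets
archive = []
parent = [random.randint(0, 1) for _ in range(N_FEATURES)]
for _ in range(2000):
    child = mutate(parent)
    obj = evaluate(child)
    if not any(dominates(evaluate(m), obj) for m in archive):
        archive = [m for m in archive if not dominates(obj, evaluate(m))]
        archive.append(child)
        parent = child

# the approximated Pareto front: trade-offs between subset size and quality
front = sorted({evaluate(m) for m in archive})
print(front)
```

The second setting of the paper (jointly maximizing precision and recall) follows the same pattern with the objective pair swapped to `(precision, recall)`, both maximized.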

This work was funded by the German Research Foundation (DFG), project 336599081 “Evolutionary optimisation for interpretable music segmentation and music categorisation based on discretised semantic metafeatures”. The experiments were carried out on the Linux HPC cluster at TU Dortmund (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as project 271512359. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS.


Notes

  1. The terminology within the scope of this paper is as follows: feature selection keeps individual feature dimensions (e.g., the 2nd MFCC) from feature vectors (e.g., a 13-dimensional MFCC vector), which in turn belong exclusively to feature groups such as timbre. A feature set selected for music segmentation is then constructed from various dimensions of various features that, in the current setup, all belong to the same group; combining features from different groups remains promising future work.
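The distinction drawn in the footnote (selecting individual feature dimensions rather than whole feature vectors) can be illustrated with a binary mask over a single frame's feature vector; the MFCC values below are made up for the example.

```python
# A 13-dimensional MFCC vector for one audio frame (hypothetical values).
mfcc = [12.3, -4.1, 0.7, 2.2, -1.5, 0.9, 3.3, -0.2, 1.1, 0.4, -2.8, 0.6, 1.9]

# Feature selection operates on individual dimensions, not whole vectors:
# this mask keeps only the 2nd, 5th, and 8th MFCC coefficients.
mask = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

selected = [value for value, keep in zip(mfcc, mask) if keep]
print(selected)  # [-4.1, -1.5, -0.2]
```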


Author information

Corresponding author: Igor Vatolkin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Vatolkin, I., Koch, M., Müller, M. (2021). A Multi-objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2021. Lecture Notes in Computer Science, vol 12693. Springer, Cham. https://doi.org/10.1007/978-3-030-72914-1_22

  • DOI: https://doi.org/10.1007/978-3-030-72914-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72913-4

  • Online ISBN: 978-3-030-72914-1

  • eBook Packages: Computer Science (R0)
