
Weighted Initialisation of Evolutionary Instrument and Pitch Detection in Polyphonic Music

  • Conference paper
  • First Online:
Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2024)

Abstract

Current state-of-the-art methods for instrument and pitch detection in polyphonic music often require large datasets and long training times, resources that are scarce in the field of music information retrieval. This creates a need for unsupervised alternatives that avoid such prerequisites. We present a modification to an evolutionary algorithm for polyphonic music approximation through synthesis that uses spectral information to initialise populations with probable pitches. The algorithm performs joint instrument and pitch detection on polyphonic music pieces without any of the aforementioned constraints. Candidate sets of (instrument, style, pitch) tuples are graded with a COSH distance fitness function, and the fittest set determines the algorithm's instrument and pitch labels for a given part of a music piece. Closer investigation of this fitness function indicates that it tends to produce false positives, which may conceal the true potential of our modified approach. Even so, our modification shows significantly faster convergence and slightly lower pitch and instrument detection errors than the baseline algorithm in both single-onset and full-piece experiments.
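For orientation, the sketch below illustrates the two ingredients the abstract names: the COSH distance, i.e. the symmetrised Itakura-Saito distance between power spectra, used as the fitness function, and a spectrum-driven weighting of candidate pitches for population initialisation. The `pitch_weights` helper is hypothetical, a plausible reading of "initialise populations with probable pitches"; the paper's exact weighting scheme may differ.

```python
import numpy as np

def cosh_distance(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> float:
    """Symmetrised Itakura-Saito ("COSH") distance between two power spectra.

    Lower is better; identical spectra give 0. `eps` guards against
    division by zero in silent frequency bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    r = p / q
    # cosh(log r) - 1 == (r + 1/r) / 2 - 1, averaged over all bins
    return float(np.mean((r + 1.0 / r) / 2.0 - 1.0))

def pitch_weights(spectrum: np.ndarray, freqs: np.ndarray,
                  midi_lo: int = 21, midi_hi: int = 108) -> np.ndarray:
    """Hypothetical weighting: score each candidate MIDI pitch by the
    spectral magnitude at its fundamental frequency, then normalise."""
    weights = np.empty(midi_hi - midi_lo + 1)
    for i, midi in enumerate(range(midi_lo, midi_hi + 1)):
        f0 = 440.0 * 2.0 ** ((midi - 69) / 12.0)              # MIDI -> Hz
        weights[i] = spectrum[np.argmin(np.abs(freqs - f0))]  # nearest bin
    total = weights.sum()
    return weights / total if total > 0 else np.full_like(weights, 1 / weights.size)

# Sampling initial pitches for one individual of the evolutionary population
# (replace=False keeps the sampled pitches distinct):
# rng = np.random.default_rng(0)
# pitches = rng.choice(np.arange(21, 109), size=4, replace=False,
#                      p=pitch_weights(mag, freqs))
```

Because the COSH distance penalises both over- and under-estimation of spectral energy symmetrically, it suits comparing a synthesised candidate spectrum against the target recording regardless of which is louder in a given bin.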


Notes

  1. https://github.com/jd-rub/EvoMUSART23-repro-code.


Author information

Correspondence to Justin Dettmer.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Dettmer, J., Vatolkin, I., Glasmachers, T. (2024). Weighted Initialisation of Evolutionary Instrument and Pitch Detection in Polyphonic Music. In: Johnson, C., Rebelo, S.M., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2024. Lecture Notes in Computer Science, vol 14633. Springer, Cham. https://doi.org/10.1007/978-3-031-56992-0_8


  • DOI: https://doi.org/10.1007/978-3-031-56992-0_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56991-3

  • Online ISBN: 978-3-031-56992-0

  • eBook Packages: Computer Science, Computer Science (R0)
