
Weighted Initialisation of Evolutionary Instrument and Pitch Detection in Polyphonic Music

  • Conference paper
  • First Online:
Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2024)

Abstract

Current state-of-the-art methods for instrument and pitch detection in polyphonic music often require large datasets and long training times, resources that are scarce in the field of music information retrieval. This creates a need for unsupervised alternatives that avoid such prerequisites. We present a modification to an evolutionary algorithm for polyphonic music approximation through synthesis that uses spectral information to initialise populations with probable pitches. The algorithm performs joint instrument and pitch detection on polyphonic music pieces without any of the aforementioned constraints. Candidate sets of (instrument, style, pitch) tuples are graded with a COSH distance fitness function, and the fittest set determines the algorithm's instrument and pitch labels for a given part of a music piece. Closer investigation of this fitness function indicates that it tends to produce false positives, which may conceal the true potential of our modified approach. Even so, our modification shows significantly faster convergence and slightly lower pitch and instrument detection errors than the baseline algorithm in both single-onset and full-piece experiments.
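For orientation, the sketch below illustrates the two ingredients the abstract names: the COSH distance, i.e. the symmetrised Itakura-Saito distance between power spectra, used as the fitness function, and a spectrum-driven weighting of candidate pitches for population initialisation. The `pitch_weights` helper is hypothetical, a plausible reading of "initialise populations with probable pitches"; the paper's exact weighting scheme may differ.

```python
import numpy as np

def cosh_distance(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> float:
    """Symmetrised Itakura-Saito ("COSH") distance between two power spectra.

    Lower is better; identical spectra give 0. `eps` guards against
    division by zero in silent frequency bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    r = p / q
    # cosh(log r) - 1 == (r + 1/r) / 2 - 1, averaged over all bins
    return float(np.mean((r + 1.0 / r) / 2.0 - 1.0))

def pitch_weights(spectrum: np.ndarray, freqs: np.ndarray,
                  midi_lo: int = 21, midi_hi: int = 108) -> np.ndarray:
    """Hypothetical weighting: score each candidate MIDI pitch by the
    spectral magnitude at its fundamental frequency, then normalise."""
    weights = np.empty(midi_hi - midi_lo + 1)
    for i, midi in enumerate(range(midi_lo, midi_hi + 1)):
        f0 = 440.0 * 2.0 ** ((midi - 69) / 12.0)              # MIDI -> Hz
        weights[i] = spectrum[np.argmin(np.abs(freqs - f0))]  # nearest bin
    total = weights.sum()
    return weights / total if total > 0 else np.full_like(weights, 1 / weights.size)

# Sampling initial pitches for one individual of the evolutionary population
# (replace=False keeps the sampled pitches distinct):
# rng = np.random.default_rng(0)
# pitches = rng.choice(np.arange(21, 109), size=4, replace=False,
#                      p=pitch_weights(mag, freqs))
```

Because the COSH distance penalises both over- and under-estimation of spectral energy symmetrically, it suits comparing a synthesised candidate spectrum against the target recording regardless of which is louder in a given bin.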


Notes

  1. https://github.com/jd-rub/EvoMUSART23-repro-code.


Author information

Correspondence to Justin Dettmer.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Dettmer, J., Vatolkin, I., Glasmachers, T. (2024). Weighted Initialisation of Evolutionary Instrument and Pitch Detection in Polyphonic Music. In: Johnson, C., Rebelo, S.M., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2024. Lecture Notes in Computer Science, vol 14633. Springer, Cham. https://doi.org/10.1007/978-3-031-56992-0_8


  • DOI: https://doi.org/10.1007/978-3-031-56992-0_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56991-3

  • Online ISBN: 978-3-031-56992-0

  • eBook Packages: Computer Science, Computer Science (R0)
