Abstract
Current state-of-the-art methods for instrument and pitch detection in polyphonic music often require large datasets and long training times, resources that are scarce in the field of music information retrieval. This motivates unsupervised alternatives that do not require such prerequisites. We present a modification of an evolutionary algorithm for polyphonic music approximation through synthesis that uses spectral information to initialise populations with probable pitches. This algorithm performs joint instrument and pitch detection on polyphonic music pieces without any of the aforementioned constraints. Sets of (instrument, style, pitch) tuples are graded with a COSH distance fitness function, which ultimately determines the algorithm's instrument and pitch labels for a given part of a music piece. Closer investigation of this fitness function indicates that it tends to produce false positives, which may conceal the true potential of our modified approach. Even so, our modification converges significantly faster and achieves slightly lower pitch and instrument detection errors than the baseline algorithm in both single-onset and full-piece experiments.
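The fitness function named in the abstract compares a candidate's synthesised spectrum against the target spectrum using the COSH distance, a symmetrised variant of the Itakura-Saito spectral measure. A minimal sketch of such a measure, assuming both spectra are given as non-negative NumPy arrays of equal length; the function name and the epsilon floor are illustrative choices, not taken from the paper:

```python
import numpy as np

def cosh_distance(s1, s2, eps=1e-12):
    """Symmetrised Itakura-Saito (COSH-style) distance between two spectra.

    Averages 0.5 * (s1/s2 + s2/s1) - 1 over all frequency bins. The value
    is zero when the spectra are identical and grows as they diverge; a
    small eps avoids division by zero in silent bins.
    """
    s1 = np.asarray(s1, dtype=float) + eps
    s2 = np.asarray(s2, dtype=float) + eps
    ratio = s1 / s2
    return float(np.mean(0.5 * (ratio + 1.0 / ratio) - 1.0))
```

Because the measure is symmetric in its arguments, it penalises both over- and under-estimation of spectral energy, which is one plausible reason a synthesis-based search can drift toward spectrally close but incorrect (instrument, pitch) candidates, i.e. the false positives the abstract mentions.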
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Dettmer, J., Vatolkin, I., Glasmachers, T. (2024). Weighted Initialisation of Evolutionary Instrument and Pitch Detection in Polyphonic Music. In: Johnson, C., Rebelo, S.M., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2024. Lecture Notes in Computer Science, vol 14633. Springer, Cham. https://doi.org/10.1007/978-3-031-56992-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56991-3
Online ISBN: 978-3-031-56992-0