Abstract
Automatic identification of lead instruments is a challenging task in music information retrieval (MIR). In this paper, predominant instrument recognition in polyphonic music is addressed using convolutional recurrent neural networks (CRNN) on the Mel-spectrogram, the modgdgram, and their fusion. The modgdgram, a visual representation, is obtained by stacking the modified group delay functions of consecutive frames. Convolutional neural networks (CNN) learn distinctive local characteristics from the visual representation, while recurrent neural networks (RNN) integrate the extracted features over time and assign the instrument to the class to which it belongs. The proposed system is systematically evaluated on the IRMAS dataset. A wave-generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. We experimented with two CRNN architectures: convolutional long short-term memory (C-LSTM) and convolutional gated recurrent unit (C-GRU). The fusion experiment with C-GRU reports micro and macro F1 scores of 0.69 and 0.60, respectively, which are 7.81% and 9.09% higher than those of the state-of-the-art model of Han et al. The architectural choice of a CRNN with score-level fusion of the Mel-spectrogram and modgdgram has merit in recognizing the predominant instrument in polyphonic music.
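The following two sketches are illustrative only. The first shows how a modgdgram can be computed, following the standard modified group delay formulation (the spectra of x(n) and n·x(n), a cepstrally smoothed magnitude in the denominator, and compression exponents alpha and gamma); the frame length, hop, lifter length, alpha, and gamma below are assumed values for illustration, not the settings used in this paper.

```python
import numpy as np

def modgd_frame(frame, nfft=1024, alpha=0.9, gamma=0.4, lifter=8):
    """Modified group delay function (MODGD) of one windowed frame.
    alpha, gamma, and the lifter length are illustrative choices."""
    n = np.arange(len(frame))
    X = np.fft.fft(frame, nfft)        # spectrum of x(n)
    Y = np.fft.fft(n * frame, nfft)    # spectrum of n * x(n)

    # Cepstrally smoothed magnitude |S(w)| suppresses the spikes that
    # zeros close to the unit circle cause in the raw group delay.
    log_mag = np.log(np.abs(X) + 1e-10)
    ceps = np.real(np.fft.ifft(log_mag))
    ceps[lifter:-lifter] = 0.0         # keep only low quefrencies
    S = np.exp(np.real(np.fft.fft(ceps)))

    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + 1e-10)
    return np.sign(tau) * np.abs(tau) ** alpha

def modgdgram(x, frame_len=1024, hop=512):
    """Stack per-frame MODGD vectors column-wise into a 2-D image,
    analogous to stacking magnitude spectra into a spectrogram."""
    win = np.hanning(frame_len)
    cols = [modgd_frame(x[i:i + frame_len] * win)[:frame_len // 2]
            for i in range(0, len(x) - frame_len + 1, hop)]
    return np.stack(cols, axis=1)
```

The second sketches score-level (late) fusion of the two branches: a weighted sum of the class posteriors predicted from the Mel-spectrogram and the modgdgram, scored with micro and macro F1. The mixing weight w is a hypothetical parameter that would be tuned on validation data, not the paper's value.

```python
import numpy as np
from sklearn.metrics import f1_score

def fuse_and_score(p_mel, p_modgd, y_true, w=0.5):
    """Score-level fusion of class posteriors (shape: samples x classes).
    w is a hypothetical mixing weight."""
    p_fused = w * p_mel + (1.0 - w) * p_modgd
    y_pred = np.argmax(p_fused, axis=1)
    return (f1_score(y_true, y_pred, average="micro"),
            f1_score(y_true, y_pred, average="macro"))
```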
References
Ajayakumar, R., Rajan, R.: Predominant instrument recognition in polyphonic music using GMM-DNN framework. In: Proceedings of International Conference on Signal Processing and Communications (SPCOM), pp. 1–5 (2020)
Diment, A., Rajan, P., Heittola, T., Virtanen, T.: Modified group delay feature for musical instrument recognition. In: Proceedings of International Symposium on Computer Music Multidisciplinary Research, pp. 431–438 (2013)
Atkar, G., Jayaraju, P.: Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia. Neural Comput. Appl. 33, 1–10 (2021)
Ballas, N., Yao, L., Pal, C., Courville, A.: Delving deeper into convolutional networks for learning video representations. arXiv preprint arXiv:1511.06432 (2015)
Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: Proceedings of 13th International Society for Music Information Retrieval Conference (ISMIR) (2012)
Çakır, E., Parascandolo, G., Heittola, T., Huttunen, H., Virtanen, T.: Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1291–1303 (2017)
Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
Choi, K., Fazekas, G., Sandler, M., Cho, K.: Convolutional recurrent neural networks for music classification. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2392–2396 (2017)
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS 2014 Workshop on Deep Learning (2014)
Cui, Z., Ke, R., Pu, Z., Wang, Y.: Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv preprint arXiv:1801.02143 (2018)
Donahue, C., McAuley, J., Puckette, M.: Adversarial audio synthesis. In: Proceedings of International Conference on Learning Representations, pp. 1–16 (2019)
Fuhrmann, F., Herrera, P.: Polyphonic instrument recognition for exploring semantic similarities in music. In: Proceedings of 13th International Conference on Digital Audio Effects (DAFx-10), vol. 14, no. 1, pp. 1–8, Graz (2010)
Fuhrmann, F.: Automatic musical instrument recognition from polyphonic music audio signals. Ph.D. thesis, Universitat Pompeu Fabra (2012)
Gimeno, P., Viñals, I., Ortega, A., Miguel, A., Lleida, E.: Multiclass audio segmentation based on recurrent neural networks for broadcast domain data. EURASIP J. Audio Speech Music Process. 2020(1), 1–19 (2020). https://doi.org/10.1186/s13636-020-00172-6
Gómez, J.S., Abeßer, J., Cano, E.: Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: Proceedings of International Society for Music Information Retrieval (ISMIR), pp. 577–584 (2018)
Gruber, N., Jockisch, A.: Are GRU cells more specific and LSTM cells more sensitive in motive classification of text? Front. Artif. Intell. 3, 40 (2020)
Gururani, S., Summers, C., Lerch, A.: Instrument activity detection in polyphonic music using deep neural networks. In: Proceedings of International Society for Music Information Retrieval Conference (ISMIR), pp. 577–584 (2018)
Han, Y., Kim, J., Lee, K.: Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio Speech Lang. Process. 25(1), 208–221 (2017)
Heittola, T., Klapuri, A., Virtanen, T.: Musical instrument recognition in polyphonic audio using source-filter model for sound separation. In: Proceedings of International Society for Music Information Retrieval Conference, pp. 327–332 (2009)
Kitahara, T., Goto, M., Komatani, K., Ogata, T., Okuno, H.G.: Instrument identification in polyphonic music: feature weighting to minimize influence of sound overlaps. EURASIP J. Appl. Signal Process. 2007, 155–175 (2007)
Kratimenos, A., Avramidis, K., Garoufis, C., Zlatintsi, A., Maragos, P.: Augmentation methods on monophonic audio for instrument classification in polyphonic music. In: Proceedings of 28th European Signal Processing Conference (EUSIPCO), pp. 156–160 (2021)
Kumar, P.M., Sebastian, J., Murthy, H.A.: Musical onset detection on Carnatic percussion instruments. In: Proceedings of Twenty-First National Conference on Communications (NCC), pp. 1–6 (2015)
Li, P., Qian, J., Wang, T.: Automatic instrument recognition in polyphonic music using convolutional neural networks. arXiv preprint arXiv:1511.05520 (2015)
Li, X., Wang, K., Soraghan, J., Ren, J.: Fusion of Hilbert-Huang transform and deep convolutional neural network for predominant musical instruments recognition. In: Proceedings of 9th International Conference on Artificial Intelligence in Music, Sound, Art and Design (2020)
Murthy, H.A., Yegnanarayana, B.: Group delay functions and its application to speech processing. Sadhana 36(5), 745–782 (2011)
Nasrullah, Z., Zhao, Y.: Music artist classification with convolutional recurrent neural networks. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2019)
O’Shaughnessy, D.: Speech Communication: Human and Machine, pp. 1–5. Universities Press, Hyderabad (1987)
Pons, J., Slizovskaia, O., Gong, R., Gómez, E., Serra, X.: Timbre analysis of music audio signals with convolutional neural networks. In: Proceedings of 25th European Signal Processing Conference (EUSIPCO), pp. 2744–2748 (2017)
Racharla, K., Kumar, V., Jayant, C.B., Khairkar, A., Harish, P.: Predominant musical instrument classification based on spectral features. In: Proceedings of 7th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 617–622 (2020)
Rajan, R., Murthy, H.A.: Two-pitch tracking in co-channel speech using modified group delay functions. Speech Commun. 89, 37–46 (2017)
Rajan, R., Murthy, H.A.: Group delay based melody monopitch extraction from music. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 186–190 (2013)
Rajan, R., Murthy, H.A.: Music genre classification by fusion of modified group delay and melodic features. In: Proceedings of Twenty-third National Conference on Communications (NCC), pp. 1–6 (2017)
Rajesh, S., Nalini, N.: Musical instrument emotion recognition using deep recurrent neural network. Procedia Comput. Sci. 167, 16–25 (2020)
Reghunath, L.C., Rajan, R.: Attention-based predominant instruments recognition in polyphonic music. In: Proceedings of 18th Sound and Music Computing Conference (SMC), pp. 199–206 (2021)
Reghunath, L.C., Rajan, R.: Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music. EURASIP J. Audio Speech Music Process. 2022(1), 1–14 (2022)
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Toh, K., Jiang, X., Yau, W.: Exploiting global and local decisions for multimodal biometrics verification. IEEE Trans. Signal Process. 52, 3059–3072 (2004)
Wang, Y., Tan, T., Jain, A.K.: Combining face and iris biometrics for identity verification. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688, pp. 805–813. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44887-X_93
Yu, D., Duan, H., Fang, J., Zeng, B.: Predominant instrument recognition based on deep neural network with auxiliary classification. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 852–861 (2020)
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Lekshmi, C.R., Rajan, R. (2023). Predominant Instrument Recognition in Polyphonic Music Using Convolutional Recurrent Neural Networks. In: Aramaki, M., Hirata, K., Kitahara, T., Kronland-Martinet, R., Ystad, S. (eds.) Music in the AI Era. CMMR 2021. Lecture Notes in Computer Science, vol. 13770. Springer, Cham. https://doi.org/10.1007/978-3-031-35382-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35381-9
Online ISBN: 978-3-031-35382-6