Predominant Instrument Recognition in Polyphonic Music Using Convolutional Recurrent Neural Networks

  • Conference paper
Music in the AI Era (CMMR 2021)

Abstract

Automatic identification of lead instruments is a challenging task in the field of music information retrieval (MIR). In this paper, predominant instrument recognition in polyphonic music is addressed using convolutional recurrent neural networks (CRNNs) applied to Mel-spectrograms, modgdgrams, and their fusion. The modgdgram, a visual representation, is obtained by stacking the modified group delay functions of consecutive frames. Convolutional neural networks (CNNs) learn distinctive local characteristics from the visual representation, and recurrent neural networks (RNNs) integrate the extracted features over time and assign the instrument to the class to which it belongs. The proposed system is systematically evaluated on the IRMAS dataset. A wave generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. We experimented with two CRNN architectures: convolutional long short-term memory (C-LSTM) and convolutional gated recurrent unit (C-GRU). In the fusion experiment, C-GRU achieves micro and macro F1 scores of 0.69 and 0.60, respectively. These scores are 7.81% and 9.09% higher than those obtained by the state-of-the-art model of Han et al. The choice of a CRNN architecture with score-level fusion of the Mel-spectrogram and modgdgram has merit for recognizing the predominant instrument in polyphonic music.
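
The modgdgram construction described in the abstract can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the usual modified group delay formulation from the group delay literature, tau(w) = (X_R Y_R + X_I Y_I) / |S(w)|^(2*gamma) and tau_m(w) = sign(tau(w)) |tau(w)|^alpha, where X and Y are the spectra of x[n] and n*x[n] and S is a cepstrally smoothed |X|. The function names (`cepstral_smooth`, `modgd_frame`, `modgdgram`) are hypothetical, and `alpha`, `gamma`, the cepstral `lifter` length, the frame length and the hop size are illustrative values, not parameters taken from the paper.

```python
import numpy as np


def cepstral_smooth(mag, lifter=30):
    """Cepstrally smoothed magnitude spectrum S(w) from a full-length |X(w)|.

    Assumption: low-quefrency liftering that keeps `lifter` coefficients (>= 2).
    """
    n_fft = len(mag)
    cep = np.fft.ifft(np.log(mag + 1e-10)).real  # real cepstrum of log|X|
    window = np.zeros(n_fft)
    window[:lifter] = 1.0
    window[n_fft - lifter + 1:] = 1.0  # keep the symmetric (negative-quefrency) half
    return np.exp(np.fft.fft(cep * window).real)


def modgd_frame(frame, n_fft=1024, alpha=0.4, gamma=0.9, lifter=30):
    """Modified group delay function of one windowed frame, bins 0..n_fft/2."""
    n = np.arange(len(frame))
    X = np.fft.fft(frame, n_fft)      # spectrum of x[n] (zero-padded to n_fft)
    Y = np.fft.fft(n * frame, n_fft)  # spectrum of n * x[n]
    S = cepstral_smooth(np.abs(X), lifter)
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    tau_m = np.sign(tau) * np.abs(tau) ** alpha  # compress the dynamic range
    return tau_m[: n_fft // 2 + 1]


def modgdgram(signal, frame_len=1024, hop=512, n_fft=1024, **kwargs):
    """Stack the MODGD functions of consecutive frames column-wise: (bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    cols = [
        modgd_frame(signal[i * hop : i * hop + frame_len] * window, n_fft=n_fft, **kwargs)
        for i in range(n_frames)
    ]
    return np.stack(cols, axis=1)


# Usage on a synthetic 1 s signal (a 440 Hz tone plus noise at 16 kHz).
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)
G = modgdgram(x)
print(G.shape)  # (513, 30)
```

The resulting matrix has the same bins-by-frames layout as a spectrogram, which is what allows it to be treated as an image by the CNN front end alongside the Mel-spectrogram.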

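Score-level fusion and the micro and macro F1 scores quoted in the abstract can be illustrated with a small NumPy sketch. The abstract does not state the exact fusion rule, so the weighted average with weight `w` and the decision `threshold` below are assumptions, and the function names and toy scores are hypothetical. The F1 definitions are the standard ones: micro F1 pools true positives, false positives and false negatives over all classes before computing F1, while macro F1 computes F1 per class and averages.

```python
import numpy as np


def fuse_scores(p_mel, p_modgd, w=0.5):
    """Score-level fusion (assumed): weighted average of per-class probabilities."""
    return w * p_mel + (1.0 - w) * p_modgd


def to_labels(scores, threshold=0.5):
    """Multi-label decision: every class whose fused score reaches the threshold."""
    return (scores >= threshold).astype(int)


def micro_macro_f1(y_true, y_pred):
    """Micro and macro F1 for binary indicator matrices of shape (clips, classes)."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    # Micro: pool the counts of all classes, then F1 = 2TP / (2TP + FP + FN).
    pooled = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / pooled if pooled else 0.0
    # Macro: F1 per class (0 when TP + FP + FN = 0), then average over classes.
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(len(tp)), where=denom > 0)
    return float(micro), float(per_class.mean())


# Toy example: 4 clips, 3 hypothetical instrument classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
p_mel = np.array([[0.9, 0.2, 0.1], [0.3, 0.6, 0.2], [0.7, 0.4, 0.1], [0.2, 0.1, 0.3]])
p_modgd = np.array([[0.7, 0.1, 0.2], [0.2, 0.8, 0.1], [0.5, 0.7, 0.2], [0.1, 0.2, 0.5]])

y_pred = to_labels(fuse_scores(p_mel, p_modgd))
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.889 0.667
```

In this toy case the only instrument in the fourth clip falls below the threshold after fusion, so that class gets F1 = 0. Macro F1 drops to 0.667 because every class is weighted equally, while micro F1, which pools counts over classes, stays at 0.889.
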

Notes

  1. https://librosa.org/doc/latest/tutorial.html.

  2. https://github.com/Veleslavia/EUSIPCO2017.

References

  1. Ajayakumar, R., Rajan, R.: Predominant instrument recognition in polyphonic music using GMM-DNN framework. In: Proceedings of International Conference on Signal Processing and Communications (SPCOM), pp. 1–5 (2020)

  2. Aleksandr, D., Rajan, P., Heittola, T., Virtanen, T.: Modified group delay feature for musical instrument recognition. In: Proceedings of International Symposium on Computer Music Multidisciplinary Research, pp. 431–438 (2013)

  3. Atkar, G., Jayaraju, P.: Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia. Neural Comput. Appl. 33, 1–10 (2021)

  4. Ballas, N., Yao, L., Pal, C., Courville, A.: Delving deeper into convolutional networks for learning video representations. arXiv preprint arXiv:1511.06432 (2015)

  5. Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: Proceedings of 13th International Society for Music Information Retrieval Conference (ISMIR) (2012)

  6. Cakır, E., Parascandolo, G., Heittola, T., Huttunen, H., Virtanen, T.: Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1291–1303 (2017)

  7. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)

  8. Choi, K., Fazekas, G., Sandler, M., Cho, K.: Convolutional recurrent neural networks for music classification. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2392–2396 (2017)

  9. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS 2014 Workshop on Deep Learning, December 2014 (2014)

  10. Cui, Z., Ke, R., Pu, Z., Wang, Y.: Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv preprint arXiv:1801.02143 (2018)

  11. Donahue, C., McAuley, J., Puckette, M.: Adversarial audio synthesis. In: Proceedings of International Conference on Learning Representations, pp. 1–16 (2019)

  12. Fuhrmann, F., Herrera, P.: Polyphonic instrument recognition for exploring semantic similarities in music. In: Proceedings of 13th International Conference on Digital Audio Effects DAFx10, vol. 14, no. 1, pp. 1–8. Graz (2010)

  13. Fuhrmann, F., et al.: Automatic musical instrument recognition from polyphonic music audio signals. Ph.D. thesis, Universitat Pompeu Fabra (2012)

  14. Gimeno, P., Viñals, I., Ortega, A., Miguel, A., Lleida, E.: Multiclass audio segmentation based on recurrent neural networks for broadcast domain data. EURASIP J. Audio Speech Music Process. 2020(1), 1–19 (2020). https://doi.org/10.1186/s13636-020-00172-6

  15. Gómez, J.S., Abeßer, J., Cano, E.: Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: Proceedings of International Society for Music Information Retrieval (ISMIR), pp. 577–584 (2018)

  16. Gruber, N., Jockisch, A.: Are GRU cells more specific and LSTM cells more sensitive in motive classification of text? Front. Artif. Intell. 3, 40 (2020)

  17. Gururani, S., Summers, C., Lerch, A.: Instrument activity detection in polyphonic music using deep neural networks. In: Proceedings of International Society for Music Information Retrieval Conference (ISMIR), pp. 577–584 (2018)

  18. Han, Y., Kim, J., Lee, K.: Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio Speech Lang. Process. 25(1), 208–221 (2017)

  19. Heittola, T., Klapuri, A., Virtanen, T.: Musical instrument recognition in polyphonic audio using source-filter model for sound separation. In: Proceedings of International Society for Music Information Retrieval Conference, pp. 327–332 (2009)

  20. Kitahara, T., Goto, M., Komatani, K., Ogata, T., Okuno, H.G.: Instrument identification in polyphonic music: feature weighting to minimize influence of sound overlaps. EURASIP J. Appl. Signal Process. 2007, 155–175 (2007)

  21. Kratimenos, A., Avramidis, K., Garoufis, C., Zlatintsi, A., Maragos, P.: Augmentation methods on monophonic audio for instrument classification in polyphonic music. In: Proceedings of 28th European Signal Processing Conference (EUSIPCO), pp. 156–160 (2021)

  22. Kumar, P.M., Sebastian, J., Murthy, H.A.: Musical onset detection on Carnatic percussion instruments. In: 2015 Twenty First National Conference on Communications (NCC), pp. 1–6 (2015)

  23. Li, P., Qian, J., Wang, T.: Automatic instrument recognition in polyphonic music using convolutional neural networks. arXiv:1511.05520 (2015)

  24. Li, X., Wang, K., Soraghan, J., Ren, J.: Fusion of Hilbert-Huang transform and deep convolutional neural network for predominant musical instruments recognition. In: Proceedings of 9th International Conference on Artificial Intelligence in Music, Sound, Art and Design (2020)

  25. Murthy, H.A., Yegnanarayana, B.: Group delay functions and its application to speech processing. Sadhana 36(5), 745–782 (2011)

  26. Nasrullah, Z., Zhao, Y.: Music artist classification with convolutional recurrent neural networks. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2019)

  27. O’Shaughnessy, D.: Speech Communication: Human and Machine, pp. 1–5. Universities Press, Hyderabad (1987)

  28. Pons, J., Slizovskaia, O., Gong, R., Gómez, E., Serra, X.: Timbre analysis of music audio signals with convolutional neural networks. In: Proceedings of 25th European Signal Processing Conference (EUSIPCO), pp. 2744–2748 (2017)

  29. Racharla, K., Kumar, V., Jayant, C.B., Khairkar, A., Harish, P.: Predominant musical instrument classification based on spectral features. In: 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 617–622. IEEE (2020)

  30. Rajan, R., Murthy, H.A.: Two-pitch tracking in co-channel speech using modified group delay functions. Speech Commun. 89, 37–46 (2017)

  31. Rajan, R., Murthy, H.A.: Group delay based melody monopitch extraction from music. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 186–190 (2013)

  32. Rajan, R., Murthy, H.A.: Music genre classification by fusion of modified group delay and melodic features. In: Proceedings of Twenty-third National Conference on Communications (NCC), pp. 1–6 (2017)

  33. Rajesh, S., Nalini, N.: Musical instrument emotion recognition using deep recurrent neural network. Procedia Comput. Sci. 167, 16–25 (2020)

  34. Reghunath, L.C., Rajan, R.: Attention-based predominant instruments recognition in polyphonic music. In: Proceedings of 18th Sound and Music Computing Conference (SMC), pp. 199–206 (2021)

  35. Reghunath, L.C., Rajan, R.: Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music. EURASIP J. Audio Speech Music Process. 2022(1), 1–14 (2022)

  36. Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, vol. 28 (2015)

  37. Toh, K., Jiang, X., Yau, W.: Exploiting global and local decisions for multimodal biometrics verification. IEEE Trans. Signal Process. 52, 3059–3072 (2004)

  38. Wang, Y., Tan, T., Jain, A.K.: Combining face and iris biometrics for identity verification. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688, pp. 805–813. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44887-X_93

  39. Yu, D., Duan, H., Fang, J., Zeng, B.: Predominant instrument recognition based on deep neural network with auxiliary classification. IEEE/ACM Trans. Audio, Speech, Lang. Process. 28, 852–861 (2020)

    Article  Google Scholar 

Download references

Author information

Correspondence to C. R. Lekshmi.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Lekshmi, C.R., Rajan, R. (2023). Predominant Instrument Recognition in Polyphonic Music Using Convolutional Recurrent Neural Networks. In: Aramaki, M., Hirata, K., Kitahara, T., Kronland-Martinet, R., Ystad, S. (eds) Music in the AI Era. CMMR 2021. Lecture Notes in Computer Science, vol 13770. Springer, Cham. https://doi.org/10.1007/978-3-031-35382-6_17

  • DOI: https://doi.org/10.1007/978-3-031-35382-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35381-9

  • Online ISBN: 978-3-031-35382-6

  • eBook Packages: Computer Science, Computer Science (R0)