An ensemble of deep transfer learning models for handwritten music symbol recognition

  • S. I. : Effective and Efficient Deep Learning
  • Neural Computing and Applications

Abstract

In ancient times, there was no system to record or document music. A basic notation system for writing European music was formulated around the 14th century in the Baroque period, and it slowly evolved into the standard notation system that we have today. Later, musical pieces from the classical and post-classical periods of European music were documented as scores using this standard European staff notation. This notation is used by most modern genres of music owing to its versatility. Hence, it is important to develop a method that can digitally store music sheets containing handwritten music scores. Optical music recognition (OMR) is a system that automatically interprets scanned handwritten music scores. In this work, we propose a classifier ensemble of deep transfer learning models with a support vector machine (SVM) as the aggregator for handwritten music symbol recognition. We use three deep learning models pre-trained on ImageNet, namely ResNet50, GoogleNet and DenseNet161, and fine-tune them on our target datasets, i.e., music symbol image datasets. The proposed ensemble technique can capture more complex associations among the base classifiers, thus improving the overall performance. We have evaluated the proposed model on five publicly available standard datasets, namely Handwritten Online Music Symbols (HOMUS), Capitan_Score_Uniform, Capitan_Score_Non-uniform, Rebelo_real and Fornés, and achieved state-of-the-art results on all of them. Additionally, to validate its effectiveness on diversified datasets, we have evaluated our model on two publicly available non-music symbol datasets, namely CMATERdb 2.1.2, containing 120 handwritten Bangla city names, and CMATERdb 3.1.1, containing handwritten Bangla numerals. The source code of this work is available at https://github.com/ashis0013/Music-Symbol-Recognition.
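The idea of an SVM acting as the aggregator over several base classifiers can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which outputs of the base models are passed to the SVM, so the sketch assumes that the per-class confidence scores of the three models are concatenated into one feature vector. The base models themselves are replaced by a hypothetical helper, `fake_base_probs`, that generates synthetic softmax scores, and the class count, sample sizes, and SVM hyperparameters are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_base_probs(labels, n_classes, sharpness):
    """Hypothetical stand-in for a fine-tuned CNN: noisy softmax scores
    that favour the true class more strongly as `sharpness` grows."""
    logits = rng.normal(size=(labels.size, n_classes))
    logits[np.arange(labels.size), labels] += sharpness
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def stack_features(prob_list):
    """Concatenate the score vectors of all base models (assumed design)."""
    return np.concatenate(prob_list, axis=1)

n_classes = 5                      # illustrative, not a dataset value
y_train = rng.integers(0, n_classes, size=300)
y_test = rng.integers(0, n_classes, size=100)

# Three simulated base models of differing quality, standing in for
# ResNet50, GoogleNet and DenseNet161.
sharpness = (1.5, 2.0, 2.5)
X_train = stack_features([fake_base_probs(y_train, n_classes, s) for s in sharpness])
X_test = stack_features([fake_base_probs(y_test, n_classes, s) for s in sharpness])

# The SVM learns how to combine the base models' scores.
aggregator = SVC(kernel="rbf", C=1.0)
aggregator.fit(X_train, y_train)

ensemble_pred = aggregator.predict(X_test)
ensemble_acc = float((ensemble_pred == y_test).mean())
print(f"ensemble accuracy on synthetic data: {ensemble_acc:.3f}")
```

Unlike majority voting or score averaging, a learned aggregator of this kind can weight the base models differently per class. In a real pipeline, the scores used to fit the SVM would normally come from samples that were not used to fine-tune the base models, so that the aggregator does not learn from over-confident outputs.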

Notes

  1. https://grfia.dlsi.ua.es/homus/.

  2. http://grfia.dlsi.ua.es/musicdocs/Capitan.zip.

  3. https://github.com/apacha/OMR-Datasets.

  4. http://www.cvc.uab.es/~afornes/.

  5. https://pytorch.org/vision/stable/models.html.

Acknowledgements

We are thankful to the Center for Microprocessor Applications for Training Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India for providing infrastructural support.

Author information

Corresponding author

Correspondence to Samir Malakar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Paul, A., Pramanik, R., Malakar, S. et al. An ensemble of deep transfer learning models for handwritten music symbol recognition. Neural Comput & Applic 34, 10409–10427 (2022). https://doi.org/10.1007/s00521-021-06629-9

  • DOI: https://doi.org/10.1007/s00521-021-06629-9