Abstract
In ancient times, there was no system for recording or documenting music. A basic notation system for writing European music was formulated around the 14th century in the Baroque period, and it slowly evolved into the standard notation system that we have today. Later, musical pieces from the classical and post-classical periods of European music were documented as scores using this standard European staff notation. This notation is used by most modern genres of music owing to its versatility. Hence, it is important to develop methods that can digitally store music sheets containing handwritten music scores. Optical music recognition (OMR) is a system that automatically interprets scanned handwritten music scores. In this work, we propose a classifier ensemble of deep transfer learning models, with a support vector machine (SVM) as the aggregator, for handwritten music symbol recognition. We apply three pre-trained deep learning models, namely ResNet50, GoogleNet and DenseNet161 (each trained on ImageNet), and fine-tune them on our target datasets, i.e., music symbol image datasets. The proposed ensemble technique can capture a more complex association of the base classifiers, thus improving the overall performance. We evaluate the proposed model on five publicly available standard datasets, namely Handwritten Online Music Symbols (HOMUS), Capitan_Score_Uniform, Capitan_Score_Non-uniform, Rebelo_real and Fornés, and achieve state-of-the-art results on all of them. Additionally, to validate its effectiveness on diverse datasets, we evaluate our model on two publicly available non-music symbol datasets, namely CMATERdb 2.1.2, containing 120 handwritten Bangla city names, and CMATERdb 3.1.1, containing handwritten Bangla numerals. The source code of this work is available at https://github.com/ashis0013/Music-Symbol-Recognition.
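The aggregation idea summarized above, in which the class scores of several base classifiers are concatenated and passed to an SVM, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the linked repository: the base-model scores are simulated with random numbers instead of coming from fine-tuned ResNet50, GoogleNet and DenseNet161 networks, the number of classes is hypothetical, and the aggregator is a plain one-vs-rest linear SVM trained by subgradient descent on the hinge loss, written with the standard library only. All function names are illustrative.

```python
import random

NUM_CLASSES = 3  # hypothetical number of symbol classes
NUM_MODELS = 3   # stands in for ResNet50, GoogleNet and DenseNet161


def fake_base_scores(label, rng, accuracy=0.8):
    """Simulate one base model's normalized class-score vector (assumption).

    With probability `accuracy` the largest score goes to the true class,
    otherwise to a random wrong class. Real scores would come from the
    softmax output of a fine-tuned CNN.
    """
    if rng.random() < accuracy:
        top = label
    else:
        top = rng.choice([c for c in range(NUM_CLASSES) if c != label])
    scores = [rng.uniform(0.0, 0.2) for _ in range(NUM_CLASSES)]
    scores[top] = rng.uniform(0.6, 1.0)
    total = sum(scores)
    return [s / total for s in scores]


def stacked_features(label, rng):
    """Concatenate the score vectors of all base models into one feature vector."""
    features = []
    for _ in range(NUM_MODELS):
        features.extend(fake_base_scores(label, rng))
    return features


def make_dataset(n, rng):
    """Draw n random labels and build the stacked score features for each."""
    labels = [rng.randrange(NUM_CLASSES) for _ in range(n)]
    return [stacked_features(y, rng) for y in labels], labels


def train_linear_svm(X, y, epochs=50, lam=0.01, lr=0.1):
    """One-vs-rest linear SVM: subgradient descent on the regularized hinge loss."""
    dim = len(X[0])
    weights = [[0.0] * dim for _ in range(NUM_CLASSES)]
    biases = [0.0] * NUM_CLASSES
    for _ in range(epochs):
        for x, label in zip(X, y):
            for c in range(NUM_CLASSES):
                target = 1.0 if label == c else -1.0
                margin = target * (sum(w * v for w, v in zip(weights[c], x)) + biases[c])
                for j in range(dim):
                    grad = lam * weights[c][j]      # L2 regularization term
                    if margin < 1.0:                # hinge loss is active
                        grad -= target * x[j]
                    weights[c][j] -= lr * grad
                if margin < 1.0:
                    biases[c] += lr * target
    return weights, biases


def predict(model, x):
    """Return the class whose one-vs-rest decision value is largest."""
    weights, biases = model
    scores = [sum(w * v for w, v in zip(weights[c], x)) + biases[c]
              for c in range(NUM_CLASSES)]
    return scores.index(max(scores))


def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the label."""
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


rng = random.Random(0)
X_train, y_train = make_dataset(300, rng)
X_test, y_test = make_dataset(200, rng)
svm = train_linear_svm(X_train, y_train)
ensemble_acc = accuracy(svm, X_test, y_test)
print(f"ensemble accuracy on simulated scores: {ensemble_acc:.2f}")
```

In practice, the simulated score vectors would be replaced by the outputs of the fine-tuned networks on a held-out split, and a library SVM with a suitable kernel could be substituted for the hand-written linear one.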
Acknowledgements
We are thankful to the Center for Microprocessor Applications for Training Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India for providing infrastructural support.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Paul, A., Pramanik, R., Malakar, S. et al. An ensemble of deep transfer learning models for handwritten music symbol recognition. Neural Comput & Applic 34, 10409–10427 (2022). https://doi.org/10.1007/s00521-021-06629-9
DOI: https://doi.org/10.1007/s00521-021-06629-9