An ensemble of deep transfer learning models for handwritten music symbol recognition

Paul, Ashis; Pramanik, Rishav; Malakar, Samir; Sarkar, Ram

doi:10.1007/s00521-021-06629-9

S.I.: Effective and Efficient Deep Learning
Published: 09 November 2021

Volume 34, pages 10409–10427, (2022)
Neural Computing and Applications


Abstract

In ancient times, there was no system to record or document music. A basic notation system for writing European music was formulated around the 14th century in the Baroque period, and it slowly evolved into the standard notation system that we have today. Later, the musical pieces from the classical and post-classical periods of European music were documented as scores using this standard European staff notation. This notation is used by most modern genres of music because of its versatility. Hence, it is very important to develop a method that can digitally store music sheets containing handwritten music scores. Optical music recognition (OMR) is a system that automatically interprets scanned handwritten music scores. In this work, we propose a classifier ensemble of deep transfer learning models with a support vector machine (SVM) as the aggregator for handwritten music symbol recognition. We apply three pre-trained deep learning models, namely ResNet50, GoogLeNet and DenseNet161 (each trained on ImageNet), and fine-tune them on our target datasets, i.e., music symbol image datasets. The proposed ensemble technique can capture a more complex association among the base classifiers, thus improving the overall performance. We have evaluated the proposed model on five publicly available standard datasets, namely Handwritten Online Music Symbols (HOMUS), Capitan_Score_Uniform, Capitan_Score_Non-uniform, Rebelo_real and Fornés, and achieved state-of-the-art results on all of them. Additionally, to validate its effectiveness on diversified datasets, we have evaluated our model on two publicly available non-music symbol datasets, namely CMATERdb 2.1.2, containing 120 handwritten Bangla city names, and CMATERdb 3.1.1, containing handwritten Bangla numerals. The source code of this work is available at https://github.com/ashis0013/Music-Symbol-Recognition.
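
The idea of using an SVM as the aggregator over several base classifiers can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that each base model outputs a per-class score vector and that the SVM is trained on the concatenation of those vectors; the abstract does not state which features the aggregator receives. The three fine-tuned CNNs are replaced by a hypothetical stand-in, `fake_base_probs`, that generates noisy synthetic softmax scores, and the class count, sample sizes and SVM settings are likewise illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_CLASSES, N_MODELS = 4, 3  # hypothetical sizes, not taken from the paper


def fake_base_probs(labels, strength=3.0):
    """Stand-in for one fine-tuned CNN: noisy softmax scores per sample."""
    logits = rng.normal(size=(labels.size, N_CLASSES))
    # Boost the logit of the true class so the fake model is usually right.
    logits[np.arange(labels.size), labels] += strength
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def stack_features(labels):
    """Concatenate the score vectors of all base models, one row per sample."""
    return np.hstack([fake_base_probs(labels) for _ in range(N_MODELS)])


# Held-out samples used to fit the aggregator, plus a separate test split.
y_fit = rng.integers(0, N_CLASSES, size=400)
y_test = rng.integers(0, N_CLASSES, size=200)
X_fit, X_test = stack_features(y_fit), stack_features(y_test)

# The SVM learns how to combine the base models' scores (assumed RBF kernel).
aggregator = SVC(kernel="rbf", C=1.0)
aggregator.fit(X_fit, y_fit)

y_pred = aggregator.predict(X_test)
accuracy = float((y_pred == y_test).mean())
print(f"ensemble accuracy on synthetic scores: {accuracy:.3f}")
```

In a real pipeline the rows of `X_fit` would come from the base networks' outputs on samples they were not trained on, so that the aggregator does not simply learn from overconfident training-set scores.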

Acknowledgements

We are thankful to the Center for Microprocessor Applications for Training Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India for providing infrastructural support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
Ashis Paul, Rishav Pramanik & Ram Sarkar
Department of Computer Science, Asutosh College, Kolkata, 700026, India
Samir Malakar

Corresponding author

Correspondence to Samir Malakar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Paul, A., Pramanik, R., Malakar, S. et al. An ensemble of deep transfer learning models for handwritten music symbol recognition. Neural Comput & Applic 34, 10409–10427 (2022). https://doi.org/10.1007/s00521-021-06629-9

Received: 17 April 2021
Accepted: 06 October 2021
Published: 09 November 2021
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00521-021-06629-9

Part of a collection:

Special Issue on Effective and Efficient Deep Learning Based Solutions
