Abstract
Gurmukhi script is used to write Punjabi, the official language of the people of the western part of Indian Punjab. The script serves approximately 160 million native speakers. Recognition of handwritten Gurmukhi characters is still in its early stages because of intricate character shapes and the scarcity of standard datasets. This paper introduces a new large-scale benchmark dataset, “Gurmukhi_HWdb1.0”, an important development for handwritten character recognition in this script. The dataset contains 137,700 handwritten samples covering 41 basic Gurmukhi characters and 10 numeral classes, contributed by 265 individuals. Of these, 110,160 images are used for training, 13,770 are set aside for validation, and 13,770 are used for testing. Recognition is carried out using a CNN architecture based on transfer learning from the VGG16 network: we fine-tuned the model and added our own fully connected layers for the Gurmukhi classes. The proposed model is evaluated on the collected “Gurmukhi_HWdb1.0” dataset, and results for different batch sizes are compared in detail to understand the model's behavior. Experimental results show that the proposed model can serve as a benchmark for this dataset, with a test accuracy of 98.42% for Gurmukhi characters and 97.51% for Gurmukhi numerals.
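The split sizes quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the per-class count assumes the 51 classes are balanced, which the abstract does not state.

```python
# Figures quoted in the abstract.
TOTAL_SAMPLES = 137_700
SPLIT = {"train": 110_160, "validation": 13_770, "test": 13_770}
NUM_CLASSES = 41 + 10  # basic characters + numerals


def split_fractions(split, total):
    """Return each subset's share of the total as a fraction."""
    return {name: count / total for name, count in split.items()}


# The three subsets should account for every sample.
assert sum(SPLIT.values()) == TOTAL_SAMPLES

fractions = split_fractions(SPLIT, TOTAL_SAMPLES)

# Assumption (not stated in the abstract): every class has the same
# number of samples, so the total divides evenly across classes.
samples_per_class, remainder = divmod(TOTAL_SAMPLES, NUM_CLASSES)

if __name__ == "__main__":
    for name, frac in fractions.items():
        print(f"{name}: {SPLIT[name]} images ({frac:.0%})")
    print(f"classes: {NUM_CLASSES}, per class if balanced: {samples_per_class}")
```

Under these figures the split works out to 80% training, 10% validation, and 10% testing, and the total divides evenly into 2,700 samples per class if the classes are balanced.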
Acknowledgments
The authors acknowledge the time and effort of all the writers who contributed handwriting samples to the dataset described in this article.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kaur, K., Chaudhuri, B.B., Lehal, G.S. (2022). A Benchmark Gurmukhi Handwritten Character Dataset: Acquisition, Compilation, and Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_31
DOI: https://doi.org/10.1007/978-3-031-21648-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21647-3
Online ISBN: 978-3-031-21648-0
eBook Packages: Computer Science, Computer Science (R0)