A Benchmark Gurmukhi Handwritten Character Dataset: Acquisition, Compilation, and Recognition

  • Conference paper
Frontiers in Handwriting Recognition (ICFHR 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13639))

Abstract

The Gurmukhi script is used to write Punjabi, the official language of the people of the western part of Indian Punjab. The script has approximately 160 million native speakers. Recognition of handwritten Gurmukhi characters is still at an early stage because of the intricate character shapes and the scarcity of standard datasets. This paper introduces a new large-scale benchmark dataset, “Gurmukhi_HWdb1.0”, to support handwritten character recognition for this script. The dataset contains 137,700 handwritten samples covering 41 basic Gurmukhi characters and 10 numeral classes. Of these, 110,160 images are used for training, 13,770 are set aside for validation, and 13,770 are used for testing. A total of 265 individuals contributed samples to the dataset. Recognition is carried out using a CNN architecture based on transfer learning from the VGG16 network. We fine-tuned the model and added our own fully connected layers for the Gurmukhi classes. The proposed model is evaluated on the collected “Gurmukhi_HWdb1.0” dataset, and results for different batch sizes are compared in detail to understand the model's behavior. Experimental results show that the proposed model provides a benchmark for this dataset, with a test accuracy of 98.42% for Gurmukhi characters and 97.51% for Gurmukhi numerals.
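
The split sizes quoted in the abstract are consistent with an 80/10/10 partition of 137,700 samples. The following is a minimal sketch of how such a partition could be produced. It is not the authors' procedure: the per-class balance (2,700 samples for each of the 51 classes), the stratified sampling, the ratios, and the helper name `stratified_split` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

NUM_CLASSES = 41 + 10      # 41 basic characters + 10 numerals (from the abstract)
TOTAL_SAMPLES = 137_700    # total sample count reported in the abstract
SPLIT = (0.8, 0.1, 0.1)    # assumed ratios, inferred from 110,160 / 13,770 / 13,770


def stratified_split(labels, ratios=SPLIT, seed=0):
    """Return (train, val, test) index lists, keeping per-class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Hypothetical balanced label list: assumes every class has the same sample count.
per_class = TOTAL_SAMPLES // NUM_CLASSES
labels = [c for c in range(NUM_CLASSES) for _ in range(per_class)]

train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 110160 13770 13770
```

Under these assumptions each class contributes 2,160 training, 270 validation, and 270 test samples, which reproduces the totals given in the abstract.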

Acknowledgments

The authors thank all the writers who contributed their time and effort to provide the handwriting samples used to develop the dataset described in this article.

Corresponding author

Correspondence to Kanwaljit Kaur.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kaur, K., Chaudhuri, B.B., Lehal, G.S. (2022). A Benchmark Gurmukhi Handwritten Character Dataset: Acquisition, Compilation, and Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_31

  • DOI: https://doi.org/10.1007/978-3-031-21648-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21647-3

  • Online ISBN: 978-3-031-21648-0

  • eBook Packages: Computer Science, Computer Science (R0)
