Offline handwritten Gurumukhi word recognition using eXtreme Gradient Boosting methodology

  • Methodologies and Application
Soft Computing

Abstract

Handwritten word recognition is a challenging task because writing styles vary widely between individuals. Considerable effort has therefore gone into recognizing handwritten words with efficient classifiers trained on features that capture the visual appearance of the handwritten text. Because of its many real-time applications, handwritten word recognition has attracted substantial research attention over the last 10 years. In this article, the authors propose a holistic approach (recognizing each word as a whole, without segmenting it into characters) based on the eXtreme Gradient Boosting (XGBoost) technique to recognize offline handwritten Gurumukhi words. Four state-of-the-art feature types, namely zoning, diagonal, intersection and open-end points, and peak extent features, are used to extract discriminative features from digital images of handwritten words. The proposed approach is evaluated on a public benchmark dataset of Gurumukhi script comprising 40,000 samples of handwritten words. Using the extracted features, XGBoost classifies each word into one of 100 classes. The system's effectiveness is measured with several parameters: CPU elapsed time, accuracy, precision, recall, F1-score and area under the curve (AUC). With zoning features, 90% of the data used for training and the remaining 10% for testing, XGBoost attained its best results: accuracy 91.66%, recall 91.66%, precision 91.39%, F1-score 91.14% and AUC 95.66%. A comparison with existing approaches is also presented, which shows the advantage of the XGBoost technique.
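To make the best-performing feature set and the reported metrics concrete, here is a minimal illustrative sketch, not the authors' implementation. It assumes that a word image is a list of rows of 0/1 pixels (1 = ink), that zoning divides the image into an n x n grid, that each zoning feature is the ink density of one zone, and that precision, recall and F1-score are averaged with per-class support weights. The function names `zoning_features` and `weighted_scores` and the default grid size are hypothetical.

```python
# Illustrative sketch, not the authors' implementation. Assumptions: a word
# image is a list of rows of 0/1 pixels (1 = ink), zones form an n x n grid,
# each zoning feature is the ink density of one zone, and precision/recall/F1
# are averaged with per-class support weights. Function names are hypothetical.
from typing import Dict, List, Sequence


def zoning_features(image: List[List[int]], n: int = 4) -> List[float]:
    """Return n*n ink densities, one per zone, in row-major zone order."""
    rows, cols = len(image), len(image[0])
    features: List[float] = []
    for zr in range(n):
        # Integer boundaries make the zones tile the image even when its
        # height or width is not divisible by n.
        r0, r1 = zr * rows // n, (zr + 1) * rows // n
        for zc in range(n):
            c0, c1 = zc * cols // n, (zc + 1) * cols // n
            area = (r1 - r0) * (c1 - c0)
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            # An empty zone (image smaller than the grid) contributes 0.0.
            features.append(ink / area if area else 0.0)
    return features


def weighted_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    total = len(pairs)
    scores = {"accuracy": sum(t == p for t, p in pairs) / total,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    # Classes that occur only in y_pred have zero support, hence zero weight,
    # so iterating over the true labels alone gives the same weighted sums.
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        support = sum(t == c for t, _ in pairs)
        weight = support / total
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / support
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores["precision"] += weight * prec
        scores["recall"] += weight * rec
        scores["f1"] += weight * f1
    return scores


if __name__ == "__main__":
    img = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
    print(zoning_features(img, n=2))  # [1.0, 0.0, 0.0, 0.5]
    print(weighted_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

Because support-weighted recall sums each class's true positives over the total number of samples, it always equals accuracy. That is consistent with, though not proof of, the identical accuracy and recall figures (91.66%) reported above. In a full pipeline, the feature vectors might be split 90/10 (for example with scikit-learn's `train_test_split(..., test_size=0.1)`) and passed to a classifier such as `xgboost.XGBClassifier`; the paper's actual preprocessing and hyperparameters are not given in this abstract.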

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Author information

Corresponding author

Correspondence to Munish Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kaur, H., Kumar, M. Offline handwritten Gurumukhi word recognition using eXtreme Gradient Boosting methodology. Soft Comput 25, 4451–4464 (2021). https://doi.org/10.1007/s00500-020-05455-w

  • DOI: https://doi.org/10.1007/s00500-020-05455-w
