Abstract
Handwritten word recognition is a challenging task because writing styles vary widely between individuals. Considerable effort has therefore gone into recognizing handwritten words with efficient classifiers that operate on features extracted from the visual appearance of the handwritten text. Owing to its numerous real-time applications, handwritten word recognition is an important research area that has attracted considerable attention from researchers over the last 10 years. In this article, the authors propose a holistic approach that uses the eXtreme Gradient Boosting (XGBoost) technique to recognize offline handwritten Gurumukhi words. Four state-of-the-art feature extraction techniques, namely zoning, diagonal, intersection and open-end points, and peak extent features, are considered for extracting discriminant features from digital images of handwritten words. The proposed approach is evaluated on a public benchmark dataset of Gurumukhi script that comprises 40,000 samples of handwritten words. Using the extracted features, the XGBoost technique classifies each word into one of 100 classes. The effectiveness of the system is assessed with several evaluation parameters: CPU elapsed time, accuracy, precision, recall, F1-score and area under the curve (AUC). The XGBoost technique attained its best results, namely accuracy (91.66%), recall (91.66%), precision (91.39%), F1-score (91.14%) and AUC (95.66%), using zoning features with 90% of the data as the training set and the remaining 10% as the testing set. The proposed approach is also compared with existing approaches, and the comparison reveals the significance of the XGBoost technique.
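To make the zoning idea concrete, the following is a minimal sketch of one common formulation: the binarized word image is divided into a grid of zones, and the fraction of foreground (ink) pixels in each zone becomes one feature. The function name `zoning_features`, the `zones_per_side` parameter and the toy image are illustrative assumptions; the abstract does not state the grid size or the exact zoning variant the authors used.

```python
def zoning_features(image, zones_per_side=4):
    """Return the foreground-pixel density of each zone, in row-major order.

    `image` is a binarized word image given as a list of rows, where 1 marks
    an ink pixel and 0 marks background. The grid size is an assumption made
    for illustration, not a value taken from the article.
    """
    height = len(image)
    width = len(image[0])
    features = []
    for zone_row in range(zones_per_side):
        # Integer arithmetic keeps zone boundaries aligned even when the
        # image size is not an exact multiple of the grid size.
        r0 = zone_row * height // zones_per_side
        r1 = (zone_row + 1) * height // zones_per_side
        for zone_col in range(zones_per_side):
            c0 = zone_col * width // zones_per_side
            c1 = (zone_col + 1) * width // zones_per_side
            area = (r1 - r0) * (c1 - c0)
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            # An empty zone (image smaller than the grid) contributes 0.0.
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4x4 "word image" used only to exercise the function.
toy_word = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zoning_features(toy_word, zones_per_side=2)
print(features)  # [1.0, 0.0, 0.0, 0.75]
```

In a pipeline of the kind the abstract describes, one such vector per word image would form a row of the feature matrix passed to a gradient-boosted classifier (for example `xgboost.XGBClassifier`), with 90% of the rows used for training and the remaining 10% for testing.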
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Kaur, H., Kumar, M. Offline handwritten Gurumukhi word recognition using eXtreme Gradient Boosting methodology. Soft Comput 25, 4451–4464 (2021). https://doi.org/10.1007/s00500-020-05455-w
DOI: https://doi.org/10.1007/s00500-020-05455-w