research-article

Bagging: An Ensemble Approach for Recognition of Handwritten Place Names in Gurumukhi Script

Published: 25 July 2023

Abstract

This article addresses the recognition of handwritten Gurumukhi place names for use in postal automation. Five feature extraction techniques (zoning, horizontal peak extent, vertical peak extent, diagonal, and centroid) are analyzed and optimized using Principal Component Analysis (PCA). Four classification methods, namely k-Nearest Neighbor (k-NN), decision tree, random forest, and Convolutional Neural Network (CNN), are used to classify the handwritten word images. To improve the recognition results, the authors employ Bootstrap Aggregation (Bagging) with a majority voting scheme. The experiments use a public benchmark dataset of 40,000 handwritten place-name samples in the Punjabi language with a 70:30 partition, in which 70% of the data is used for training and the remaining 30% for testing. The system achieves a maximum recognition accuracy of 96.98% with a combination of zoning, vertical peak extent, and diagonal features, and a minimum Mean Squared Error (MSE) of 0.86% with a combination of zoning and horizontal peak extent features, using a majority voting scheme through the ensemble (Bagging) methodology.
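The abstract's core idea (zoning features, a 70:30 split, and Bagging with majority voting) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the 4x4 toy "images", the 1-NN base learner standing in for the paper's classifiers, and the names `zoning_features`, `bagging_predict`, `make_sample`, and `demo_accuracy` are all hypothetical. PCA and the CNN are omitted to keep the example dependency-free.

```python
import random
from collections import Counter

def zoning_features(image, zones=2):
    """Split a binary image (rows of 0/1) into zones x zones blocks and
    return the foreground-pixel density of each block."""
    h, w = len(image), len(image[0])
    zh, zw = h // zones, w // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            block = [image[r][c]
                     for r in range(zr * zh, (zr + 1) * zh)
                     for c in range(zc * zw, (zc + 1) * zw)]
            feats.append(sum(block) / len(block))
    return feats

def nearest_neighbor(train, x):
    """1-NN base learner (an assumption): label of the closest training vector."""
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

def bagging_predict(train, x, n_estimators=11, seed=0):
    """Train each base learner on a bootstrap sample and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(train) for _ in train]  # sample with replacement
        votes.append(nearest_neighbor(bootstrap, x))
    return Counter(votes).most_common(1)[0][0]

def make_sample(label, rng):
    """Hypothetical toy image: ink in the top or bottom half, one pixel flipped."""
    img = [[1 if (r < 2) == (label == "top") else 0 for _ in range(4)]
           for r in range(4)]
    r, c = rng.randrange(4), rng.randrange(4)
    img[r][c] ^= 1  # inject a single noisy pixel
    return img

def demo_accuracy(seed=0):
    """Build toy data, split it 70:30, and return the test accuracy."""
    rng = random.Random(seed)
    data = [(zoning_features(make_sample(lbl, rng)), lbl)
            for lbl in ("top", "bottom") for _ in range(10)]
    rng.shuffle(data)
    split = len(data) * 7 // 10  # 70% for training, 30% for testing
    train, test = data[:split], data[split:]
    correct = sum(bagging_predict(train, x, seed=seed) == y for x, y in test)
    return correct / len(test)

if __name__ == "__main__":
    print("toy test accuracy:", demo_accuracy())
```

An odd number of estimators is used here so that a two-class vote cannot tie; with more classes, a tie-breaking rule would be needed.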

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 7
    July 2023
    422 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3610376

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 July 2023
    Online AM: 20 April 2023
    Accepted: 13 April 2023
    Revised: 09 April 2023
    Received: 22 September 2022
    Published in TALLIP Volume 22, Issue 7

    Author Tags

    1. Postal automation
    2. Gurumukhi words
    3. place names
    4. feature extraction
    5. feature selection
    6. classification
    7. Bagging

    Qualifiers

    • Research-article
