Classification of Printed Gujarati Characters Using Low-Level Stroke Features

Published: 12 April 2016

Abstract

This article presents a technique for extracting low-level stroke features, such as endpoints, junction points, line elements, and curve elements, from offline printed text using a template matching approach. The proposed features are used to classify a subset of characters from the Gujarati script. The database consists of approximately 16,782 samples of 42 middle-zone symbols from the Gujarati character set, collected from three different sources: machine-printed books, newspapers, and laser-printed documents. Drawing on three sources adds variety in size, font type, style, ink variation, and boundary deformation. Experiments are performed on the database using a k-nearest neighbor (kNN) classifier, and the results are compared with those of other widely used structural features, namely Chain Codes (CC), Directional Element Features (DEF), and Histogram of Oriented Gradients (HoG). The results show that the proposed features are robust to these variations and perform comparably to existing work.
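To make the idea of endpoint and junction features concrete, here is a minimal Python sketch. It is not the article's template matching procedure: it swaps in the crossing number, a common alternative for one-pixel-wide skeletons, and follows it with a plain Euclidean kNN vote. The function names (`crossing_number`, `stroke_points`, `knn_predict`), the toy "T" image, and the two-value feature vector (endpoint count, junction count) are assumptions made for illustration only.

```python
from collections import Counter

import numpy as np

# The 8 neighbours of a pixel in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(padded, r, c):
    """Count 0 -> 1 transitions while walking once around pixel (r, c)."""
    ring = [int(padded[r + dr, c + dc]) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def stroke_points(skeleton):
    """Return (endpoints, junctions) as lists of (row, col) tuples.

    Assumes `skeleton` is a binary, one-pixel-wide thinned character image.
    A foreground pixel with crossing number 1 is treated as an endpoint,
    and one with crossing number >= 3 as a junction.
    """
    sk = (np.asarray(skeleton) > 0).astype(np.uint8)
    padded = np.pad(sk, 1)  # zero border so every pixel has 8 neighbours
    endpoints, junctions = [], []
    for r, c in np.argwhere(sk):
        cn = crossing_number(padded, r + 1, c + 1)
        if cn == 1:
            endpoints.append((int(r), int(c)))
        elif cn >= 3:
            junctions.append((int(r), int(c)))
    return endpoints, junctions


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy skeleton shaped like a "T" (hypothetical data, not from the article).
    t_shape = [
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    ends, juncs = stroke_points(t_shape)
    print("endpoints:", ends)
    print("junctions:", juncs)

    # Hypothetical training set: [endpoint count, junction count] per sample.
    features = [[3, 1], [3, 1], [2, 0], [2, 0]]
    labels = ["T", "T", "L", "L"]
    print("predicted:", knn_predict(features, labels, [len(ends), len(juncs)]))
```

Counting transitions around the 8-neighbourhood, rather than simply counting neighbours, avoids flagging the pixels adjacent to a junction as junctions themselves. A realistic feature vector would also need the line and curve elements the abstract mentions, which this sketch leaves out.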

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 15, Issue 4
    June 2016
    173 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/2915955

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 April 2016
    Accepted: 01 November 2015
    Revised: 01 October 2015
    Received: 01 November 2014
    Published in TALLIP Volume 15, Issue 4

    Author Tags

    1. Characters classification
    2. Gujarati characters
    3. stroke features

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 7
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 27 Feb 2025

    Citations

    Cited By

    • (2023) Handwritten Text Recognition for Regional Languages of Indian Subcontinent. Proceedings of 3rd International Conference on Artificial Intelligence: Advances and Applications, 241-258. DOI: 10.1007/978-981-19-7041-2_19. Online publication date: 15-Apr-2023
    • (2022) Handwritten Gujarati Numeral Recognition using Deep Learning. 2022 2nd International Conference on Innovative Sustainable Computational Technologies (CISCT), 1-4. DOI: 10.1109/CISCT55310.2022.10046543. Online publication date: 23-Dec-2022
    • (2022) A Review on Optical Character Recognition of Gujarati Scripts. Proceedings of the 6th International Conference on Advance Computing and Intelligent Engineering, 311-319. DOI: 10.1007/978-981-19-2225-1_28. Online publication date: 22-Sep-2022
    • (2022) Template-Based Thinning Method for Handwritten Gujarati Character’s Strokes and its Classification for Writer-Dependent Gujarati Font Synthesis. Advanced Machine Intelligence and Signal Processing, 203-216. DOI: 10.1007/978-981-19-0840-8_15. Online publication date: 26-Jun-2022
    • (2019) Handwritten Gujarati Character Recognition Using Structural Decomposition Technique. Pattern Recognition and Image Analysis 29:2, 325-338. DOI: 10.1134/S1054661819010061. Online publication date: 1-Apr-2019
    • (2019) Gujarati Text Recognition: A Review. 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), 1-5. DOI: 10.1109/i-PACT44901.2019.8960022. Online publication date: Mar-2019
    • (2018) Multi-layer Classification Approach for Online Handwritten Gujarati Character Recognition. Computational Intelligence: Theories, Applications and Future Directions - Volume II, 595-606. DOI: 10.1007/978-981-13-1135-2_45. Online publication date: 2-Sep-2018
    • (2018) Printed Gujarati Character Classification Using High-Level Strokes. Proceedings of 2nd International Conference on Computer Vision & Image Processing, 197-209. DOI: 10.1007/978-981-10-7898-9_16. Online publication date: 5-May-2018
