research-article

Classification of Printed Gujarati Characters Using Low-Level Stroke Features

Authors:

Mukesh M. Goswami,

Suman K. Mitra

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 15, Issue 4

Article No.: 25, Pages 1 - 26

https://doi.org/10.1145/2856105

Published: 12 April 2016

Abstract

This article presents a technique for extracting low-level stroke features, such as endpoints, junction points, line elements, and curve elements, from offline printed text using a template matching approach. The proposed features are used to classify a subset of characters from the Gujarati script. The database consists of approximately 16,782 samples of 42 middle-zone symbols from the Gujarati character set, collected from three different sources: machine-printed books, newspapers, and laser-printed documents. Drawing on three sources adds variety in size, font type, style, ink variation, and boundary deformation. The experiments are performed on this database using a k-nearest neighbor (kNN) classifier, and the results are compared with three widely used structural features: Chain Codes (CC), Directional Element Features (DEF), and Histogram of Oriented Gradients (HoG). The results show that the proposed features are robust to these variations and perform comparably to existing approaches.
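
To make the idea of endpoint and junction-point features concrete, the following is a minimal Python sketch. It is not the article's template matching procedure, which the abstract does not spell out. As a stand-in it uses the crossing number of a pixel's 8-neighborhood on a one-pixel-wide skeleton, and it pairs the resulting counts with a plain majority-vote kNN. All names here (`crossing_number`, `stroke_points`, `knn_predict`, `T_SHAPE`) and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter

# 8-neighbour offsets in clockwise order, starting from the pixel above.
NEIGHBOURS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel, r, c):
    """Count 0->1 transitions while walking once around the 8 neighbours of (r, c)."""
    rows, cols = len(skel), len(skel[0])
    ring = []
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < rows and 0 <= cc < cols
        ring.append(skel[rr][cc] if inside else 0)  # pixels off the image count as background
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def stroke_points(skel):
    """Return (endpoints, junctions) of a one-pixel-wide binary skeleton.

    Assumption: crossing number 1 marks an endpoint and 3 or more marks a
    junction. This replaces the article's template matching for illustration.
    """
    endpoints, junctions = [], []
    for r, row in enumerate(skel):
        for c, pixel in enumerate(row):
            if not pixel:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                endpoints.append((r, c))
            elif cn >= 3:
                junctions.append((r, c))
    return endpoints, junctions


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy skeleton of a "T": a horizontal bar with a vertical stem.
T_SHAPE = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

endpoints, junctions = stroke_points(T_SHAPE)
feature = (len(endpoints), len(junctions))  # a deliberately tiny feature vector
```

In this sketch the feature vector is only a pair of counts. A practical system would presumably also record where in the character the points occur and add the line and curve elements the abstract mentions, before handing the vectors to `knn_predict`.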

Index Terms

Classification of Printed Gujarati Characters Using Low-Level Stroke Features
1. Applied computing
  1. Document management and text processing
    1. Document capture

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 15, Issue 4

June 2016

173 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/2915955

Editor:
Richard Sproat
Google, Inc., USA

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 April 2016

Accepted: 01 November 2015

Revised: 01 October 2015

Received: 01 November 2014

Published in TALLIP Volume 15, Issue 4

Qualifiers

Research-article
Research
Refereed
