research-article

Classification of Printed Gujarati Characters Using Low-Level Stroke Features

Published: 12 April 2016

Abstract

This article presents a technique for extracting low-level stroke features, such as endpoints, junction points, line elements, and curve elements, from offline printed text using a template-matching approach. The proposed features are used to classify a subset of characters from the Gujarati script. The database consists of approximately 16,782 samples of 42 middle-zone symbols from the Gujarati character set, collected from three different sources: machine-printed books, newspapers, and laser-printed documents. Drawing on three sources adds variety in size, font type, style, ink variation, and boundary deformation. Experiments are performed on the database using a k-nearest neighbor (kNN) classifier, and the results are compared with those of other widely used structural features, namely Chain Codes (CC), Directional Element Features (DEF), and Histogram of Oriented Gradients (HoG). The results show that the proposed features are robust against these variations and perform comparably to existing approaches.
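
The abstract does not spell out the templates themselves, so the following is only a minimal sketch of the kind of stroke features it names. It assumes a one-pixel-wide binary skeleton and substitutes a crossing-number test over the 8-neighbourhood for the article's template matching. The function names (`stroke_points`, `knn_predict`), the toy T-shaped glyph, and the two-element feature vector of endpoint and junction counts are illustrative assumptions, not the authors' method or data.

```python
from collections import Counter

# 8-neighbourhood offsets (row, col), clockwise starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def stroke_points(skeleton):
    """Find endpoints and junction points on a one-pixel-wide binary skeleton.

    Assumption: the crossing number (count of 0->1 transitions around the
    8-neighbour ring) stands in for the article's template matching.
    Crossing number 1 marks an endpoint; 3 or more marks a junction.
    """
    rows, cols = len(skeleton), len(skeleton[0])

    def pixel(r, c):
        # Pixels outside the image are treated as background.
        return skeleton[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    endpoints, junctions = [], []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            ring = [pixel(r + dr, c + dc) for dr, dc in RING]
            crossings = sum(
                1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1
            )
            if crossings == 1:
                endpoints.append((r, c))
            elif crossings >= 3:
                junctions.append((r, c))
    return endpoints, junctions


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs; closeness is
    measured by squared Euclidean distance.
    """
    ranked = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query))
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy T-shaped skeleton (hypothetical data, not a Gujarati glyph).
T_SHAPE = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

endpoints, junctions = stroke_points(T_SHAPE)
features = (len(endpoints), len(junctions))

# Hypothetical labelled feature vectors: (endpoint count, junction count).
TRAIN = [((3, 1), "T"), ((2, 0), "I"), ((2, 0), "I"), ((3, 1), "T"), ((0, 0), "O")]
print(endpoints, junctions, knn_predict(TRAIN, features))
```

On the toy glyph this finds three endpoints (the tips of the bar and the stem) and one junction where they meet. A practical pipeline would need a thinning step first and a richer descriptor, since the abstract also names line and curve elements, which this sketch does not attempt.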

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 15, Issue 4
      June 2016
      173 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/2915955
      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 April 2016
      • Accepted: 1 November 2015
      • Revised: 1 October 2015
      • Received: 1 November 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
