DOI: 10.1145/1577802.1577806
research-article

Curvature feature distribution based classification of Indian scripts from document images

Published: 25 July 2009

ABSTRACT

We present a framework for classifying text document images by script. We work in the domain of Indian scripts, which have high inter-script similarity. Indian scripts have characteristic curvature distributions that help in visually discriminating between them. We use edge-direction-based features to capture the distribution of curvature, and a recently proposed feature selection algorithm to obtain the most discriminating curvature features. We automatically form a hierarchy based on statistical distances between the script models. The hierarchy lets us group similar scripts at one level and then focus on classifying between those similar scripts at the next level, which improves accuracy. We report experiments and results on a large set of about 3400 images.
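
To make the idea of edge-direction-based features concrete, the following is a minimal sketch of a plain edge direction histogram. It is not the authors' feature extractor: the abstract does not specify the gradient operator, the number of bins, or how curvature is derived from edge directions, so the central-difference gradient, the default of 8 bins, the magnitude weighting, and the function name `edge_direction_histogram` are all assumptions made for illustration.

```python
import math


def edge_direction_histogram(image, bins=8, min_magnitude=1e-6):
    """Normalized, magnitude-weighted histogram of gradient orientations.

    `image` is a list of equal-length rows of grayscale intensities.
    Orientations are folded into [0, pi) so opposite gradients share a bin.
    All parameter choices here are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the intensity gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude <= min_magnitude:
                continue  # flat region: no edge, so no direction to record
            theta = math.atan2(gy, gx) % math.pi
            # Clamp guards against theta rounding up to exactly pi.
            index = min(int(theta / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    # A completely flat image yields an all-zero histogram.
    return [h / total for h in hist] if total > 0 else hist
```

A histogram like this describes how stroke directions are distributed across a text image; scripts dominated by rounded strokes would be expected to spread mass over many bins, while scripts with strong horizontal or vertical strokes would concentrate it in a few. Feature selection and the script hierarchy described in the abstract would operate on top of such feature vectors.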

      • Published in

        MOCR '09: Proceedings of the International Workshop on Multilingual OCR
        July 2009
        139 pages
        ISBN: 9781605586984
        DOI: 10.1145/1577802

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 17 of 34 submissions, 50%
