ABSTRACT
We present a framework for classifying text document images by script. We address Indian scripts, which exhibit high inter-script similarity. Indian scripts have characteristic curvature distributions that help discriminate them visually. We use edge-direction-based features to capture the distribution of curvature, and apply a recently proposed feature selection algorithm to obtain the most discriminative curvature features. We automatically build a hierarchy of scripts from statistical distances between the script models. The hierarchy groups similar scripts at one level and then concentrates on separating those similar scripts at the next level, which improves accuracy. We report experiments and results on a large set of about 3400 images.
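As a rough illustration of what an edge-direction-based feature can look like, the sketch below builds a normalized histogram of gradient orientations over a grayscale image. It is only a minimal sketch under stated assumptions: the abstract does not specify the exact features, so the function name `edge_direction_histogram`, the central-difference gradient, the 8 orientation bins, and the magnitude weighting are all illustrative choices, not the authors' method.

```python
import math


def edge_direction_histogram(image, bins=8, min_magnitude=1e-6):
    """Return a normalized histogram of gradient orientations.

    `image` is a list of equal-length rows of numeric intensities.
    All parameters are illustrative assumptions, not values from the paper.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the intensity gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude < min_magnitude:
                continue  # flat region: no edge evidence here
            # Fold the angle into [0, pi) so that polarity is ignored.
            angle = math.atan2(gy, gx) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            # Weight each vote by gradient strength.
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A vertical step edge: every gradient points along the x axis.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
vertical_hist = edge_direction_histogram(vertical_edge)
```

Scripts dominated by different stroke shapes would be expected to produce differently shaped histograms, which is the kind of distribution a downstream classifier or feature selection step could work with.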
Index Terms
- Curvature feature distribution based classification of Indian scripts from document images