Abstract
Document script and document orientation are important pieces of information for document digitization. Methods for identifying both have been reported, but most of them are very sensitive to document skew and low image resolution. This paper reports a script and orientation identification method that addresses this sensitivity by converting a document image into a pair of document vectors that encode the density and distribution of character strokes. Experiments on 3,024 document images in 12 scripts show that the proposed method is accurate and tolerant of various types of document degradation.
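The abstract does not spell out how the two document vectors are built, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a binarized image given as a list of rows with 1 for ink, and it uses two hypothetical features: a histogram of vertical stroke runs per pixel column (a stand-in for stroke density) and the share of ink in each horizontal band (a stand-in for stroke distribution). The function names, the `bands` and `max_runs` parameters, and the Manhattan-distance template matching are all assumptions made for the example.

```python
def stroke_vectors(image, bands=4, max_runs=3):
    """Return (density, distribution) vectors for a binary image (1 = ink).

    Hypothetical features chosen for illustration only.
    """
    height = len(image)
    width = len(image[0])

    # Density vector: histogram of the number of vertical ink runs
    # found in each pixel column, capped at max_runs.
    run_hist = [0] * (max_runs + 1)
    for x in range(width):
        runs = 0
        prev = 0
        for y in range(height):
            if image[y][x] and not prev:
                runs += 1
            prev = image[y][x]
        run_hist[min(runs, max_runs)] += 1
    density = [count / width for count in run_hist]

    # Distribution vector: fraction of all ink pixels that falls
    # into each of `bands` equal-height horizontal bands.
    band_ink = [0] * bands
    for y, row in enumerate(image):
        band_ink[y * bands // height] += sum(row)
    total = sum(band_ink) or 1  # avoid division by zero on blank images
    distribution = [count / total for count in band_ink]

    return density, distribution


def nearest_label(vector, templates):
    """Pick the template label with the smallest Manhattan distance."""
    def distance(label):
        return sum(abs(a - b) for a, b in zip(vector, templates[label]))
    return min(templates, key=distance)


# Tiny toy image: most ink sits in the upper half.
image = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
density, distribution = stroke_vectors(image, bands=2)
flipped_density, flipped_distribution = stroke_vectors(image[::-1], bands=2)
```

In this toy case, flipping the image upside down leaves the run-count histogram unchanged but reverses the band vector, which hints at why a density-style vector could suit script identification while a distribution-style vector could carry orientation information. Because both vectors are normalized counts rather than pixel-level measurements, features of this kind would also be expected to change only gradually under moderate skew or reduced resolution.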

Cite this article
Lu, S., Li, L. & Tan, C.L. Identification of scripts and orientations of degraded document images. Pattern Anal Applic 13, 469–475 (2010). https://doi.org/10.1007/s10044-009-0169-7