ABSTRACT
Document skew estimation is the process of finding the angle of inclination of a document with respect to the horizontal axis. Skew introduced during scanning is inevitable: at least a slight degree of skew is present regardless of whether the document is fed to the scanner manually or automatically. Deskewing is therefore vital for achieving good results in downstream document analysis systems (DAS) such as page layout analysis, optical character recognition (OCR), and document retrieval. Although an enormous amount of research has been conducted on document skew estimation, a single skew estimation approach that can handle all kinds of real-world variation in documents remains an elusive goal for the research community. In this paper, we present a novel scheme for estimating document skew based on wavelets. In the first stage, document images are modeled by the wavelet transform; an efficient Hough transform is then used to estimate the skew of the document. Experimental results show that the method performs well on document images with complex layouts and in different scripts.
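The two-stage idea in the abstract (wavelet transform, then Hough transform) can be illustrated with a minimal sketch. The abstract does not state the wavelet family, which sub-band is used, or which Hough variant is meant, so everything below is an assumption: a one-level Haar transform, only the detail band formed from row differences (which responds to the near-horizontal edges of text lines), and a Hough-style accumulator over (rho, theta) in which each candidate angle is scored by the sum of squared vote counts. All function names and parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_row_detail(img):
    """One-level 2-D Haar detail band built from differences between adjacent rows.

    Assumed choice: this band responds to near-horizontal edges such as the
    upper and lower boundaries of text lines.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # crop to even dimensions
    top = img[0::2, 0::2] + img[0::2, 1::2]
    bottom = img[1::2, 0::2] + img[1::2, 1::2]
    return (top - bottom) / 2.0


def estimate_skew(img, max_angle=15.0, step=0.25, keep=0.5):
    """Return the skew angle in degrees that best aligns strong detail coefficients.

    Sign convention (an assumption of this sketch): a positive angle means the
    row index of a text line grows with the column index.
    """
    detail = np.abs(haar_row_detail(img))
    peak = detail.max()
    if peak == 0:
        return 0.0  # blank image: nothing to vote with
    ys, xs = np.nonzero(detail > keep * peak)  # keep only strong coefficients

    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # Points on a line with slope tan(theta) share the same rho.
        rho = ys * np.cos(theta) - xs * np.sin(theta)
        bins = np.round(rho).astype(int)
        votes = np.bincount(bins - bins.min())  # accumulator column for this angle
        score = float(np.sum(votes.astype(float) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


def synthetic_page(angle_deg, shape=(200, 300), spacing=20, sigma=2.0):
    """Toy test image: smooth parallel 'text lines' drawn at a known skew."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    slope = np.tan(np.deg2rad(angle_deg))
    page = np.zeros(shape)
    for y0 in range(2 * spacing, h - 2 * spacing, spacing):
        page += np.exp(-((yy - y0 - slope * xx) ** 2) / (2 * sigma ** 2))
    return page
```

Calling `estimate_skew(synthetic_page(5.0))` should return an angle close to 5 degrees on this toy input. Because the Haar step halves both image dimensions, the accumulator sees roughly a quarter of the pixels, which is one plausible reason to pair a wavelet stage with a Hough stage; whether the paper uses it for that purpose is not stated in the abstract.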
Index Terms
- Document skew estimation: an approach based on wavelets