Efficient skew detection of printed document images based on novel combination of enhanced profiles

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Document skew is often introduced during image capture and can seriously degrade the subsequent segmentation and recognition stages of the document image processing pipeline. Skew detection is commonly performed with horizontal projection profiles; more recently, an approach based on vertical projections has been introduced. In this paper, we use the minimum bounding box area technique to combine a horizontal projection profile method with a new, reinforced vertical projection profile method. We are motivated by the observation that the horizontal and the novel vertical projection profiles are complementary. We claim that the proposed approach is more accurate than other state-of-the-art skew detection algorithms, addresses all the drawbacks of projection profile methods, is more resistant to noise and warping, and gives accurate results for any kind of printed document image. It can therefore be applied efficiently to historical machine-printed documents, multicolumn documents, and documents with figures and tables, and it is robust across scripts. Extensive experiments on two databases covering different skew angle ranges, with representative printed documents of all kinds as well as printed pages from two historical books, demonstrate the efficiency of the proposed approach. In several cases we also compare against commercial products, demonstrating the contribution of the proposed algorithm at the optical character recognition level. Finally, we analyze the accuracy of the main elements of the proposed technique.
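As a rough illustration of two ingredients named in the abstract, the following Python sketch estimates skew from a horizontal projection profile and uses bounding box area to choose between candidate angles. It is not the authors' algorithm: the reinforced vertical profile is not reproduced, the sum-of-squares profile score is one common criterion chosen here as an assumption, and all function names, the synthetic page, and the angle grid are hypothetical.

```python
import math


def rotate(points, angle_deg):
    """Rotate (x, y) points about the origin by angle_deg degrees."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def horizontal_profile_score(points):
    """Sum of squared row counts of the horizontal projection profile.

    The score is large when foreground pixels concentrate in few rows,
    which happens when text lines are parallel to the x axis.
    """
    rows = {}
    for _, y in points:
        r = round(y)
        rows[r] = rows.get(r, 0) + 1
    return sum(n * n for n in rows.values())


def bounding_box_area(points):
    """Area of the axis-aligned bounding box of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def estimate_skew(points, max_angle=5.0, step=0.25):
    """Return the correction angle (degrees) with the highest profile score."""
    steps = int(round(2 * max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(steps + 1):
        angle = -max_angle + i * step
        score = horizontal_profile_score(rotate(points, angle))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def pick_by_bounding_box(points, candidate_angles):
    """Among candidate corrections, keep the one with the smallest box area."""
    return min(candidate_angles,
               key=lambda a: bounding_box_area(rotate(points, a)))


def make_synthetic_page(lines=10, width=200, pitch=20, thickness=3):
    """Hypothetical test page: solid horizontal bars standing in for text lines."""
    return [(x, line * pitch + dy)
            for line in range(lines)
            for dy in range(thickness)
            for x in range(width)]


# Skew a synthetic page by 3 degrees, then try to recover the correction.
page = make_synthetic_page()
skewed = rotate(page, 3.0)
estimate = estimate_skew(skewed)
chosen = pick_by_bounding_box(skewed, [estimate, 0.0])
print("profile estimate:", estimate, "bounding-box choice:", chosen)
```

In this sketch the bounding box test only arbitrates between angles proposed by other estimators. How the paper actually combines the horizontal and vertical estimates is described in the full text, not in the abstract.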

Acknowledgments

The research leading to these results has received funding from the European Union’s Seventh Framework Programme under Grant agreement No. 215064 (IMPACT) as well as from the European Union’s Seventh Framework Programme (FP7/2007-2013) under Grant agreement No. 600707 (tranScriptorium).

Author information

Corresponding author

Correspondence to A. Papandreou.

About this article

Cite this article

Papandreou, A., Gatos, B., Perantonis, S.J. et al. Efficient skew detection of printed document images based on novel combination of enhanced profiles. IJDAR 17, 433–454 (2014). https://doi.org/10.1007/s10032-014-0228-5

  • DOI: https://doi.org/10.1007/s10032-014-0228-5
