Abstract
Document classification is an important task in all processes related to document storage and retrieval. For complex documents, structural features are needed to achieve a correct classification. Unfortunately, physical layout analysis is error-prone. In this paper, we present a pre-segmentation step based on a divide-and-conquer strategy that can be used to improve page segmentation results, independently of the segmentation algorithm used. This pre-segmentation step is evaluated in classification and retrieval tasks using the selective CRLA algorithm for layout segmentation together with a clustering based on the area Voronoi diagram, and it is tested on two different databases, MARG and Girona Archives.
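The abstract names selective CRLA as the layout segmentation back end but does not describe it. As background only, the following Python sketch shows classic run-length smoothing (RLSA), the family of run-length techniques to which CRLA belongs. It is not the selective CRLA and not the authors' divide-and-conquer pre-segmentation step. The function names, the 0/1 image encoding (1 = foreground), and the threshold convention (fill interior background runs whose length is at most the threshold) are assumptions chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # assumption: binary image, 1 = foreground, 0 = background


def smear_row(row: List[int], threshold: int) -> List[int]:
    """Fill background runs of length <= threshold lying between two
    foreground pixels. Leading and trailing background runs are left as-is."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, v in enumerate(row):
        if v == 1:
            if last_fg is not None and i - last_fg - 1 <= threshold:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def rlsa(img: Image, h_thr: int, v_thr: int) -> Image:
    """Smear horizontally and vertically, then combine with a logical AND,
    as in the classic run-length smoothing scheme."""
    h = [smear_row(r, h_thr) for r in img]
    v = transpose([smear_row(c, v_thr) for c in transpose(img)])
    return [[a & b for a, b in zip(rh, rv)] for rh, rv in zip(h, v)]


print(smear_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 2))  # [1, 1, 1, 1, 0, 0, 0, 0, 1]
```

Only the short gap between the first two foreground pixels is closed; the longer gap survives and would separate two blocks. Real segmenters tune the thresholds to the document's character and line spacing.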
References
Jain, A.K., Bhattacharjee, S.: Text segmentation using Gabor filters for automatic document processing. Machine Vision Appl. 5, 169–184 (1992)
O’Gorman, L.: The Document Spectrum for Page Layout Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1162–1173 (1993)
Baird, H.S.: Background structure in document images. In: Document Image Analysis, pp. 17–34. World Scientific, Singapore (1994)
Chen, N., Blostein, D.: A survey of document image classification: problem statement, classifier architecture and performance evaluation. Int. J. Doc. Anal. Recognit. 10(1), 1–16 (2007)
Cesarini, F., Lastri, M., Marinai, S., Soda, G.: Encoding of modified X-Y trees for document classification. In: Proceedings Sixth International Conference on Document Analysis and Recognition, pp. 1131–1136 (2001)
Shafait, F., Keysers, D., Breuel, T.M.: Performance Comparison of Six Algorithms for Page Segmentation. In: Bunke, H., Spitz, A.L. (eds.) DAS 2006. LNCS, vol. 3872, pp. 368–379. Springer, Heidelberg (2006)
Keysers, D., Deselaers, T., Ney, H.: Pixel-to-Pixel Matching for Image Recognition using Hungarian Graph Matching. In: DAGM 2004, Pattern Recognition, 26th DAGM Symposium, pp. 154–162 (2004)
Kise, K., Sato, A., Iwata, M.: Segmentation of page images using the area Voronoi diagram. Comput. Vis. Image Underst. 70(3), 370–382 (1998)
Sun, H.: Page segmentation for Manhattan and non-Manhattan layout documents via selective CRLA. In: Proc. Eighth International Conference on Document Analysis and Recognition, vol. 1, pp. 116–120 (2005)
Nagy, G., Seth, S.: Hierarchical representation of optically scanned documents. In: Proc. Seventh Int. Conf. Patt. Recogn. (ICPR), pp. 347–349 (1984)
van Beusekom, J., Keysers, D., Shafait, F., Breuel, T.M.: Distance measures for layout-based document image retrieval. In: Second International Conference on Document Image Analysis for Libraries (DIAL) (2006)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gordo, A., Valveny, E. (2009). The Diagonal Split: A Pre-segmentation Step for Page Layout Analysis and Classification. In: Araujo, H., Mendonça, A.M., Pinho, A.J., Torres, M.I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2009. Lecture Notes in Computer Science, vol 5524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02172-5_38
DOI: https://doi.org/10.1007/978-3-642-02172-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02171-8
Online ISBN: 978-3-642-02172-5
eBook Packages: Computer Science, Computer Science (R0)