Document image characterization using a multiresolution analysis of the texture: application to old documents

International Journal of Document Analysis and Recognition (IJDAR)

Abstract

In this article, we propose a method for characterizing images of old documents based on a texture approach. The characterization relies on a multiresolution analysis of the textures contained in the document images. By extracting five features linked to the frequencies and orientations in the different areas of a page, it is possible to extract and compare elements of high semantic level without making any hypothesis about the physical or logical structure of the analyzed documents. Experiments based on segmentation, data analysis, and document image retrieval tools demonstrate the performance of our proposals and the advances they represent in characterizing the content of a deeply heterogeneous corpus.
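
To make the idea of frequency- and orientation-related texture features computed at several resolutions more concrete, here is a minimal sketch in plain Python. It is not the five-feature set of the article, which the abstract does not define. The two measures (`transition_rate` as a crude frequency proxy, `orientation_ratio` as a crude orientation cue), the block-averaging `downsample` helper, and the scale factors are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale page region, row-major, values in [0, 1]


def downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (a coarser resolution)."""
    rows = len(img) // factor
    cols = len(img[0]) // factor
    out: Image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def transition_rate(img: Image, threshold: float = 0.5) -> float:
    """Fraction of horizontally adjacent pixel pairs that cross the threshold.

    Hypothetical frequency proxy: dense text yields many crossings.
    """
    crossings = 0
    pairs = 0
    for row in img:
        for a, b in zip(row, row[1:]):
            pairs += 1
            if (a < threshold) != (b < threshold):
                crossings += 1
    return crossings / pairs if pairs else 0.0


def orientation_ratio(img: Image) -> float:
    """Share of squared-difference energy measured along rows, in [0, 1].

    Near 1: intensity varies mostly from left to right (vertical strokes).
    Near 0: intensity varies mostly from top to bottom (horizontal strokes).
    Returns 0.5 for a perfectly flat region.
    """
    gx = sum((b - a) ** 2 for row in img for a, b in zip(row, row[1:]))
    gy = sum((b - a) ** 2
             for top, bottom in zip(img, img[1:])
             for a, b in zip(top, bottom))
    total = gx + gy
    return gx / total if total else 0.5


def multires_features(
    img: Image, factors: Sequence[int] = (1, 2, 4)
) -> List[Tuple[int, float, float]]:
    """Return (factor, transition_rate, orientation_ratio) for each scale."""
    features = []
    for factor in factors:
        coarse = downsample(img, factor)
        features.append(
            (factor, transition_rate(coarse), orientation_ratio(coarse))
        )
    return features


# Toy region: 8 x 8 vertical stripes, each stripe two pixels wide.
stripes: Image = [[float((c // 2) % 2) for c in range(8)] for _ in range(8)]
features = multires_features(stripes)
```

On this toy region the stripes are fully resolved at factors 1 and 2 (orientation ratio 1.0) and are averaged away at factor 4, where the region becomes flat. A texture's signature changing with scale in this way is what motivates describing a page at several resolutions rather than one.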

Author information

Corresponding author

Correspondence to Nicholas Journet.

About this article

Cite this article

Journet, N., Ramel, JY., Mullot, R. et al. Document image characterization using a multiresolution analysis of the texture: application to old documents. IJDAR 11, 9–18 (2008). https://doi.org/10.1007/s10032-008-0064-6

  • DOI: https://doi.org/10.1007/s10032-008-0064-6
