Document image characterization using a multiresolution analysis of the texture: application to old documents

Journet, Nicholas; Ramel, Jean-Yves; Mullot, Rémy; Eglin, Véronique

doi:10.1007/s10032-008-0064-6

Original Paper
Published: 24 June 2008

Volume 11, pages 9–18 (2008)
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

In this article, we propose a method for characterizing images of old documents based on a texture approach. The characterization relies on a multiresolution study of the textures contained in the document images. By extracting five features linked to the frequencies and orientations in the different areas of a page, it is possible to extract and compare elements of high semantic level without making any hypothesis about the physical or logical structure of the analyzed documents. Experiments based on segmentation, data analysis and document image retrieval tools demonstrate the performance of our proposals and the advances they represent for characterizing the content of a deeply heterogeneous corpus.
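
As a rough illustration of what a multiresolution texture description can look like, the Python sketch below computes two cues around one pixel at several window sizes: a frequency-like cue (the rate of horizontal intensity transitions) and an orientation-like cue (the dominant gradient direction taken from a structure tensor). These two cues, the window sizes and every function name are assumptions chosen for illustration. The abstract does not specify the five features used in the article, and this is not the authors' implementation.

```python
import math

def window(img, r, c, half):
    """Return the square patch of radius `half` centred on (r, c), clipped to the image."""
    rows = range(max(0, r - half), min(len(img), r + half + 1))
    cols = range(max(0, c - half), min(len(img[0]), c + half + 1))
    return [[img[i][j] for j in cols] for i in rows]

def transition_rate(patch):
    """Frequency-like cue (assumed): fraction of horizontally adjacent pixel pairs that differ."""
    pairs = changes = 0
    for row in patch:
        for a, b in zip(row, row[1:]):
            pairs += 1
            changes += a != b
    return changes / pairs if pairs else 0.0

def dominant_orientation(patch):
    """Orientation-like cue (assumed): dominant gradient direction in degrees, in [0, 180).

    Uses central differences and the structure tensor; a flat patch yields 0.
    """
    sxx = syy = sxy = 0.0
    for i in range(1, len(patch) - 1):
        for j in range(1, len(patch[0]) - 1):
            gx = patch[i][j + 1] - patch[i][j - 1]
            gy = patch[i + 1][j] - patch[i - 1][j]
            sxx += gx * gx
            syy += gy * gy
            sxy += gx * gy
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(angle) % 180

def multiscale_features(img, r, c, halves=(2, 4, 8)):
    """Compute both cues around (r, c) for each window radius in `halves`."""
    feats = []
    for half in halves:
        patch = window(img, r, c, half)
        feats.append((transition_rate(patch), dominant_orientation(patch)))
    return feats

# Synthetic 16x16 test image: horizontal stripes, each two pixels thick.
stripes = [[(i // 2) % 2] * 16 for i in range(16)]
features = multiscale_features(stripes, 8, 8)
```

On this striped image the transition rate is zero at every scale, since each row is constant, and the dominant gradient direction is close to 90 degrees. Concatenating such values over the scales gives one feature vector per pixel, which is the kind of description that can be compared or clustered without a layout model.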

Author information

Authors and Affiliations

LI, 64 Avenue Jean Portalis, 37200, Tours, France
Nicholas Journet & Jean-Yves Ramel
L3I, 17042, La Rochelle Cedex 1, France
Rémy Mullot
LIRIS INSA de Lyon, Villeurbanne Cedex, France
Véronique Eglin

Corresponding author

Correspondence to Nicholas Journet.

About this article

Cite this article

Journet, N., Ramel, JY., Mullot, R. et al. Document image characterization using a multiresolution analysis of the texture: application to old documents. IJDAR 11, 9–18 (2008). https://doi.org/10.1007/s10032-008-0064-6

Received: 05 April 2007
Revised: 04 April 2008
Accepted: 30 April 2008
Published: 24 June 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10032-008-0064-6
