Using colour information to understand censorship cards of film archives

Altamura, Oronzo; Berardi, Margherita; Ceci, Michelangelo; Malerba, Donato; Varlaro, Antonio

doi:10.1007/s10032-006-0021-1

Original Paper
Published: 08 August 2006

Volume 9, pages 281–297, (2007)

International Journal of Document Analysis and Recognition (IJDAR)

Abstract

Many European film archives are digitizing 20th-century historical paper documents. Within the IST project COLLATE, three of them were interested in the semi-automatic annotation of censorship cards and in their subsequent retrieval on the basis of both annotations and content. Processing censorship cards, the main subject of this paper, poses a number of challenges for many document image analysis (DIA) systems. Problems arise from the low layout quality and low standard of such material, which introduce a considerable amount of noise into its description. Layout quality is often degraded by stamps, signatures, ink specks, manual annotations and similar marks that overlap the layout components involved in the understanding or annotation processes. To reduce the presence and the effect of this noise, we propose an improved version of the knowledge-based DIA system WISDOM++ that takes full advantage of colour information in all processing steps, namely image segmentation, layout analysis, document image classification and understanding. Experiments were conducted on a corpus of multi-format documents concerning rare historic film censorship, provided by the three film archives involved in the COLLATE project.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi, via Orabona, 4, 70126, Bari, Italy
Oronzo Altamura, Margherita Berardi, Michelangelo Ceci, Donato Malerba & Antonio Varlaro

Corresponding author

Correspondence to Michelangelo Ceci.

About this article

Cite this article

Altamura, O., Berardi, M., Ceci, M. et al. Using colour information to understand censorship cards of film archives. IJDAR 9, 281–297 (2007). https://doi.org/10.1007/s10032-006-0021-1

Received: 09 March 2005
Revised: 11 October 2005
Accepted: 28 May 2006
Published: 08 August 2006
Issue Date: April 2007
DOI: https://doi.org/10.1007/s10032-006-0021-1
