Abstract
One of the main goals of the Contentus use case was to manage and improve the technical quality of large digital multimedia collections in cultural heritage organizations. Generally, there are two causes of quality impairment in digitized multimedia items: errors during the digitization process and the poor condition of the analog original. While digitization errors may be corrected by re-digitization, deterioration of analog materials can only be counteracted by digital restoration in post-processing after digitization. This article showcases a unique technique developed in Contentus to restore digitized hectograph archive documents, which typically display yellowed paper and faded printing ink. The documents used in this restoration showcase belong to the archive of the Music Information Center of the Association of Composers and Musicologists (MIZ) of the former German Democratic Republic (GDR), and were produced between 1960 and 1989. The hectography method was widely adopted in the GDR to copy documents on a large scale. The showcased restoration method enhances the on-screen readability of the texts and, as shown by evaluation, lowers the error rate of optical character recognition. This improvement, in turn, is expected to improve the automated extraction of semantic information entities such as persons, places and organizations. The technology presented in this article is an example of how corpora consisting of visually degraded analog media can be prepared for semantic search applications based on automatic content indexing, which was another major goal of the Contentus use case.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Konya, I., Eickeler, S., Nandzik, J., Flores-Herr, N. (2014). Print Processing in Contentus: Restoration of Digitized Print Media. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_24
DOI: https://doi.org/10.1007/978-3-319-06755-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06754-4
Online ISBN: 978-3-319-06755-1
eBook Packages: Computer Science (R0)