Visual enhancement of old documents with hyperspectral imaging

doi:10.1016/j.patcog.2010.12.019

Pattern Recognition

Volume 44, Issue 7, July 2011, Pages 1461-1469

https://doi.org/10.1016/j.patcog.2010.12.019

Abstract

Hyperspectral imaging (HSI) of historical documents is becoming more common at national libraries and archives. HSI is useful for many tasks related to document conservation and management as it provides detailed quantitative measurements of the spectral reflectance of the document that are not limited to the visible spectrum. In this paper, we focus on how to use the invisible spectra, most notably near-infrared (NIR) bands, to assist in visually enhancing old documents. Specifically, we demonstrate how to use the invisible bands to improve the visual quality of text-based documents corrupted with undesired artifacts such as ink-bleed, ink-corrosion, and foxing. For documents of line drawings that suffer from low contrast, we use details found in the invisible bands to enhance legibility. The key components of our framework involve detecting regions in the document that can be enhanced by the NIR spectra, compositing the enhanced gradient map using the NIR bands, and reconstructing the final image from the composited gradients. This work is part of a collaborative effort with the Nationaal Archief of the Netherlands (NAN) and Art Innovation, a manufacturer of hyperspectral imaging hardware designed specifically for historical documents. Our approach is evaluated on historical documents from NAN that exhibit degradations common to documents found in most archives and libraries.

Introduction

Hyperspectral imaging (HSI) captures a densely sampled spectral response of a scene object over a broad spectrum including invisible spectra such as ultra-violet (UV) and near-infrared (NIR). Hyperspectral imaging has been employed in various scientific disciplines to provide valuable data for fields such as astronomy [1], [2], earth science and remote sensing [3], [4], and computer vision [5]. With the advances in technology and cost reductions, hyperspectral imaging of historical art works and documents is now accessible for use in national libraries and archives [6], [7].

One advantage of HSI in document imaging over standard 3-channel imaging (i.e. RGB) is that HSI provides detailed quantitative measurements of the document's spectral response. Traditional RGB imaging, on the other hand, contains only a subset of the available information because it combines the response to all visible electromagnetic (EM) radiation into three bands. This makes HSI more suitable for tasks that require accurate quantitative measurement, such as conservation, damage detection, and analysis of features in the document (e.g. ink and pigments) and of changes over time due to aging or light exposure. In addition, hyperspectral imaging provides measurements in the invisible spectra (NIR, UV), which further enrich the analysis and enhancement of the data. Measurements in the invisible spectral bands provide more information about the document being imaged, sometimes by seeing more than the visible range and sometimes by seeing less. This is demonstrated by two examples in Fig. 1. For the first example, the NIR band at 900 nm (Fig. 1(b)) provides more salient gradient details than the document in the visible band at 500 nm (Fig. 1(a)). Conversely, for the second example, the NIR band at 800 nm (Fig. 1(d)) is better for guiding enhancement than the 450 nm visible band (Fig. 1(c)) since artifacts such as ink-bleed and ink-corrosion are less prevalent.
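The loss of information when a sampled spectrum is collapsed into three bands can be illustrated with a small sketch. The Gaussian channel sensitivities, the 400 to 700 nm visible window, and the two test spectra below are assumptions made purely for illustration; they are not the response curves of any real camera or of the imaging device used in this work.

```python
import math

# Hypothetical channel sensitivities as (centre, width) in nm.
# These are illustrative assumptions, not measured camera curves.
CHANNELS = {"R": (600.0, 40.0), "G": (550.0, 40.0), "B": (450.0, 40.0)}
VISIBLE_NM = (400.0, 700.0)  # assumed visible window


def gaussian(x, centre, width):
    """Unnormalised Gaussian weight."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)


def spectrum_to_rgb(wavelengths, reflectance):
    """Collapse one pixel's sampled spectrum into three weighted averages."""
    rgb = {}
    for name, (centre, width) in CHANNELS.items():
        num = den = 0.0
        for wl, r in zip(wavelengths, reflectance):
            if VISIBLE_NM[0] <= wl <= VISIBLE_NM[1]:  # invisible bands are ignored
                w = gaussian(wl, centre, width)
                num += w * r
                den += w
        rgb[name] = num / den
    return rgb


wavelengths = list(range(400, 1001, 10))
# Two hypothetical spectra: identical in the visible range, different in the NIR.
flat = [0.5 for _ in wavelengths]
nir_dark = [0.5 if wl <= 700 else 0.1 for wl in wavelengths]

rgb_flat = spectrum_to_rgb(wavelengths, flat)
rgb_nir_dark = spectrum_to_rgb(wavelengths, nir_dark)
print(rgb_flat == rgb_nir_dark)  # the NIR difference does not reach the RGB values
```

Under these assumptions the two pixels are indistinguishable in RGB even though their NIR reflectance differs by a factor of five, which is the kind of extra information the invisible bands contribute.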

The goal of this paper is to take advantage of hyperspectral images of historical documents to visually enhance the document's content by exploiting additional information provided by the NIR bands. The visual enhancement in this paper is applied to the RGB image of the hyperspectral data as the RGB image is the most natural visualization of the data. In this work, we are interested in two tasks. For text-based documents that are corrupted with artifacts such as ink-bleed, corrosion, and foxing, we use the invisible bands, which capture far fewer artifacts than the visible bands, to clean up the artifacts in the documents while preserving the look and feel of the original document. For drawing-based documents that contain low-contrast regions, we use NIR bands, which capture more details than the visible bands, to enhance the contrast in the documents. The data are enhanced in the gradient domain, which has been shown to be effective for many computer vision tasks such as image editing [8], contrast adjustment [9], image stitching [10], and intrinsic image computation [11]. The key components of our algorithm include detecting regions that can be enhanced by the additional NIR spectral images, compositing the enhanced gradient map from NIR images, and reconstructing the final image from gradients using an optimization scheme.
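The compositing and reconstruction steps can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: the region mask is taken as given (the detection step is not modelled), the NIR and visible images are assumed to share the same intensity scale, and the reconstruction uses plain Gauss-Seidel sweeps on the Poisson equation with the border fixed to the visible image, which is one common choice and may differ from the optimization scheme used in the paper. All function names are hypothetical.

```python
def gradients(img):
    """Forward differences; the last column of gx and last row of gy are zero."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)]
          for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)]
          for y in range(h)]
    return gx, gy


def composite(vis_grad, nir_grad, mask):
    """Take the NIR gradient where the mask is set, the visible one elsewhere."""
    return [[n if m else v for v, n, m in zip(vr, nr, mr)]
            for vr, nr, mr in zip(vis_grad, nir_grad, mask)]


def reconstruct(gx, gy, boundary, sweeps=500):
    """Gauss-Seidel solve of the Poisson equation; border pixels stay fixed."""
    h, w = len(boundary), len(boundary[0])
    u = [row[:] for row in boundary]  # initial guess and boundary values
    for _ in range(sweeps):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                div = gx[y][x] - gx[y][x - 1] + gy[y][x] - gy[y - 1][x]
                u[y][x] = (u[y][x - 1] + u[y][x + 1]
                           + u[y - 1][x] + u[y + 1][x] - div) / 4.0
    return u


SIZE = 7
# Synthetic NIR band: clean page (1.0) with one ink stroke (0.2).
nir = [[1.0] * SIZE for _ in range(SIZE)]
for x in range(1, 6):
    nir[5][x] = 0.2
# Synthetic visible band: same page plus a stain that the NIR band does not show.
vis = [row[:] for row in nir]
vis[3][3] = 0.5
# Assumed output of the detection step: a block covering the stain and its neighbours.
mask = [[2 <= y <= 4 and 2 <= x <= 4 for x in range(SIZE)] for y in range(SIZE)]

vgx, vgy = gradients(vis)
ngx, ngy = gradients(nir)
enhanced = reconstruct(composite(vgx, ngx, mask), composite(vgy, ngy, mask), vis)
print(round(enhanced[3][3], 6), round(enhanced[5][3], 6))
```

In this toy case the stain is removed while the ink stroke and the page background are kept, because only the gradients inside the masked region are replaced. Real NIR bands generally differ in intensity from the visible channels, so a practical pipeline would need some form of gradient scaling that this sketch leaves out.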

This work is part of an ongoing collaborative effort with the Nationaal Archief of the Netherlands (NAN), one of Europe's leading research archives, and Art Innovation, a manufacturer of hyperspectral imaging hardware designed for historical documents. The documents processed in this paper, which are indicative of the types of artifacts common to historical documents, were imaged at the NAN using the SEPIA Quantitative Hyper-Spectral Imager (QHSI) device developed by Art Innovation [12]. The device performs hyperspectral imaging by capturing one very narrow spectral band of EM radiation at a time: a bandpass filter placed in front of the light source blocks out all but a selected band of the EM spectrum. A monochromatic camera then captures the amount of light reflected by the document at that selected band. The filter is changed for each image, thus capturing different parts of the EM spectrum to build up the HSI (Fig. 2).

The QHSI device captures images at wavelength bands from 365 nm (UV) to 1100 nm (NIR) with a step size of 10 nm in most cases, except for the bands in the 300 nm and 1000 nm ranges. The images have a resolution of 4 megapixels (2048×2048) for a physical surface area of 125 mm × 125 mm and are captured at 16 bits per pixel. Such high resolution (approximately 256 pixels per mm²) provides a reliable spatial measurement suitable for even thin lines of handwriting and printed text.
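The sampling figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text (2048×2048 pixels, 125 mm × 125 mm, 16 bits per pixel); the function names are hypothetical, and the per-band storage figure assumes uncompressed data with no header overhead.

```python
PIXELS_PER_SIDE = 2048   # from the text
SIDE_MM = 125.0          # from the text
BITS_PER_PIXEL = 16      # from the text
MM_PER_INCH = 25.4


def linear_density(pixels, length_mm):
    """Pixels per millimetre along one side of the imaged area."""
    return pixels / length_mm


def areal_density(pixels, length_mm):
    """Pixels per square millimetre for a square image of a square area."""
    return linear_density(pixels, length_mm) ** 2


def band_size_bytes(pixels, bits_per_pixel):
    """Uncompressed size of one spectral band (assumption: no header, no padding)."""
    return pixels * pixels * bits_per_pixel // 8


px_per_mm = linear_density(PIXELS_PER_SIDE, SIDE_MM)
px_per_mm2 = areal_density(PIXELS_PER_SIDE, SIDE_MM)
dpi = px_per_mm * MM_PER_INCH
band_bytes = band_size_bytes(PIXELS_PER_SIDE, BITS_PER_PIXEL)

print(f"{px_per_mm:.3f} px/mm, {px_per_mm2:.1f} px/mm^2, {dpi:.0f} dpi")
print(f"{band_bytes / 2**20:.0f} MiB per band")
```

The linear density works out to about 16.4 pixels per mm. Rounding that to 16 pixels per mm gives the quoted figure of roughly 256 pixels per mm², while the unrounded value is closer to 268 pixels per mm². Each band occupies 8 MiB when stored uncompressed.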

The remainder of the paper is organized as follows: we begin by reviewing related work in Section 2. In Section 3, we introduce our algorithm for visually enhancing old documents using the hyperspectral data. We show experimental results in Section 4 and conclude with a discussion about our algorithm and future work.

Section snippets

Related work

As hyperspectral imaging is a relatively new procedure in libraries and archives, little existing work is available in the context of document processing. Here, related works are discussed in the areas of document processing and image fusion, with more emphasis placed on image fusion given the larger body of relevant work.

HSI document enhancement algorithm

As mentioned earlier, two types of enhancement are targeted in this paper. With text documents, our algorithm aims to remove the undesired artifacts, notably ink-bleed, ink-corrosion, and foxing (age-related spots). The final results are enhanced documents that still maintain the look of the original with the undesired artifacts significantly reduced. For this task, the images in the invisible range provide the source for the background of the enhanced image since invisible range

Experiments

The first set of experiments targets the removal of artifacts on the documents. The document in the first example (Fig. 6) is visually corrupted with foxing. The result of our enhancement algorithm is shown in Fig. 6(b). The foxing artifact is greatly reduced in the enhanced image while the texture and the look of the original image are preserved. The ability to preserve the look and feel is one significant advantage offered by having the additional gradients in the NIR information. In related

Discussion

We have described how to take advantage of hyperspectral imaging, most notably using images in the near-infrared, to assist in visually enhancing old documents. Specifically, we demonstrated how to improve the visual quality of text-based documents corrupted with artifacts such as ink-bleed, ink-corrosion, and foxing, by using the invisible bands to help remove these undesired artifacts. For documents with line drawings that suffer from low contrast, we use the invisible bands to provide more

Acknowledgments

We gratefully acknowledge the support and efforts from our collaborators Roberto Padoan from the Nationaal Archief of the Netherlands (NAN) and Marvin Klein from Art Innovation. This work was supported in part by the NUS Young Investigator Award, R-252-000-379-101.

References (34)

Q. Du et al., A linear constrained distance-based discriminant analysis for hyperspectral image classification, Pattern Recognition (2001)
Y. Huang et al., A framework for reducing ink-bleed in old documents
L. Zhang et al., A unified framework for document restoration using inpainting and shape-from-shading, Pattern Recognition (2009)
C.H. Chen et al., Statistical pattern recognition in remote sensing, Pattern Recognition (2008)
H. Li, C.-W. Fu, A. J. Hanson, Visualizing multiwavelength astrophysical data, in: IEEE Transactions on Visualization...
C. Collet et al., Markov model for multispectral image analysis: application to Small Magellanic Cloud segmentation
M.A. Loghmari et al., A spectral and spatial source separation of multispectral images, IEEE Transactions on Geoscience and Remote Sensing (2006)
Z. Pan et al., Face recognition in hyperspectral images, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003)
R. Padoan et al., Quantitative hyperspectral imaging of historical documents: technique and applications
P. Cotte et al., Spectral imaging of Leonardo da Vinci Mona Lisa: an authentic smile at 1523 dpi with additional infrared dat
P. Pérez, M. Gangnet, A. Blake, Poisson image editing, in: ACM Transactions on Graphics (Proceedings of SIGGRAPH),...
R. Fattal, D. Lischinski, M. Werman, Gradient domain high dynamic range compression, in: ACM Transactions on Graphics...
A. Levin et al., Seamless image stitching in the gradient domain
Y. Weiss, Deriving intrinsic images from image sequences
M. Klein et al., Quantitative hyperspectral reflectance imaging, Sensors (2008)
Z. Shi et al., Historical document image enhancement using background light intensity normalization
C.L. Tan et al., Restoration of archival documents using a wavelet technique, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)

Seon Joo Kim received his B.S. and M.S. degrees from Yonsei University, Seoul, Korea, in 1997 and 2001. He received his Ph.D. degree in computer science from the University of North Carolina at Chapel Hill in 2008. He is currently a research fellow at the National University of Singapore. He worked as an intern at Cortex and GE Global Research Center during the summers of 2004 and 2005, respectively. His research interests include computer vision, image/video analysis, and computational photography, areas in which he has published in major computer vision conferences and journals. He received the Ministry of Information and Communication Scholarship (Republic of Korea) in 2002 and the Graduate School Dean Scholarship (Yonsei University) in 1999.

Fanbo Deng received his B.S. in Computer Science from the Harbin Institute of Technology, China, in 2008. He is currently a Ph.D. student at the National University of Singapore. His research focus is on Computer Vision, Image Processing, and Visualization.

Michael S. Brown obtained his B.S. and Ph.D. in Computer Science from the University of Kentucky in 1995 and 2001, respectively. He was a visiting Ph.D. student at the University of North Carolina at Chapel Hill from 1998 to 2000. He is currently an Associate Professor in the School of Computing at the National University of Singapore. Dr. Brown regularly serves on the program committees for the major Computer Vision conferences (ICCV, CVPR, and ECCV) and as an Area Chair for CVPR. He served as the general co-chair for the 5th Projector-Camera-Systems (PROCAMS’08) workshop co-located with SIGGRAPH’08 and was an organizer for the 1st eHeritage’09 workshop co-located with ICCV’09. His research interests include Computer Vision, Image Processing and Computer Graphics.
