Wavelet-based separation of nonlinear show-through and bleed-through image mixtures
Introduction
This paper focuses on the separation of two-image mixtures that occur in a well-known practical situation: when we scan or photograph a document and the back page shows through. This effect is often due to partial transparency of the paper, which we designate as show-through. Another possible cause is bleeding of ink through the paper, a phenomenon that is more common in old documents, in which the ink has had more time to bleed. The latter phenomenon is commonly designated as bleed-through. The two phenomena may be simultaneously present in the same document.
In this work we use, as test examples, three different kinds of mixtures. The first kind essentially contains only the show-through effect: five pairs of images were printed on the two sides of five sheets of tracing paper which, due to its high transparency, creates very strong mixtures. The second kind corresponds to an old manuscript letter written on very thin “air-mail” paper (also called onion skin paper, which is rather transparent, causing show-through to occur). This document also has some areas in which bleed-through appears to have occurred. The third kind of mixture corresponds to images of old manual transcripts of music (partitures), which mostly contain the bleed-through effect. For each document, scanning or photographing both sides allowed us to obtain two different mixtures of the contents of the two pages. In this paper we address the source separation problem whose aim is to recover, from the two acquired images of each document, the original page images.
Show-through is known to lead to nonlinear mixtures [2], [5], [3]. A physical model of the show-through mixture of gray-level images printed with halftoning has been presented in [3]. Bleed-through probably is a much more complex phenomenon, which is much harder to model.
Source separation is often performed by assuming that the sources are statistically independent from each other, an assumption which leads to the use of independent component analysis (ICA) techniques. While linear ICA is a well studied problem for which several efficient solutions exist [6], [7], [27], nonlinear ICA is a much less studied problem [1], [13], [10], [8]. Nonlinear ICA has the additional difficulty of being ill-posed, having an infinite number of solutions without any simple relationship with one another [14], [11]. The mixtures addressed in this work are nonlinear and noisy, and the letter and partiture mixtures are spatially variant. Besides these challenging properties, most of the sources studied in this work do not completely obey the independence assumption, a fact which affects the quality of the results obtained through ICA-based methods [2], [5].
Instead of assuming independence of the source images, we propose a solution that uses other properties of images and of the mixture process. We use the well known fact that high-frequency components of images are sparse (and that high-frequency wavelet coefficients are also sparse), and we formulate a competition based on the observation that each source is more strongly represented in one of the mixture components than in the other one. Making assumptions that are suited to the present problem, our method achieves a good perceptual separation quality even when the sources are non-independent and the mixture is spatially variant. The separation method that we propose is similar to the denoising step used by nonlinear denoising source separation (DSS) [5]. However, we use an improved form of competition, and also a wavelet transform that is more suited to the problem at hand. These improvements lead to a method that performs the separation in a single step, without the iterative procedure required by nonlinear DSS.
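The competition idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-level Haar transform in place of the wavelet transform used in the paper, and the soft mask in the hypothetical function `compete` (with its `power` and `eps` parameters) is only one plausible form of competition. All function names are illustrative.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar2d(img):
    """One-level 2-D Haar transform; img must have even height and width."""
    a = (img[0::2, :] + img[1::2, :]) / SQRT2  # low-pass along the vertical axis
    d = (img[0::2, :] - img[1::2, :]) / SQRT2  # high-pass along the vertical axis
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    """Inverse of haar2d."""
    lh, hl, hh = details
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = (ll + lh) / SQRT2
    a[:, 1::2] = (ll - lh) / SQRT2
    d[:, 0::2] = (hl + hh) / SQRT2
    d[:, 1::2] = (hl - hh) / SQRT2
    img = np.empty((2 * h, 2 * w))
    img[0::2, :] = (a + d) / SQRT2
    img[1::2, :] = (a - d) / SQRT2
    return img

def compete(c1, c2, power=4.0, eps=1e-12):
    """Hypothetical soft competition between corresponding coefficients.

    At each position, the mixture whose coefficient has the larger
    magnitude keeps most of it; the other one is attenuated.
    """
    p1 = np.abs(c1) ** power
    p2 = np.abs(c2) ** power
    m1 = p1 / (p1 + p2 + eps)
    return m1 * c1, (1.0 - m1) * c2

def separate(x1, x2, power=4.0):
    """Apply the competition to the detail bands of two aligned mixtures.

    The low-frequency band is left untouched (a simplifying assumption).
    """
    ll1, det1 = haar2d(x1)
    ll2, det2 = haar2d(x2)
    out1, out2 = [], []
    for c1, c2 in zip(det1, det2):
        s1, s2 = compete(c1, c2, power)
        out1.append(s1)
        out2.append(s2)
    return ihaar2d(ll1, tuple(out1)), ihaar2d(ll2, tuple(out2))
```

Calling `separate(x1, x2)` on two aligned mixture images of equal, even size returns two images of the same size. The sketch performs the separation in a single pass, with no iteration, which is the property the paragraph above emphasizes.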
Both the old manuscript letter and the old partitures with the bleed-through effect are addressed for the first time in this paper. On the other hand, the tracing paper mixtures have already been studied in other works [2], [5], [3]. In contrast with the method proposed here, which, due to its use of wavelets, performs a non-point-wise transformation, all the other mentioned methods performed point-wise separation. One of them [2] used the MISEP method of nonlinear ICA [1] to train a regularized MLP which performed the separation. In another one [3], MISEP was used to train a nonlinear physical model of the mixture process. Nonlinear DSS has also been applied to some of the tracing paper mixtures [5]. Nonlinear DSS does not assume independence of the sources, but assumes spatial invariance of the mixture. It uses the same basic ideas that are used in this paper, albeit in a less efficient manner. Show-through and/or bleed-through mixtures have also been addressed in [19], [22], [23], [24], [9], but in different settings from the one considered here. In [19], [24] separation is achieved through linear models, which were shown to be too restrictive to separate tracing paper mixtures [2]. Refs. [22], [9] focus only on the restoration of text documents, for which linear separation yields relatively good results (see [2]). In [23], the contents of both pages are assumed to consist of text, and separation is linear and is based on a single color image from one side of the document.
This manuscript is structured as follows: Section 2 describes the three kinds of mixtures that were studied, as well as the processes of image acquisition and alignment. Section 3 describes the proposed separation method. Section 4 presents experimental results, and Section 5 concludes.
The Matlab separation routines and the images used in this work are available at http://www.lx.it.pt/∼mscla/. The routines for performing image alignment are available at http://www.lx.it.pt/∼lbalmeida/ica/seethrough/.
Mixtures and acquisition
The method proposed in this work was applied to three kinds of image mixtures. Although they have completely different origins, all three mixture processes are from real life (i.e., they are not synthetic), and are noisy and significantly nonlinear. Some of them are also spatially variant.
- Tracing paper images: Five different pairs of images (including synthetic bars, photos and text) were used as sources of five pairs of mixtures. These pairs of sources, shown in Fig. 1, Fig. 2, were
Separation method
Instead of assuming independence of the source images, the method that we propose uses a property of common images and a property of the mixture process to perform the separation. These properties are:
- (1) High-frequency components of common images are sparse. This translates into the fact that high-frequency wavelet coefficients have sparse distributions [16]. Consequently, the high-frequency wavelet coefficients from two different source images will seldom both have significant values in the same
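The sparsity property in item (1) can be checked numerically on toy data. The sketch below is only an illustration under stated assumptions: it uses horizontal one-level Haar detail coefficients and hypothetical synthetic images made of a few random rectangles, not the images or the wavelet transform used in the paper. The threshold `thr` is also an arbitrary choice.

```python
import numpy as np

def haar_detail_rows(img):
    """Horizontal one-level Haar detail coefficients (pairwise column differences)."""
    return (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2.0)

def piecewise_constant(shape, n_blocks, rng):
    """Hypothetical test image: a few random rectangles on a flat background."""
    img = np.zeros(shape)
    h, w = shape
    for _ in range(n_blocks):
        r0 = rng.integers(0, h - 8)
        c0 = rng.integers(0, w - 8)
        rh, cw = rng.integers(4, 8, size=2)
        img[r0:r0 + rh, c0:c0 + cw] = rng.uniform(0.2, 1.0)
    return img

rng = np.random.default_rng(0)
s1 = piecewise_constant((64, 64), 10, rng)
s2 = piecewise_constant((64, 64), 10, rng)

thr = 1e-3  # arbitrary significance threshold
sig1 = np.abs(haar_detail_rows(s1)) > thr
sig2 = np.abs(haar_detail_rows(s2)) > thr

frac1 = sig1.mean()          # fraction of significant coefficients in source 1
frac2 = sig2.mean()          # fraction of significant coefficients in source 2
both = (sig1 & sig2).mean()  # fraction significant in both sources at once

print(f"significant in s1: {frac1:.4f}, in s2: {frac2:.4f}, in both: {both:.4f}")
```

For images of this kind, only coefficients that straddle an edge are significant, so each fraction is small and the fraction of positions where both sources are significant at once is smaller still.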
Experimental results
In this section we present the experimental results obtained with the proposed method, and a brief comparison with results from other methods. Due to space limitations, the images are shown much smaller than real size. In the electronic version of this paper it is possible to zoom in on the images to better examine their details.
The separation method presented in Section 3 was applied to the three mixture sets that were described in Section 2. For all experiments, the value of the competition
Conclusions
A non-iterative method for separating real-life nonlinear mixtures of images was presented. The method is fast and yields images with a perceptual separation quality that is competitive with that obtained with previous methods.
The proposed method does not assume independence of the sources, but uses other properties of the problem. Therefore the quality of the results is not affected by the possible non-independence of the source images. Since the method processes wavelet coefficients down
Acknowledgments
We wish to acknowledge our colleague Rogério C. Pinto for kindly making available the old letter and the partiture images. We also acknowledge the use of the free package available at http://taco.poly.edu/WaveletSoftware/ for computing complex wavelet transforms. This work was partially supported by the Portuguese FCT and by the “Programa Operacional Sociedade do Conhecimento (POS-Conhecimento)”, co-funded by the FEDER European Community fund, under the Project POSC/EEA-CPS/61271/2004 and
Mariana S.C. Almeida was born in Lisbon, Portugal, on May 30, 1982. She received the engineering degree in electrical and computer engineering from the Instituto Superior Técnico (I.S.T., the Engineering School of the Technical University of Lisbon), Lisbon, Portugal, in 2005. Since 2006 she has been working on her PhD at the Institute of Telecommunications, Lisbon, under the supervision of Prof. Luís B. Almeida.
References (27)
- et al., Nonlinear independent component analysis: existence and uniqueness results, Neural Networks (1999)
- et al., Sensitivity to contrast histogram differences in synthetic wavelet-textures, Vision Res. (2001)
- MISEP - linear and nonlinear ICA based on mutual information, J. Mach. Learn. Res. (2003)
- Separating a real-life nonlinear image mixture, J. Mach. Learn. Res. (2005)
- M.S.C. Almeida, L.B. Almeida, Separating nonlinear image mixtures using a physical model trained with ICA, in: IEEE...
- M.S.C. Almeida, L.B. Almeida, Wavelet based nonlinear separation of images, in: IEEE International Workshop on Machine...
- M.S.C. Almeida, H. Valpola, J. Särelä, Separation of nonlinear image mixtures by denoising source separation, in: J....
- et al., An information-maximization approach to blind separation and blind deconvolution, Neural Comput. (1995)
- et al., A blind source separation technique based on second order statistics, IEEE Trans. Signal Process. (1997)
- T. Blaschke, L. Wiskott, Independent slow feature analysis and nonlinear blind source separation, in: Proceedings of...
- Bayesian nonlinear independent component analysis by multi-layer perceptrons
Luís B. Almeida was born in Lisbon, Portugal, on April 15, 1950. He graduated in Electrical Engineering from the Instituto Superior Técnico, Lisbon, in 1972, and obtained a “Doutor” degree from the Technical University of Lisbon, in 1983, with a thesis on nonstationary modelling of speech. Since 1972 he has been with the Instituto Superior Técnico, where he has been, since 1995, a full professor in the areas of signal processing and machine learning. From 1984 to 2004 he was head of the Neural Networks and Signal Processing Group of INESC-ID. In the years 2000–2003 he was president of INESC-ID. In 2005 he joined the Instituto de Telecomunicações (Telecommunications Institute). He is the author of many papers on speech modelling and coding, time-frequency representations of signals and the fractional Fourier transform, learning algorithms for neural networks, and independent component analysis/source separation. Currently his research focuses mainly on nonlinear source separation and, more generally, on unsupervised or semi-supervised learning of structure from data. He was the recipient of an IEEE Signal Processing Area ASSP Senior Award and of several national awards.