Elsevier

Neurocomputing

Volume 72, Issues 1–3, December 2008, Pages 57-70

Wavelet-based separation of nonlinear show-through and bleed-through image mixtures

https://doi.org/10.1016/j.neucom.2007.12.048

Abstract

This work addresses the separation of the nonlinear real-life mixture of images that occurs when a page of a document is scanned or photographed and the back page shows through. This effect can be due to partial paper transparency (show-through) and/or to bleeding of the ink through the paper (bleed-through). These two causes usually lead to mixtures with different characteristics.

We propose a separation method based on the fact that the high-frequency components of the images are sparse and are stronger on one side of the paper than on the other. The same properties were already used in nonlinear denoising source separation (DSS). However, we developed significant improvements that allow us to achieve competitive separation quality in a single, non-iterative pass. The method does not require the sources to be independent or the mixture to be spatially invariant, and is suitable for separating mixtures such as those produced by bleed-through, for which we do not have an adequate physical model.

Introduction

This paper focuses on the separation of two-image mixtures that occur in a well-known practical situation: when we scan or photograph a document and the back page shows through. This effect is often due to partial transparency of the paper, which we call show-through. Another possible cause is bleeding of ink through the paper, a phenomenon that is more common in old documents, in which the ink has had more time to bleed; it is commonly called bleed-through. The two phenomena may be present simultaneously in the same document.

In this work we use three different kinds of mixtures as test examples. The first kind essentially contains only the show-through effect: five pairs of images were printed on the two sides of five sheets of tracing paper, which, due to its high transparency, produces very strong mixtures. The second kind corresponds to an old manuscript letter written on very thin “air-mail” paper (also called onion-skin paper, which is rather transparent, so that show-through occurs). This document also has some areas in which bleed-through appears to have occurred. The third kind corresponds to images of old handwritten music scores (partitures), which mostly contain the bleed-through effect. For each document, scanning or photographing both sides yielded two different mixtures of the contents of the two pages. The source separation problem addressed in this paper is to recover the original page images from the two acquired images of each document.

Show-through is known to lead to nonlinear mixtures [2], [5], [3]. A physical model of the show-through mixture of gray-level images printed with halftoning was presented in [3]. Bleed-through is probably a much more complex phenomenon, and much harder to model.
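
To see why such mixtures are nonlinear, consider a deliberately simple toy model. It is a hypothetical illustration, not the halftoning-based physical model of [3]: intensities lie in [0, 1] (1 for white paper, 0 for full ink), and ink on the back scales the light seen from the front by a factor that depends on the back intensity, controlled by an assumed transparency parameter `a`. Because the two source values are multiplied, how visible the back-side ink is depends on what is printed on the front, which a linear mixture cannot reproduce.

```python
def toy_show_through(front: float, back: float, a: float = 0.3) -> float:
    """Observed front-side intensity under a HYPOTHETICAL toy model.

    Intensities are in [0, 1] (1 = white paper, 0 = full ink). Ink on the
    back (back < 1) scales the front intensity by (1 - a * (1 - back)),
    where `a` is an assumed transparency parameter. The product of the two
    source values is what makes the mixture nonlinear.
    """
    return front * (1.0 - a * (1.0 - back))


def show_through_shift(front: float, a: float = 0.3) -> float:
    """Darkening caused by fully black back-side ink at a given front value."""
    return toy_show_through(front, 1.0, a) - toy_show_through(front, 0.0, a)


if __name__ == "__main__":
    # On white paper the back-side ink darkens the pixel by a = 0.3 ...
    print(round(show_through_shift(1.0), 3))  # 0.3
    # ... but where the front is already black it has no visible effect.
    print(round(show_through_shift(0.0), 3))  # 0.0
```

In a linear mixture the darkening caused by back-side ink would be the same at every pixel; in this toy model it shrinks as the front gets darker.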

Source separation is often performed by assuming that the sources are statistically independent from each other, an assumption which leads to the use of independent component analysis (ICA) techniques. While linear ICA is a well studied problem for which several efficient solutions exist [6], [7], [27], nonlinear ICA is a much less studied problem [1], [13], [10], [8]. Nonlinear ICA has the additional difficulty of being ill-posed, having an infinite number of solutions without any simple relationship with one another [14], [11]. The mixtures addressed in this work are nonlinear and noisy, and the letter and partiture mixtures are spatially variant. Besides these challenging properties, most of the sources studied in this work do not completely obey the independence assumption, a fact which affects the quality of the results obtained through ICA-based methods [2], [5].

Instead of assuming independence of the source images, we propose a solution that uses other properties of images and of the mixture process. We use the well-known fact that the high-frequency components of images are sparse (and, correspondingly, that high-frequency wavelet coefficients are sparse), and we formulate a competition based on the observation that each source is more strongly represented in one of the mixture components than in the other. Because these assumptions suit the present problem, our method achieves good perceptual separation quality even when the sources are non-independent and the mixture is spatially variant. The proposed separation method is similar to the denoising step of nonlinear denoising source separation (DSS) [5]. However, we use an improved form of competition and a wavelet transform that is better suited to the problem at hand. These improvements lead to a method that performs the separation in a single step, without the iterative procedure required by nonlinear DSS.
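
The two properties can be illustrated with a minimal sketch that is not the authors' method. It swaps in a one-level Haar transform for the complex wavelet transform the authors acknowledge using. It processes a single image row for brevity. It also uses a plain hard competition, in which each detail coefficient is kept only in the source estimate whose mixture shows it more strongly, instead of the improved competition the paper proposes. All function names are hypothetical, and the second mixture row is assumed to be already mirrored and aligned to the first.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform of an even-length row.

    Returns (approximation, detail): low- and high-frequency coefficients.
    """
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Exact inverse of haar_forward."""
    x: List[float] = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def compete(d_own: List[float], d_other: List[float]) -> List[float]:
    """Simplified hard competition: keep a detail coefficient only where it is
    stronger in this mixture than at the same position in the other mixture."""
    return [d if abs(d) > abs(o) else 0.0 for d, o in zip(d_own, d_other)]


def separate_row(x1: List[float], x2: List[float]) -> Tuple[List[float], List[float]]:
    """Estimate both sources from one row of each mixture (hypothetical helper).

    x2 is assumed to be already mirrored and aligned to x1. The approximation
    (low-frequency) coefficients are passed through unchanged in this sketch.
    """
    a1, d1 = haar_forward(x1)
    a2, d2 = haar_forward(x2)
    y1 = haar_inverse(a1, compete(d1, d2))
    y2 = haar_inverse(a2, compete(d2, d1))
    return y1, y2


if __name__ == "__main__":
    # Front mixture: strong front ink at index 2, faint show-through (0.7) at 5.
    x1 = [1.0, 1.0, 0.0, 1.0, 1.0, 0.7, 1.0, 1.0]
    # Back mixture (mirrored, aligned): faint front ink at 2, strong back ink at 5.
    x2 = [1.0, 1.0, 0.7, 1.0, 1.0, 0.0, 1.0, 1.0]
    y1, y2 = separate_row(x1, x2)
    print([round(v, 3) for v in y1])  # [1.0, 1.0, 0.0, 1.0, 0.85, 0.85, 1.0, 1.0]
    print([round(v, 3) for v in y2])  # [1.0, 1.0, 0.85, 0.85, 1.0, 0.0, 1.0, 1.0]
```

The interfering strokes disappear from the high-frequency band of each estimate, but a 0.85 plateau remains where the other page's ink was. That low-frequency residue lies outside what this one-level sketch touches.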

Both the old manuscript letter and the old partitures with the bleed-through effect are addressed for the first time in this paper. The tracing paper mixtures, on the other hand, have already been studied in other works [2], [5], [3]. In contrast with the method proposed here, which performs a non-point-wise transformation because it operates on wavelet coefficients, all of those methods performed point-wise separation. One of them [2] used the MISEP method of nonlinear ICA [1] to train a regularized MLP that performed the separation. In another [3], MISEP was used to train a nonlinear physical model of the mixture process. Nonlinear DSS has also been applied to some of the tracing paper mixtures [5]. Nonlinear DSS does not assume independence of the sources, but does assume spatial invariance of the mixture. It uses the same basic ideas as this paper, albeit less efficiently. Show-through and/or bleed-through mixtures have also been addressed in [19], [22], [23], [24], [9], but in settings different from the one considered here. In [19], [24] separation is achieved through linear models, which were shown to be too restrictive to separate tracing paper mixtures [2]. Refs. [22], [9] focus only on the restoration of text documents, for which linear separation yields relatively good results (see [2]). In [23], the contents of both pages are assumed to be text, and separation is linear and based on a single color image of one side of the document.

This manuscript is structured as follows: Section 2 describes the three kinds of mixtures that were studied, as well as the processes of image acquisition and alignment. Section 3 describes the proposed separation method. Section 4 presents experimental results, and Section 5 concludes.

The Matlab separation routines and the images used in this work are available at http://www.lx.it.pt/~mscla/. The routines for performing image alignment are available at http://www.lx.it.pt/~lbalmeida/ica/seethrough/.
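
Before separation, the image of the back page has to be brought into the coordinate frame of the front page. The alignment routines linked above are not described in this section, so the following is only a hypothetical sketch of the simplest case. It assumes the sheet was turned over about its vertical axis, so the back-side image is first mirrored left to right. It then runs an exhaustive search over integer translations that minimizes the mean squared difference on the overlapping region. Real acquisitions may also need rotation, scaling or local corrections, which this sketch ignores; all names are illustrative.

```python
from typing import List, Tuple

Image = List[List[float]]  # an image as a list of rows of equal length


def mirror_lr(img: Image) -> Image:
    """Flip an image left to right (undoes turning the sheet over)."""
    return [row[::-1] for row in img]


def mean_sq_diff(a: Image, b: Image, dy: int, dx: int) -> float:
    """Mean squared difference between a[y][x] and b[y + dy][x + dx] over the overlap."""
    h, w = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(h):
        yb = y + dy
        if not 0 <= yb < h:
            continue
        for x in range(w):
            xb = x + dx
            if 0 <= xb < w:
                total += (a[y][x] - b[yb][xb]) ** 2
                count += 1
    return total / count if count else float("inf")


def best_translation(a: Image, b: Image, max_shift: int = 2) -> Tuple[int, int]:
    """Exhaustively search integer shifts (dy, dx) in [-max_shift, max_shift]."""
    shifts = [(dy, dx)
              for dy in range(-max_shift, max_shift + 1)
              for dx in range(-max_shift, max_shift + 1)]
    return min(shifts, key=lambda s: mean_sq_diff(a, b, *s))


if __name__ == "__main__":
    H, W = 6, 6
    # Synthetic test pattern standing in for real page content.
    front = [[float(10 * y + x) for x in range(W)] for y in range(H)]
    # Simulated back-side scan: pattern shifted down one row, seen mirrored.
    shifted = [[0.0] * W] + front[:-1]
    back_scan = mirror_lr(shifted)
    print(best_translation(front, mirror_lr(back_scan)))  # (1, 0)
```

Exhaustive search is quadratic in `max_shift` and is only practical for small offsets; it is used here because it is easy to verify.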


Mixtures and acquisition

The method proposed in this work was applied to three kinds of image mixtures. Although they have completely different origins, all three mixture processes come from real life (i.e., they are not synthetic), and all are noisy and significantly nonlinear. Some of them are also spatially variant.

  • Tracing paper images: Five different pairs of images (including synthetic bars, photos and text) were used as sources of five pairs of mixtures. These pairs of sources, shown in Figs. 1 and 2, were

Separation method

Instead of assuming independence of the source images, the method that we propose uses a property of common images and a property of the mixture process to perform the separation. These properties are:

  • (1) High-frequency components of common images are sparse. This translates into the fact that high-frequency wavelet coefficients have sparse distributions [16]. Consequently, the high-frequency wavelet coefficients of two different source images will seldom both have significant values in the same
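
Property (1) can be checked on one's own data by counting how many high-frequency coefficients are significant in each source, and how often both sources are significant at the same position. The sketch below is a hypothetical illustration. It uses one-level Haar details of single image rows and an arbitrary threshold (`thresh`), rather than the wavelet transform and full images used in the paper.

```python
import math
from typing import List, Set, Tuple


def haar_details(row: List[float]) -> List[float]:
    """One-level Haar detail (high-frequency) coefficients of an even-length row."""
    return [(row[i] - row[i + 1]) / math.sqrt(2.0) for i in range(0, len(row), 2)]


def significant(coeffs: List[float], thresh: float) -> Set[int]:
    """Positions whose coefficient magnitude exceeds the assumed threshold."""
    return {i for i, c in enumerate(coeffs) if abs(c) > thresh}


def sparsity_report(s1: List[float], s2: List[float],
                    thresh: float = 0.1) -> Tuple[float, float, float]:
    """Fraction of significant coefficients in s1, in s2, and in both at once."""
    sig1 = significant(haar_details(s1), thresh)
    sig2 = significant(haar_details(s2), thresh)
    n = len(s1) // 2
    return len(sig1) / n, len(sig2) / n, len(sig1 & sig2) / n


if __name__ == "__main__":
    # Two toy "text" rows: white paper (1.0) with a few dark ink pixels (0.0).
    s1 = [1.0] * 16
    s2 = [1.0] * 16
    s1[3] = s1[10] = 0.0
    s2[6] = s2[13] = 0.0
    print(sparsity_report(s1, s2))  # (0.25, 0.25, 0.0)
```

A third value close to zero is the situation property (1) describes: the two sources rarely compete for the same high-frequency coefficient.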

Experimental results

In this section we present the experimental results obtained with the proposed method, and a brief comparison with results from other methods. Due to space limitations, the images are shown much smaller than their actual size. In the electronic version of this paper it is possible to zoom in on the images to examine their details more closely.

The separation method presented in Section 3 was applied to the three mixture sets that were described in Section 2. For all experiments, the value of the competition

Conclusions

A non-iterative method for separating real-life nonlinear mixtures of images was presented. The method is fast and yields images whose perceptual separation quality is competitive with that obtained by previous methods.

The proposed method does not assume independence of the sources, but uses other properties of the problem. Therefore the quality of the results is not affected by the possible non-independence of the source images. Since the method processes wavelet coefficients down

Acknowledgments

We wish to acknowledge our colleague Rogério C. Pinto for kindly making available the old letter and the partiture images. We also acknowledge the use of the free package available at http://taco.poly.edu/WaveletSoftware/ for computing complex wavelet transforms. This work was partially supported by the Portuguese FCT and by the “Programa Operacional Sociedade do Conhecimento (POS-Conhecimento)”, co-financed by the FEDER European Community fund, under the Project POSC/EEA-CPS/61271/2004 and

Mariana S.C. Almeida was born in Lisbon, Portugal, on May 30, 1982. She received the engineering degree in electrical and computer engineering from the Instituto Superior Técnico (I.S.T., the Engineering School of the Technical University of Lisbon), Lisbon, Portugal, in 2005. Since 2006 she has been working on her PhD at the Institute of Telecommunications, Lisbon, under the supervision of Prof. Luís B. Almeida.

References (27)

  • A. Hyvärinen et al., Nonlinear independent component analysis: existence and uniqueness results, Neural Networks (1999).
  • F.A.A. Kingdom et al., Sensitivity to contrast histogram differences in synthetic wavelet-textures, Vision Res. (2001).
  • L.B. Almeida, MISEP—linear and nonlinear ICA based on mutual information, J. Mach. Learn. Res. (2003).
  • L.B. Almeida, Separating a real-life nonlinear image mixture, J. Mach. Learn. Res. (2005).
  • M.S.C. Almeida, L.B. Almeida, Separating nonlinear image mixtures using a physical model trained with ICA, in: IEEE...
  • M.S.C. Almeida, L.B. Almeida, Wavelet based nonlinear separation of images, in: IEEE International Workshop on Machine...
  • M.S.C. Almeida, H. Valpola, J. Särelä, Separation of nonlinear image mixtures by denoising source separation, in: J....
  • A. Bell et al., An information-maximization approach to blind separation and blind deconvolution, Neural Comput. (1995).
  • A. Belouchrani et al., A blind source separation technique based on second order statistics, IEEE Trans. Signal Process. (1997).
  • T. Blaschke, L. Wiskott, Independent slow feature analysis and nonlinear blind source separation, in: Proceedings of...
  • E. Dubois, A. Pathak, Reduction of bleed-through in scanned manuscript documents, in: Proceedings of the IS&T Image...
  • S. Harmeling, A. Ziehe, M. Kawanabe, B. Blankertz, K.-R. Müller, Nonlinear blind source separation using kernel feature...
  • H. Lappalainen et al., Bayesian nonlinear independent component analysis by multi-layer perceptrons.



Luís B. Almeida was born in Lisbon, Portugal, on April 15, 1950. He graduated in Electrical Engineering from the Instituto Superior Técnico, Lisbon, in 1972, and obtained a “Doutor” degree from the Technical University of Lisbon in 1983, with a thesis on nonstationary modelling of speech. He has been with the Instituto Superior Técnico since 1972, and since 1995 has been a full professor there in the areas of signal processing and machine learning. From 1984 to 2004 he was head of the Neural Networks and Signal Processing Group of INESC-ID, and from 2000 to 2003 he was president of INESC-ID. In 2005 he joined the Instituto de Telecomunicações (Telecommunications Institute). He is the author of many papers on speech modelling and coding, time-frequency representations of signals and the fractional Fourier transform, learning algorithms for neural networks, and independent component analysis/source separation. His current research focuses mainly on nonlinear source separation and, more generally, on unsupervised or semi-supervised learning of structure from data. He received an IEEE Signal Processing Area ASSP Senior Award and several national awards.
