1 Introduction

During the image acquisition process, some level of noise is usually added to the real data, mainly due to physical limitations of the acquisition sensor, but also to imprecisions during data transmission and manipulation. Therefore, the resulting image needs to be processed in order to attenuate its noise without losing detail in high-frequency areas; the field of image processing that addresses this issue is called “image restoration”. In this context, several well-known image restoration methods, such as the inverse and Wiener filters, regularization, projection-based [1, 2], and Maximum a Posteriori probability techniques, have been developed over the last decades [3]. Although machine learning is a well-consolidated research field dating back to the 1960s, only in recent years has it been employed to address the problem of image restoration [4, 5].

Recently, deep learning techniques have been considered a game-changer due to their outstanding results in a number of computer vision-related problems, such as face and object recognition, to name a few. In the last years, some works have addressed the problem of image restoration using such approaches. Keyvanrad et al. [6] employed Deep Belief Networks (DBNs) to smooth noise in images. Tang et al. [7] proposed the Robust Boltzmann Machine (RoBM), which allows Boltzmann Machines to be more robust to image corruption. The model is trained in an unsupervised fashion with unlabeled noisy data and can learn the spatial structure of the occluders. Compared to some standard algorithms, the model performed significantly better at denoising face images.

Xie et al. [8] used deep networks pre-trained with auto-encoders for image inpainting and denoising, and Tang et al. [9] employed Restricted Boltzmann Machines (RBMs) for the very same purpose of image denoising. Later on, Yan and Shao [10] used Deep Belief Networks [11] (DBNs) to identify blur type and parameters in natural images. Recently, a new RBM-based architecture called the Deep Boltzmann Machine (DBM) was proposed [12]. This approach has presented great results in many areas, outperforming DBNs, since its training phase considers not only bottom-up information but also top-down influences. Building on the work by Keyvanrad et al. [6], we propose a new deep learning-based approach for robust image denoising using DBMs. We show that the proposed approach outperforms DBNs and standard DBMs in the context of image denoising.

2 Restricted Boltzmann Machines

Restricted Boltzmann Machines [13] are energy-based stochastic neural networks composed of two layers of neurons (visible and hidden), in which learning is conducted in an unsupervised fashion. The basic architecture of a Restricted Boltzmann Machine comprises a visible layer \(\mathbf v \) with m units and a hidden layer \(\mathbf h \) with n units. Additionally, a real-valued matrix \(\mathbf W _{m\times n}\) models the weights between the visible and hidden neurons, where \(w_{ij}\) stands for the weight between visible unit \(v_i\) and hidden unit \(h_j\).

At first, let us assume both \(\mathbf v \) and \(\mathbf h \) are binary-valued, i.e., \(\mathbf v \in \{0,1\}^m\) and \(\mathbf h \in \{0,1\}^n\), thus leading to the so-called Bernoulli-Bernoulli Restricted Boltzmann Machine, since both sets of units follow a Bernoulli distribution. The energy function of an RBM is given by:

$$\begin{aligned} E(\mathbf v ,\mathbf h )=-\sum _{i=1}^ma_iv_i-\sum _{j=1}^nb_jh_j-\sum _{i=1}^m\sum _{j=1}^nv_ih_jw_{ij}, \end{aligned}$$
(1)

where \(\mathbf a \) and \(\mathbf b \) stand for the biases of the visible and hidden units, respectively.
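For concreteness, Eq. (1) can be evaluated directly. Below is a minimal NumPy sketch (the function and variable names are ours, chosen to mirror the notation above):

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy of a joint configuration (v, h), as in Eq. (1).

    v: (m,) binary visible vector, h: (n,) binary hidden vector,
    W: (m, n) weight matrix, a: (m,) and b: (n,) bias vectors.
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)
```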

The probability of a joint configuration \((\mathbf v ,\mathbf h )\) is computed as follows:

$$\begin{aligned} P(\mathbf v ,\mathbf h )=\frac{1}{Z}e^{-E(\mathbf v ,\mathbf h )}, \end{aligned}$$
(2)

where Z stands for the so-called partition function, which is basically a normalization factor computed over all possible configurations involving the visible and hidden units. Similarly, the marginal probability of a visible (input) vector is given by:

$$\begin{aligned} P(\mathbf v )=\frac{1}{Z}\displaystyle \sum _\mathbf{h }e^{-E(\mathbf v ,\mathbf h )}. \end{aligned}$$
(3)

Since the RBM is a bipartite graph, the units within one layer are conditionally independent given the other layer, thus leading to the following conditional probabilities:

$$\begin{aligned} P(v_i=1|\mathbf h )=\phi \left( \sum _{j=1}^nw_{ij}h_j+a_i\right) , \end{aligned}$$
(4)

and

$$\begin{aligned} P(h_j=1|\mathbf v )=\phi \left( \sum _{i=1}^mw_{ij}v_i+b_j\right) . \end{aligned}$$
(5)

Note that \(\phi (\cdot )\) stands for the sigmoid function.
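Equations (4) and (5) factorize over units, so all activation probabilities of a layer can be computed at once. A sketch under the same conventions as the energy function above:

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    """Eq. (5): P(h_j = 1 | v) for all hidden units at once."""
    return sigmoid(v @ W + b)          # shape (n,)

def p_v_given_h(h, W, a):
    """Eq. (4): P(v_i = 1 | h) for all visible units at once."""
    return sigmoid(W @ h + a)          # shape (m,)

def sample_bernoulli(p, rng):
    """Draw binary states from the given activation probabilities."""
    return (rng.random(p.shape) < p).astype(float)
```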

Let \(\varTheta = (W, a, b)\) be the set of parameters of an RBM, which can be learned through a training algorithm that aims at maximizing the product of the probabilities of all the available training data \(\mathcal{V}\), as follows:

$$\begin{aligned} \arg \max _{\varTheta }\prod _\mathbf{v \in \mathcal{V}}P(\mathbf v ). \end{aligned}$$
(6)

One of the most widely used approaches to solve the above problem is Contrastive Divergence (CD) [13], which essentially performs Gibbs sampling using the training data, rather than random inputs, to initialize the visible units.
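A sketch of one CD-1 update for a single sample, reusing the routines above, follows; practical implementations operate on mini-batches over several epochs, and the learning rate here is illustrative:

```python
def cd1_step(v0, W, a, b, lr, rng):
    """One Contrastive Divergence (CD-1) update for a single sample v0."""
    ph0 = p_h_given_v(v0, W, b)                        # positive phase
    h0 = sample_bernoulli(ph0, rng)
    v1 = sample_bernoulli(p_v_given_h(h0, W, a), rng)  # reconstruction
    ph1 = p_h_given_v(v1, W, b)                        # negative phase
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
```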

2.1 Deep Boltzmann Machines

Salakhutdinov and Hinton [12] presented the DBM, which aims at improving the inference during the learning process, since it considers both directions of interaction among adjacent layers. Salakhutdinov and Hinton [14] proposed the use of a variational inference method called “mean-field” to enhance the DBM learning procedure. This technique approximates the true posterior distribution over the hidden units, given the observed data, by estimates computed over isolated network segments. The training process of a DBM consists in minimizing the total energy of the system according to the parameters found through the partial inferences made by the mean-field (MF) procedure.

Roughly speaking, the idea is to find an approximation \(Q^{MF}(\mathbf h |\mathbf v ; \varvec{\mu })\) that best represents the true distribution of the hidden layers, i.e. \(P(\mathbf h |\mathbf v )\). This approximation is computed through the following factored distribution:

$$\begin{aligned} Q^{MF}(\mathbf h |\mathbf v ; \varvec{\mu }) = \prod _{l=1}^L \left[ \prod _{k=1}^{F_l} q(h_k^l) \right] , \end{aligned}$$
(7)

where L stands for the number of hidden layers, \(F_l\) represents the number of nodes in hidden layer l, and \(q(h_k^l=1)=\mu _k^l\). The goal is to find the mean-field parameters \(\varvec{\mu } = \left\{ \varvec{\mu }^1, \varvec{\mu }^2,\ldots , \varvec{\mu }^L \right\} \) according to the following equations:

$$\begin{aligned} \mu _k^1 = \phi \left( \sum _{i=1}^m w_{ik}^1v_i + \sum _{j=1}^{F_2}w_{kj}^2\mu _j^2 \right) , \end{aligned}$$
(8)

which represents the interaction between the first hidden layer and the visible layer below it, where \(\phi \) is the sigmoid function. Similarly, the interactions between hidden layers \(l - 1\) and l are given as follows:

$$\begin{aligned} \mu _k^l = \phi \left( \sum _{i=1}^{F_{l-1}} w_{ik}^l\mu _i^{l-1} + \sum _{j=1}^{F_{l+1}}w_{kj}^{l+1}\mu _j^{l+1} \right) , \end{aligned}$$
(9)

where \(w^l_{ij}\) stands for the weight between node i from hidden layer \(l-1\) and node j from hidden layer l. Finally, the mean-field parameters for the hidden layer at the top of the DBM are calculated by:

$$\begin{aligned} \mu _k^L = \phi \left( \sum _{i=1}^{F_{L-1}} w_{ik}^L\mu _i^{L-1} \right) . \end{aligned}$$
(10)
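The updates in Eqs. (8)-(10) form a fixed-point iteration, which can be sketched as follows. Biases are omitted, as in the equations above, and the indexing convention (`Ws[0]` holds the visible-to-first-hidden weights) is ours:

```python
def mean_field(v, Ws, n_iters=10):
    """Iterate Eqs. (8)-(10) to approximate the mean-field parameters.

    Ws[0] has shape (m, F_1); Ws[l] has shape (F_l, F_{l+1}).
    Returns the list [mu^1, ..., mu^L].
    """
    # bottom-up pass as a rough initialization
    mus, below = [], v
    for W in Ws:
        below = sigmoid(below @ W)
        mus.append(below)
    for _ in range(n_iters):
        for l in range(len(mus)):
            below = v if l == 0 else mus[l - 1]
            pre = below @ Ws[l]                  # bottom-up term
            if l + 1 < len(mus):                 # the top layer (Eq. 10)
                pre += mus[l + 1] @ Ws[l + 1].T  # has no top-down term
            mus[l] = sigmoid(pre)
    return mus
```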

3 Proposed Approach

The goal of this work is to propose a new DBM-based denoising approach, which learns how to deactivate some of its nodes in order to attenuate the noise levels of images. After training the DBM with the clean (noise-free) and noisy training images together, we use a criterion called relative activity [6] (\(\varvec{\psi }^*\)), which is defined as the difference between the mean activation values of the uppermost hidden nodes. For the sake of explanation, after training the DBM using the mean-field procedure, we first take the clean images and propagate them upwards. For each clean image, we store the activation field of the top layer in order to compute the mean activation field over all clean images, hereinafter called \(\varvec{\psi }_{clean}\). Further, we conduct the very same procedure for the noisy images to estimate their mean activation field at the top layer, hereinafter called \(\varvec{\psi }_{noisy}\). Therefore, one has two mean activation fields at the very top layer: one for the clean and another for the noisy images. The aforementioned relative activity is then computed as the difference between the mean activation fields of the clean and noisy images, i.e., \(\varvec{\psi }^*= \left| \varvec{\psi }_{clean}-\varvec{\psi }_{noisy}\right| \).

Further, one needs to find the so-called “noise nodes”, i.e., the nodes of the uppermost hidden layer that are activated in the presence of noisy structures; such nodes become more “excited” when presented with noisy elements. Basically, we threshold the relative activity values as follows:

$$\begin{aligned} \psi ^*_i=\left\{ \begin{array}{ll} \psi ^{clean}_i &{} \text{ if } \psi ^*_i > T \\ \psi ^*_i &{} \text{ otherwise, } \end{array}\right. \end{aligned}$$
(11)

where T stands for the threshold value. Figure 1a illustrates the aforementioned process.
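A sketch of the relative-activity computation and of the thresholding in Eq. (11), reusing the `mean_field` routine above (the threshold T is the tuning parameter of the method):

```python
def noise_nodes_and_field(clean_imgs, noisy_imgs, Ws, T):
    """Compute psi* and mark the "noise nodes" (Eq. 11)."""
    psi_clean = np.mean([mean_field(v, Ws)[-1] for v in clean_imgs], axis=0)
    psi_noisy = np.mean([mean_field(v, Ws)[-1] for v in noisy_imgs], axis=0)
    psi_star = np.abs(psi_clean - psi_noisy)   # relative activity
    noise = psi_star > T                       # nodes excited by noise
    psi_star[noise] = psi_clean[noise]         # Eq. (11)
    return psi_star, noise
```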

Fig. 1. Illustration of the proposed DBM for denoising purposes: (a) difference between the mean activation fields of the clean and noisy images, and (b) proposed DBM-based image denoising approach.

Finally, concerning the denoising step, given a noisy image, we perform a bottom-up pass until we reach the top layer (final inference). Then, the top-layer activations at the positions of the noise nodes are replaced as described above. Since the noise nodes are deactivated in this step, the reconstructed image (top-down pass) is considerably cleaner than before. This procedure is depicted in Fig. 1b.
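A minimal sketch of this denoising pass, under the same assumptions as the previous snippets (biases omitted; the top-down pass simply mirrors the weights, which is our reading of Fig. 1b rather than a definitive implementation):

```python
def denoise(noisy_img, Ws, noise):
    """Bottom-up inference, deactivation of the noise nodes,
    and top-down reconstruction of the image."""
    top = mean_field(noisy_img, Ws)[-1].copy()  # bottom-up pass
    top[noise] = 0.0                            # turn off noise nodes
    down = top
    for W in reversed(Ws):                      # top-down pass
        down = sigmoid(down @ W.T)
    return down                                 # restored image
```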

4 Experiments and Discussion

We used the following image databases to evaluate the performance of the proposed approach: MNIST [15], Semeion [16], and Caltech 101 Silhouettes [17]. In this work, we employed a neural architecture composed of a 784-node visible layer followed by three hidden layers with 1000, 500, and 250 nodes (784-1000-500-250). Recall that this architecture was chosen empirically.

The main idea is to train the network in such a way that it can learn the mapping between clean and noisy images. The MNIST database contains 60,000 training images, as well as 10,000 testing images. In the experiments, we used a subset of 20,000 images for training, composed of 10,000 noiseless images and their respective noisy versions (10,000). The noisy images were generated by means of a Gaussian noise with zero mean and two different variance values, \(\sigma \in \{0.1,0.2\}\), since we conducted two different experiments. In regard to the test phase (denoising), we used all 10,000 testing images corrupted with zero-mean Gaussian noise and variance levels of \(\sigma \in \{0.1,0.2\}\).
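The corrupted versions can be produced as below. Note that the paper denotes the variance by \(\sigma\), so we take its square root to obtain the standard deviation; the pixel range [0, 1] is our assumption:

```python
def corrupt(images, var, rng=np.random.default_rng(42)):
    """Add zero-mean Gaussian noise with the given variance."""
    noisy = images + rng.normal(0.0, np.sqrt(var), size=images.shape)
    return np.clip(noisy, 0.0, 1.0)   # keep pixels in [0, 1]
```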

Since the Semeion database contains only 1,400 training images, we increased the training set size to 22,400 as follows: we kept the original 1,400 images and generated 1,400 more images that are noisy versions of the original ones. Once again, since we considered different noise levels in two separate experiments, the images were corrupted with zero-mean Gaussian noise with variance \(\sigma \in \{0.1,0.2\}\). After that, we generated 9,800 more Gaussian-corrupted images with noise variance ranging from 0.001 to 0.007 in steps of 0.001 with respect to the first experiment (i.e., when the first corrupted images were generated using \(\sigma =0.1\)). With respect to the second experiment (i.e., when the images were corrupted using \(\sigma =0.2\)), we generated 9,800 more images corrupted with Gaussian noise with variance ranging from 0.201 to 0.207 in steps of 0.001.

Finally, the Caltech 101 Silhouettes database contains 4,100 training images and 2,307 testing images. For this dataset, we also increased the number of training images to 24,600 as follows: 4,100 clean images and their corresponding noisy versions (zero-mean Gaussian noise with variance \(\sigma \in \{0.1,0.2\}\), one value per experiment). Also, we generated 4,100 more images corrupted with zero-mean Gaussian noise (variances of 0.001 and 0.002) considering \(\sigma = 0.1\), and variances of 0.201 and 0.207 considering \(\sigma = 0.2\).

Table 1. Parameters used considering both DBMs and DBNs.

The proposed DBM-based approach was compared against a similar DBN (i.e., a DBN with “noise nodes”, as proposed by Keyvanrad et al. [6]), standard DBNs and DBMs, as well as the well-known Wiener filter. In order to evaluate the performance of the proposed method, the peak signal-to-noise ratio (PSNR) between the noise-free image and its respective restored version is computed. Table 1 presents the parameters used for each technique; these values were chosen empirically.
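The evaluation metric can be computed as follows (a standard PSNR definition; `peak=1.0` assumes pixel values in [0, 1]):

```python
def psnr(clean, restored, peak=1.0):
    """Peak signal-to-noise ratio, in dB, between the noise-free
    image and its restored version."""
    mse = np.mean((clean - restored) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```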

Table 2 presents the results, with the best ones in bold. The values in parentheses stand for the best thresholds used to find the “noise nodes”. As one can observe, the proposed approach obtained the best results in all cases considering Gaussian noise with zero mean and variance of 0.1. In regard to Gaussian noise with zero mean and variance of 0.2, the proposed approach obtained the best results in two out of three datasets, namely Caltech and Semeion. On MNIST, the best result was obtained by the DBN, closely followed by the proposed approach. More interestingly, the DBM with “noise nodes” outperformed both the standard DBM and DBN in all situations. Also, the proposed approach obtained better results than the Wiener filter, which is considered one of the best approaches for image denoising.

Table 2. PSNR results concerning the image denoising procedure.
Fig. 2. Example images considering the MNIST (first row), Caltech (second row), and Semeion (third row) databases. (a) First experiment, from left to right: original, noisy, standard DBM, and proposed DBM-based denoised images; (b) second experiment, from left to right: original, noisy, DBN considering MNIST, DBM considering both Caltech and Semeion, DBN [6] considering MNIST, and proposed DBM for Caltech and Semeion.

Figure 2 displays some example images from the databases. Clearly, the images denoised by the proposed DBM exhibit lower noise levels than the images filtered by the standard DBM. The content of the digit itself is similar across the images, but its surroundings have been better restored by the proposed DBM.

5 Conclusion

In this work, a new DBM-based approach for robust image denoising has been proposed. The idea is to learn how to turn off nodes that are often activated when noisy images are presented to the network. The experiments on three public datasets showed that the proposed approach obtained the best results in most scenarios, producing images with lower reconstruction errors than standard DBNs, DBMs, and the Wiener filter. In regard to future work, we aim at working with gray-scale images, as well as learning “noise nodes” at different layers, not only the top one.