1 Introduction

During the image acquisition process, some level of noise is usually added to the real data, mainly due to physical limitations of the acquisition sensor, but also to imprecisions during data transmission and manipulation. Therefore, the resulting image needs to be processed in order to attenuate its noise without losing detail in high-frequency areas; the field of image processing that addresses this issue is called “image restoration”. In this context, several well-known image restoration methods, such as the inverse and Wiener filters, regularization, projection-based [1, 2], and Maximum a Posteriori probability techniques, have been developed over the last decades [3]. Although machine learning is a well-consolidated research field dating back to the 1960s, only in recent years has it been employed to address the problem of image restoration [4, 5].

Recently, deep learning techniques have been considered a game-changer due to their outstanding results in a number of computer vision-related problems, such as face and object recognition, to name a few. In the last years, some works have addressed the problem of image restoration using such approaches. Keyvanrad et al. [6] employed Deep Belief Networks (DBNs) to smooth noise in images. Tang et al. [7] proposed the Robust Boltzmann Machine (RoBM), which allows Boltzmann Machines to be more robust to image corruption. The model is trained in an unsupervised fashion with unlabeled noisy data and can learn the spatial structure of the occluders. Compared to some standard algorithms, the model performed significantly better at denoising face images.

Xie et al. [8] used deep networks pre-trained with auto-encoders for image inpainting and denoising, and Tang et al. [9] employed Restricted Boltzmann Machines (RBMs) for the very same purpose of image denoising. Later on, Yan and Shao [10] used Deep Belief Networks [11] (DBNs) to identify blur type and parameters in natural images. Recently, a new RBM-based architecture called the Deep Boltzmann Machine (DBM) was proposed [12]. This approach has presented great results in many areas, outperforming DBNs, since its training phase considers not only bottom-up information but also top-down influences. Building on the work by Keyvanrad et al. [6], we propose a new deep learning-based approach for robust image denoising using DBMs. We show that the proposed approach outperforms DBNs and standard DBMs in the context of image denoising.

2 Restricted Boltzmann Machines

Restricted Boltzmann Machines [13] are energy-based stochastic neural networks composed of two layers of neurons (visible and hidden), in which learning is conducted in an unsupervised fashion. The basic architecture of a Restricted Boltzmann Machine comprises a visible layer \(\mathbf v \) with m units and a hidden layer \(\mathbf h \) with n units. Additionally, a real-valued matrix \(\mathbf W _{m\times n}\) models the weights between the visible and hidden neurons, where \(w_{ij}\) stands for the weight between visible unit \(v_i\) and hidden unit \(h_j\).

At first, let us assume both \(\mathbf v \) and \(\mathbf h \) are binary-valued, i.e., \(\mathbf v \in \{0,1\}^m\) and \(\mathbf h \in \{0,1\}^n\), thus leading to the so-called Bernoulli-Bernoulli Restricted Boltzmann Machine, since both sets of units follow a Bernoulli distribution. The energy function of an RBM is given by:

$$\begin{aligned} E(\mathbf v ,\mathbf h )=-\sum _{i=1}^ma_iv_i-\sum _{j=1}^nb_jh_j-\sum _{i=1}^m\sum _{j=1}^nv_ih_jw_{ij}, \end{aligned}$$
(1)

where \(\mathbf a \) and \(\mathbf b \) stand for the biases of the visible and hidden units, respectively.
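For concreteness, Eq. (1) can be evaluated directly. Below is a minimal NumPy sketch (the function and variable names are ours, chosen to mirror the notation above):

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy of a joint configuration (v, h), as in Eq. (1).

    v: (m,) binary visible vector, h: (n,) binary hidden vector,
    W: (m, n) weight matrix, a: (m,) and b: (n,) bias vectors.
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)
```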

The probability of a joint configuration \((\mathbf v ,\mathbf h )\) is computed as follows:

$$\begin{aligned} P(\mathbf v ,\mathbf h )=\frac{1}{Z}e^{-E(\mathbf v ,\mathbf h )}, \end{aligned}$$
(2)

where Z stands for the so-called partition function, which is basically a normalization factor computed over all possible configurations involving the visible and hidden units. Similarly, the marginal probability of a visible (input) vector is given by:

$$\begin{aligned} P(\mathbf v )=\frac{1}{Z}\displaystyle \sum _\mathbf{h }e^{-E(\mathbf v ,\mathbf h )}. \end{aligned}$$
(3)

Since the RBM is a bipartite graph, the units within one layer are conditionally independent given the other layer, thus leading to the following conditional probabilities:

$$\begin{aligned} P(v_i=1|\mathbf h )=\phi \left( \sum _{j=1}^nw_{ij}h_j+a_i\right) , \end{aligned}$$
(4)

and

$$\begin{aligned} P(h_j=1|\mathbf v )=\phi \left( \sum _{i=1}^mw_{ij}v_i+b_j\right) . \end{aligned}$$
(5)

Note that \(\phi (\cdot )\) stands for the sigmoid function.
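Equations (4) and (5) factorize over units, so all activation probabilities of a layer can be computed at once. A sketch under the same conventions as the energy function above:

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    """Eq. (5): P(h_j = 1 | v) for all hidden units at once."""
    return sigmoid(v @ W + b)          # shape (n,)

def p_v_given_h(h, W, a):
    """Eq. (4): P(v_i = 1 | h) for all visible units at once."""
    return sigmoid(W @ h + a)          # shape (m,)

def sample_bernoulli(p, rng):
    """Draw binary states from the given activation probabilities."""
    return (rng.random(p.shape) < p).astype(float)
```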

Let \(\varTheta = (W, a, b)\) be the set of parameters of an RBM, which can be learned through a training algorithm that aims at maximizing the product of the probabilities of all the available training data \(\mathcal{V}\), as follows:

$$\begin{aligned} \arg \max _{\varTheta }\prod _\mathbf{v \in \mathcal{V}}P(\mathbf v ). \end{aligned}$$
(6)

One of the most widely used approaches to solve the above problem is Contrastive Divergence (CD) [13], which essentially performs Gibbs sampling using the training data, rather than random inputs, to initialize the visible units.
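A sketch of one CD-1 update for a single sample, reusing the routines above, follows; practical implementations operate on mini-batches over several epochs, and the learning rate here is illustrative:

```python
def cd1_step(v0, W, a, b, lr, rng):
    """One Contrastive Divergence (CD-1) update for a single sample v0."""
    ph0 = p_h_given_v(v0, W, b)                        # positive phase
    h0 = sample_bernoulli(ph0, rng)
    v1 = sample_bernoulli(p_v_given_h(h0, W, a), rng)  # reconstruction
    ph1 = p_h_given_v(v1, W, b)                        # negative phase
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
```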

2.1 Deep Boltzmann Machines

Salakhutdinov and Hinton [12] presented the DBM, which aims at improving the inference during the learning process, since it considers both directions of interaction among adjacent layers. Salakhutdinov and Hinton [14] proposed the use of a variational inference method called “mean-field” to enhance the DBM learning procedure. This technique approximates the true posterior distribution over the hidden units, given the observed data, by estimates computed over isolated network segments. The training process of a DBM consists in minimizing the total energy of the system according to the parameters found through the partial inferences made by the mean-field (MF) procedure.

Roughly speaking, the idea is to find an approximation \(Q^{MF}(\mathbf h |\mathbf v ; \varvec{\mu })\) that best represents the true distribution of the hidden layers, i.e. \(P(\mathbf h |\mathbf v )\). This approximation is computed through the following factored distribution:

$$\begin{aligned} Q^{MF}(\mathbf h |\mathbf v ; \varvec{\mu }) = \prod _{l=1}^L \left[ \prod _{k=1}^{F_l} q(h_k^l) \right] , \end{aligned}$$
(7)

where L stands for the number of hidden layers, \(F_l\) represents the number of nodes in hidden layer l, and \(q(h_k^l=1)=\mu _k^l\). The goal is to find the mean-field parameters \(\varvec{\mu } = \left\{ \varvec{\mu }^1, \varvec{\mu }^2,\ldots , \varvec{\mu }^L \right\} \) according to the following equations:

$$\begin{aligned} \mu _k^1 = \phi \left( \sum _{i=1}^m w_{ik}^1v_i + \sum _{j=1}^{F_2}w_{kj}^2\mu _j^2 \right) , \end{aligned}$$
(8)

which represents the interaction between the first hidden layer and the visible layer below it, where \(\phi \) is the sigmoid function. Similarly, the interactions between hidden layers \(l - 1\) and l are given as follows:

$$\begin{aligned} \mu _k^l = \phi \left( \sum _{i=1}^{F_{l-1}} w_{ik}^l\mu _i^{l-1} + \sum _{j=1}^{F_{l+1}}w_{kj}^{l+1}\mu _j^{l+1} \right) , \end{aligned}$$
(9)

where \(w^l_{ij}\) stands for the weight between node i from hidden layer \(l-1\) and node j from hidden layer l. Finally, the mean-field parameters for the hidden layer at the top of the DBM are calculated by:

$$\begin{aligned} \mu _k^L = \phi \left( \sum _{i=1}^{F_{L-1}} w_{ik}^L\mu _i^{L-1} \right) . \end{aligned}$$
(10)
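The updates in Eqs. (8)-(10) form a fixed-point iteration, which can be sketched as follows. Biases are omitted, as in the equations above, and the indexing convention (`Ws[0]` holds the visible-to-first-hidden weights) is ours:

```python
def mean_field(v, Ws, n_iters=10):
    """Iterate Eqs. (8)-(10) to approximate the mean-field parameters.

    Ws[0] has shape (m, F_1); Ws[l] has shape (F_l, F_{l+1}).
    Returns the list [mu^1, ..., mu^L].
    """
    # bottom-up pass as a rough initialization
    mus, below = [], v
    for W in Ws:
        below = sigmoid(below @ W)
        mus.append(below)
    for _ in range(n_iters):
        for l in range(len(mus)):
            below = v if l == 0 else mus[l - 1]
            pre = below @ Ws[l]                  # bottom-up term
            if l + 1 < len(mus):                 # the top layer (Eq. 10)
                pre += mus[l + 1] @ Ws[l + 1].T  # has no top-down term
            mus[l] = sigmoid(pre)
    return mus
```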

3 Proposed Approach

The goal of this work is to propose a new DBM-based denoising approach, which learns how to deactivate some of its nodes in order to attenuate the noise levels of images. After training the DBM with the clean (noise-free) and noisy training images together, we use a criterion called relative activity [6] (\(\varvec{\psi }^*\)), which is defined as the difference between the mean activation values of the uppermost hidden nodes. For the sake of explanation, after training the DBM using the mean-field procedure, we first take the clean images and propagate them upwards. For each clean image, we store the activation field of the top layer in order to compute the mean activation field over all clean images, hereinafter called \(\varvec{\psi }_{clean}\). Further, we conduct the very same procedure for the noisy images to estimate their mean activation field at the top layer, hereinafter called \(\varvec{\psi }_{noisy}\). Therefore, one has two mean activation fields at the very top layer: one for the clean and another for the noisy images. The aforementioned relative activity is then computed as the difference between the mean activation fields of the clean and noisy images, i.e., \(\varvec{\psi }^*= \left| \varvec{\psi }_{clean}-\varvec{\psi }_{noisy}\right| \).

Further, one needs to find the so-called “noise nodes”, i.e., the nodes of the uppermost hidden layer that are activated in the presence of noisy structures; such nodes become more “excited” when presented with noisy elements. Basically, we threshold the relative activity values as follows:

$$\begin{aligned} \psi ^*_i=\left\{ \begin{array}{ll} \psi ^{clean}_i &{} \text{ if } \psi ^*_i > T \\ \psi ^*_i &{} \text{ otherwise, } \end{array}\right. \end{aligned}$$
(11)

where T stands for the threshold value. Figure 1a illustrates the aforementioned process.
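A sketch of the relative-activity computation and of the thresholding in Eq. (11), reusing the `mean_field` routine above (the threshold T is the tuning parameter of the method):

```python
def noise_nodes_and_field(clean_imgs, noisy_imgs, Ws, T):
    """Compute psi* and mark the "noise nodes" (Eq. 11)."""
    psi_clean = np.mean([mean_field(v, Ws)[-1] for v in clean_imgs], axis=0)
    psi_noisy = np.mean([mean_field(v, Ws)[-1] for v in noisy_imgs], axis=0)
    psi_star = np.abs(psi_clean - psi_noisy)   # relative activity
    noise = psi_star > T                       # nodes excited by noise
    psi_star[noise] = psi_clean[noise]         # Eq. (11)
    return psi_star, noise
```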

Fig. 1. Illustration of the proposed DBM for denoising purposes: (a) difference between the mean activation fields of the clean and noisy images, and (b) proposed DBM-based image denoising approach.

Finally, concerning the denoising step, given a noisy image, we perform a bottom-up pass until we reach the top layer (final inference). Then, the top-layer activations at the positions of the noise nodes are replaced as described above. Since the noise nodes are deactivated in this step, the reconstructed image (top-down pass) is considerably cleaner than before. This procedure is depicted in Fig. 1b.
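A minimal sketch of this denoising pass, under the same assumptions as the previous snippets (biases omitted; the top-down pass simply mirrors the weights, which is our reading of Fig. 1b rather than a definitive implementation):

```python
def denoise(noisy_img, Ws, noise):
    """Bottom-up inference, deactivation of the noise nodes,
    and top-down reconstruction of the image."""
    top = mean_field(noisy_img, Ws)[-1].copy()  # bottom-up pass
    top[noise] = 0.0                            # turn off noise nodes
    down = top
    for W in reversed(Ws):                      # top-down pass
        down = sigmoid(down @ W.T)
    return down                                 # restored image
```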

4 Experiments and Discussion

We used the following image databases to evaluate the performance of the proposed approach: MNIST [15], Semeion [16], and Caltech 101 Silhouettes [17]. In this work, we employed a neural architecture composed of a 784-node visible layer followed by three hidden layers with 1000, 500, and 250 nodes (784-1000-500-250). Recall that this architecture was chosen empirically.

The main idea is to train the network in such a way that it can learn the mapping between clean and noisy images. The MNIST database contains 60,000 training images, as well as 10,000 testing images. In the experiments, we used a subset of 20,000 images for training, composed of 10,000 noiseless images and their respective noisy versions (10,000). The noisy images were generated by means of a Gaussian noise with zero mean and two different variance values, \(\sigma \in \{0.1,0.2\}\), since we conducted two different experiments. In regard to the test phase (denoising), we used all 10,000 testing images corrupted with zero-mean Gaussian noise and variance levels of \(\sigma \in \{0.1,0.2\}\).
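The corrupted versions can be produced as below. Note that the paper denotes the variance by \(\sigma\), so we take its square root to obtain the standard deviation; the pixel range [0, 1] is our assumption:

```python
def corrupt(images, var, rng=np.random.default_rng(42)):
    """Add zero-mean Gaussian noise with the given variance."""
    noisy = images + rng.normal(0.0, np.sqrt(var), size=images.shape)
    return np.clip(noisy, 0.0, 1.0)   # keep pixels in [0, 1]
```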

Since the Semeion database contains only 1,400 training images, we increased the training set size to 22,400 as follows: we kept the original 1,400 images and generated 1,400 more images that are noisy versions of the original ones. Once again, since we considered different noise levels in two separate experiments, the images were corrupted with zero-mean Gaussian noise with variance \(\sigma \in \{0.1,0.2\}\). After that, we generated 9,800 more Gaussian-corrupted images with noise variance ranging from 0.001 to 0.007 in steps of 0.001 with respect to the first experiment (i.e., when the first corrupted images were generated using \(\sigma =0.1\)). With respect to the second experiment (i.e., when the images were corrupted using \(\sigma =0.2\)), we generated 9,800 more images corrupted with Gaussian noise with variance ranging from 0.201 to 0.207 in steps of 0.001.

Finally, the Caltech 101 Silhouettes database contains 4,100 training images and 2,307 testing images. For this dataset, we also increased the number of training images to 24,600 as follows: 4,100 clean images and their corresponding noisy versions (zero-mean Gaussian noise with variance \(\sigma \in \{0.1,0.2\}\), one value per experiment). Also, we generated 4,100 more images corrupted with zero-mean Gaussian noise (variances of 0.001 and 0.002) considering \(\sigma = 0.1\), and variances of 0.201 and 0.207 considering \(\sigma = 0.2\).

Table 1. Parameters used considering both DBMs and DBNs.

The proposed DBM-based approach was compared against a similar DBN (i.e., a DBN with “noise nodes”, as proposed by Keyvanrad et al. [6]), standard DBNs and DBMs, as well as the well-known Wiener filter. In order to evaluate the performance of the proposed method, the peak signal-to-noise ratio (PSNR) between the noise-free image and its respective restored version is computed. Table 1 presents the parameters used for each technique; these values were chosen empirically.
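The evaluation metric can be computed as follows (a standard PSNR definition; `peak=1.0` assumes pixel values in [0, 1]):

```python
def psnr(clean, restored, peak=1.0):
    """Peak signal-to-noise ratio, in dB, between the noise-free
    image and its restored version."""
    mse = np.mean((clean - restored) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```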

Table 2 presents the results, with the best ones in bold. The values in parentheses stand for the best thresholds used to find the “noise nodes”. As one can observe, the proposed approach obtained the best results in all cases considering Gaussian noise with zero mean and variance of 0.1. In regard to Gaussian noise with zero mean and variance of 0.2, the proposed approach obtained the best results in two out of three datasets, namely Caltech and Semeion. On MNIST, the best result was obtained by the DBN, closely followed by the proposed approach. More interestingly, the DBM with “noise nodes” outperformed both the standard DBM and DBN in all situations. Also, the proposed approach obtained better results than the Wiener filter, which is considered one of the best approaches for image denoising.

Table 2. PSNR results concerning the image denoising procedure.
Fig. 2. Example images considering the MNIST (first row), Caltech (second row), and Semeion (third row) databases. (a) First experiment, from left to right: original, noisy, standard DBM, and proposed DBM-based denoised images; (b) second experiment, from left to right: original, noisy, DBN considering MNIST, DBM considering both Caltech and Semeion, DBN [6] considering MNIST, and proposed DBM for Caltech and Semeion.

Figure 2 displays some example images from the databases. Clearly, the images denoised by the proposed DBM exhibit lower noise levels than the images filtered by the standard DBM. The content of the digit itself is similar across the images, but its surroundings have been better restored by the proposed DBM.

5 Conclusion

In this work, a new DBM-based approach for robust image denoising has been proposed. The idea is to learn how to turn off nodes that are often activated when noisy images are presented to the network. The experiments on three public datasets showed that the proposed approach obtained the best results in most scenarios, producing images with lower reconstruction errors than standard DBNs, DBMs, and the Wiener filter. In regard to future work, we aim at working with gray-scale images, as well as learning “noise nodes” at different layers, not only the top one.