Keywords

1 Introduction

The main goal of this paper is noise suppression in images degraded by additive white Gaussian noise (AWGN). Noise is one of the main factors that degrades image quality [1, 2]. Various methods for image denoising are currently presented, but many researches are dedicated to design more efficient techniques [2,3,4,5,6,7,8]. The reason is that the quality of the restored images is still not completely satisfactory for the final users.

The image denoising techniques of state-of-the-art fall into two families [5]: (1) non-local filters [2] based on searching for similar patches and their joint processing, such as BM3D [3] and SA-DCT [4]; (2) those based on image clustering, kernel regression, singular value decomposition or principal component analysis for dictionary learning and sparse image representation minimization [5,6,7,8].

Among the family of non-local filters, nowadays the BM3D filter [3] has been shown to be the most efficient for processing the majority of grayscale test images [5, 9] and component-wise denoising of color test images [10] corrupted by AWGN. On the other hand, the filters based on sparse representation minimization show good results, which in some cases are superior to the BM3D technique [7], but these filters have a tremendous computational complexity, mainly due to very slow clustering stage and the following dictionary learning, localized histogram and other feature calculations. Besides, the iterative sparse minimization not only slows down the restoration but sometimes leads to worse denoising results.

Another aspect is denoising filter efficiency. Usually, a mean square criterion in the form of peak signal-to-noise ratio (PSNR) is used to estimate the filtered image quality. Unfortunately, such an approach not always means a better visual quality nor provides more appropriate image data for the posterior treatment and classification [12]. Furthermore, there are some relatively novel image quality criterions, such as based on the properties of human visual system (PSNR-HVSM) [13, 14], feature similarity index (FSIM) [15], structural similarity index (SSIM) [16] or multiscale structural similarity index (MSSIM) [17].

In this paper, we take advantages of both filter families. The presented filtering technique uses discrete cosine transform (DCT), the block matching algorithm in the transform domain of reduced computational complexity, Hadamard transform for patch group hard thresholding [11], and a fast clustering algorithm proposed to derive mean cluster spectra to form the local Wiener filter frequency responses. The obtained results with the proposed filter in terms of PSNR are similar to those obtained with BM3D, K-SVD [6] and NCSR [7], but are better in terms of SSIM.

The paper is organized as follows: in Sect. 2, image Wiener filter is considered and reduced to a hard thresholding filter in DCT domain. The proposed fast image clustering algorithm is described in Sect. 2. Section 3 explains the filtering technique, and numerical simulation results for the proposed filter in comparison to the best known filters of both mentioned above families are presented in Sect. 5. Finally, the conclusions follow.

2 Wiener Filtering and Thresholding in DCT Domain

Let us consider an additive image observation model

$$ u\left( {x,y} \right) = s\left( {x,y} \right) + n\left( {x,y} \right) $$
(1)

where \( u\left( {x,y} \right) \) is an observed noisy image, \( x,y \) are Cartesian coordinates, \( s\left( {x,y} \right) \) denotes a noise-free image, and \( n\left( {x,y} \right) \) is a white Gaussian noise not correlated with \( s\left( {x,y} \right) \). The problem is to find an estimate of the noise-free image \( \hat{s}\left( {x,y} \right) \) such that it minimizes mean square error (MSE) \( E\left\{ {\left[ {s\left( {x,y} \right) - \hat{s}\left( {x,y} \right)} \right]^{2} } \right\} \), where \( E\left\{ \cdot \right\} \) denotes the expectation operator.

The optimal linear filter that minimizes the MSE is the well-known Wiener filter that in the spectral domain can be formulated as [9]:

$$ H_{W} \left( {\omega_{x} ,\omega_{y} } \right) = \frac{{P_{s} \left( {\omega_{x} ,\omega_{y} } \right)}}{{P_{s} \left( {\omega_{x} ,\omega_{y} } \right) + P_{n} \left( {\omega_{x} ,\omega_{y} } \right)}}, $$
(2)

where \( P_{s} \left( {\omega_{x} ,\omega_{y} } \right),P_{n} \left( {\omega_{x} ,\omega_{y} } \right) \) are power spectral densities of the signal and noise, respectively. When the noise is Gaussian, it is supposed \( P_{n} \left( {\omega_{x} ,\omega_{y} } \right) = \sigma_{n}^{2} \), where \( \sigma_{n}^{2} \) is the variance of noise \( n\left( {x,y} \right) \).

Unfortunately, the exact power spectrum density \( P_{s} \left( {\omega_{x} ,\omega_{y} } \right) \) is unavailable and often is substituted by its estimate \( \hat{P}_{s} \left( {\omega_{x} ,\omega_{y} } \right) \) that allows obtaining an estimate of the Wiener filter frequency response \( \hat{H}_{W} \left( {\omega_{x} ,\omega_{y} } \right) \). In the considered case of additive white Gaussian noise, the estimated Wiener filter response is approximated by

$$ \hat{H}_{W} \left( {\omega_{x} ,\omega_{y} } \right) = \frac{{\hat{P}_{s} \left( {\omega_{x} ,\omega_{y} } \right)}}{{\hat{P}_{s} \left( {\omega_{x} ,\omega_{y} } \right) + \sigma_{n}^{2} }}. $$
(3)

Usually, \( P_{u} \left( {\omega_{x} ,\omega_{y} } \right) \) is taken as an estimation of \( P_{s} \left( {\omega_{x} ,\omega_{y} } \right) \), but such a choice is not optimal. Here, we propose to estimate \( P_{s} \left( {\omega_{x} ,\omega_{y} } \right) \) over the set of DCT patches of 8 × 8 (this block size was selected for the comparison purposes) image blocks grouped in clusters; additionally, the estimate \( \hat{P}_{s} \left( {\omega_{x} ,\omega_{y} } \right) \) is adjusted using the mean spectrum of the found similar patches in a searched range of the current image block as in the technique BM3D [4, 11] but instead of searching in the space domain we used the search in DCT domain as in our previous work [18].

Moreover, the direct implementation of (3) in DCT domain can be substituted by the zeroing of DCT coefficients below some threshold [4]. This threshold was analytically derived to be of \( 6.5217615 \cdot \sigma_{n}^{2} \) [18]:

$$ H_{T}^{{}} \left( {\omega_{x} ,\omega_{y} } \right) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}\,\left| {U\left( {\omega_{x} ,\omega_{y} } \right)} \right| \ge 2.55377397 \cdot \sigma_{n}^{{}} } \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right., $$
(4)

where \( U\left( {\omega_{x} ,\omega_{y} } \right) \) is the voltage DCT spectrum of the observed noisy image and \( \sigma_{n}^{{}} \) is noise standard deviation. Also, the hard threshold (4) can adapt its characteristics to the local spectrum properties calculating to this end the Wiener filter frequency response [18]:

$$ T\left( {\omega_{x} ,\omega_{y} } \right) = \left\{ {\begin{array}{*{20}c} {\beta_{\hbox{min} } \cdot \sigma } & {if\,\hat{H}_{i,j} \left( {\omega_{x} ,\omega_{y} } \right) > 0.87} \\ {\beta_{\hbox{max} } \cdot \sigma } & {f\,\hat{H}_{i,j} \left( {\omega_{x} ,\omega_{y} } \right) < 0.3} \\ {\beta \cdot \sigma } & {otherwise} \\ \end{array} } \right., $$
(5)

and \( \beta_{\hbox{min} } = 1,\,\beta_{\hbox{max} } = 2.9;\,\beta = 2.55377397 \) according to (4).

It was found by simulations that the image restoration quality highly depends on the quality of the signal spectral estimate \( \hat{P}_{s} \left( {\omega_{x} ,\omega_{y} } \right) \); so, the main problem is to estimate it better.

3 Local Spectra Estimation via Image Clustering

In this Section, we present a fast image clustering technique adequate for noise suppression in terms of local spectra \( P_{s} \left( {\omega_{x} ,\omega_{y} } \right) \) estimation to be used to form the local Wiener filter frequency response \( \hat{H}_{W} \left( {\omega_{x} ,\omega_{y} } \right) \) to process each 8 × 8 image block.

The proposed clustering algorithm starts with calculating 2D image histogram on image block mean and variance values in supposition that the image block data is described in terms of their first and second moments. To this end, the mean value range is assigned to be of 256 samples from 0 to 255; and the variance range is determined searching first for the maximum block variance value.

Then, the variance range is discretized; in this paper, we assume it is of 512 levels with respect to the maximum image block variance value, \( \sigma_{\hbox{max} }^{2} \). The number 512 is considered to be sufficiently high to represent the full range of variance values and not too large for the following watershed transform. Using the mean and variance values of each 8 × 8 overlapping image block, 2D histogram of 256 × 512 bins is formed. Figure 1(a) shows an example of such a histogram of the standard test image “Lenna” having \( \sigma_{n}^{2} = 25 \).

Fig. 1.
figure 1

Transforming 2D image histogram: (a) 2d histogram of test image “Lenna”; (b) watershed transform; (c) the resulted cluster number map.

Next, the maximum of the histogram is found and the histogram is modified changing to the ratio of this maximum and the original histogram values. After this, the watershed transform [19] is performed over the modified 2D histogram (see Fig. 1(b)).

Finally, the watershed ridge points having zero value are substituted by there neighbor points searching the coordinates that have the maximal values in 2D histogram, as shown in Fig. 1(c).

As a result, the considered image is free of ridge lines and is clustered with an automatically determined number of groups that is produced by the watershed transform. Figure 2 shows the test image “Lenna” clustered using the described algorithm, where each pixel value is the cluster number of the corresponding 8 × 8 blocks of the original image (the blocks are overlapped).

Fig. 2.
figure 2

Test image “Lenna” clustered with the proposed algorithm

4 Proposed Filtering Technique

The proposed advanced filtering technique is based on the BM3D [11] filtering strategy that assumes image clustering as described in Sect. 3: the search for similar blocks in the vicinity of the current image block, the formation of the lists of DCT coefficient patches and their processing using Hadamard transform, the hard thresholding and aggregation of the processed patches forming the filtered image.

The filtering technique starts with the pre-processing stage consisting of image clustering followed by calculation of the DCT patches for each image pixel in the range of \( \left( {M - m} \right) \times \left( {N - m} \right) \) pixels, \( M \times N \) is the image size, \( m = 8 \). Then, power spectrum estimations, \( \hat{P}\left( c \right) \), of each cluster are calculated. Next, the search for similar image blocks using the DCT transformed data [18] is performed forming the list of patch coordinates, \( {\mathbf{l}}_{U} \left( {i,j} \right) = \left\{ {l_{1} \left( {i,j} \right) \ldots l_{{b_{\hbox{max} } }} \left( {i,j} \right) \, } \right\} \), where \( l_{1} \left( {i,j} \right) \ldots l_{{b_{\hbox{max} } }} \left( {i,j} \right) \), \( b_{\hbox{max} } \) is the maximal number of similar patches, and the list of the corresponding distances \( {\mathbf{d}}\left( {i,j} \right) \) for their usage in the filtering process.

After the pre-processing stage, the filtering is performed as described below.

At the proper filtering stage, first the spectral estimate of the current \( i,j \)-th image block data is calculated using the list of patches \( {\mathbf{l}}_{U} \left( {i,j} \right) \) and the current cluster \( c(i,j) \) spectral estimate, \( \hat{P}\left( {c(i,j)} \right) \):

$$ \hat{P}_{s} \left( {i,j} \right) = \frac{{\hat{P}\left( {c(i,j)} \right) + \sum\limits_{k = 1}^{{{{b_{\hbox{max} } } \mathord{\left/ {\vphantom {{b_{\hbox{max} } } 2}} \right. \kern-0pt} 2}}} {U_{k} \left( {l_{k} \left( {i,j} \right)} \right) \cdot \tilde{w}_{k}^{P} \left( {i,j} \right)} }}{2}, $$
(6)

where \( b_{\hbox{max} } \) is the maximal number of similar patches in the list \( {\mathbf{l}}_{U} \left( {i,j} \right) \) for \( i,j \)-th image block, \( \tilde{w}_{k}^{P} \left( {i,j} \right) \) is a normalized weighting coefficient calculated on the patch distances as

$$ w_{k}^{P} \left( {i,j} \right) = \frac{{\exp \left\{ { - d_{k} \left( {i,j} \right)/200} \right\}}}{{\left\| {{\mathbf{w}}^{P} \left( {i,j} \right)} \right\|}}, $$
(7)

and \( \left\| {{\mathbf{w}}^{P} \left( {i,j} \right)} \right\| \) denotes the sum of the non-normalized coefficients \( w_{k}^{P} \left( {i,j} \right) \),.

With the estimate \( \hat{P}_{s} \left( {i,j} \right) \), the Wiener filter frequency response at \( i,j \)-th block position \( \hat{H}_{i,j} \left( {\omega_{x} ,\omega_{y} } \right) \) is formed according to (3). Next, Hadamard transform in the third dimension is applied to the group of patches \( {\mathbf{U}}\left( {i,j} \right) = \left\{ {{\mathbf{U}}_{k} \left( {l_{k} \left( {i,j} \right)} \right), \, k = \overline{{1,b_{\hbox{max} } }} } \right\} \): \( {\mathbf{U}}_{Hadamard} = Hadamard\left\{ {{\mathbf{U}}\left( {i,j} \right)} \right\} \).

Then, we propose the following thresholding procedure:

$$ {\tilde{\mathbf{U}}}_{Hadamard} = \left\{ {\begin{array}{*{20}c} {{\mathbf{U}}_{Hadamard} \left( {\omega_{x} ,\omega_{y} } \right)} & {if\;\omega_{x} = \omega_{y} = 0} \\ {else\quad {\mathbf{U}}_{Hadamard} \left( {\omega_{x} ,\omega_{y} } \right)} & {if\;\left| {{\mathbf{U}}_{Hadamard} \left( {\omega_{x} ,\omega_{y} } \right) \ge T\left( {\omega_{x} ,\omega_{y} } \right)} \right|} \\ 0 & {otherwise} \\ \end{array} } \right., $$
(8)

where the thresholds \( T\left( {\omega_{x} ,\omega_{y} } \right) \) are formed according to (5). After the thresholding and the inverse Hadamard transform, \( {\tilde{\mathbf{U}}}\left( {i,j} \right) = Hadamard^{ - 1} \left\{ {{\tilde{\mathbf{U}}}_{Hadamard}^{{}} } \right\} \), the group of filtered patches \( {\hat{\mathbf{U}}}\left( {i,j} \right) = \left\{ {{\hat{\mathbf{U}}}_{k} \left( {l_{k} \left( {i,j} \right)} \right), \, k = \overline{{1,b_{\hbox{max} } }} } \right\} \) is obtained applying the inverse DCT transform to the patches. Finally, the collaborative filtering by the aggregation of the filtered blocks is performed. Note that, opposite to scanning window filtering, the filtered values are obtained simultaneously for all pixels of a given block and for all blocks in the group \( {\mathbf{l}}\left( {i,j} \right) \). The averaged aggregation is performed as [18]

$$ \begin{aligned} {\hat{\mathbf{s}}}_{\Sigma } \left( {{\mathbf{l}}_{U} \left( {i,j} \right)} \right) & = \sum\limits_{k = 1}^{{b_{\hbox{max} }^{{}} }} {\left( {{\hat{\mathbf{U}}}_{k} \left( {l_{k} \left( {i,j} \right)} \right)} \right). \cdot w_{k}^{A} \left( {i,j} \right)} , \\ R_{\Sigma } \left( {{\mathbf{l}}_{U} \left( {i,j} \right)} \right) & = \sum\limits_{k = 1}^{{b_{\hbox{max} }^{{}} }} {w_{k}^{A} \left( {i,j} \right)} ,\,w_{k}^{A} \left( {i,j} \right) = \exp \left\{ { - d_{k} \left( {i,j} \right)/500} \right\}, \\ \left( {{\tilde{\mathbf{s}}}} \right)_{*} & = \frac{{\left( {{\hat{\mathbf{s}}}_{\Sigma } } \right)_{*} }}{{\left( {{\mathbf{R}}_{\Sigma } } \right)_{*} }} \\ \end{aligned} $$
(9)

where \( {\hat{\mathbf{s}}}_{\Sigma } \left( {{\mathbf{l}}\left( {i,j} \right)} \right) \) are the sum of the blocks defined by the list \( {\mathbf{l}}\left( {i,j} \right) \), \( R_{\Sigma } \left( {{\mathbf{l}}\left( {i,j} \right)} \right) \) is the sum of weighting coefficients \( w_{k}^{A} \left( {i,j} \right) \), \( {\tilde{\mathbf{s}}} \) is the estimated image obtained from the pre-filtering stage, \( \left( {} \right)_{*} \) denotes element-wise operations.

5 Results

The numerical simulations using standard test images with plain regions “Lena”, “F-16”, “peppers” and textural images “aerial”, “baboon”, “bridge” are performed. For comparison purposes, three best state-of-the-art filters, K-SVD, BM3D (with the same search area defined by the parameter \( shift = 21 \) and overlapping step 1) and NCSR were used.

The filtering results are illustrated in Fig. 3, and the obtained values of PSNR and SSIM are presented in Tables 1 and 2. The visual quality of the images processed by the considered filters is very similar, although NCSR presents some over smoothing and the detail preservation is better with the proposed filter.

Fig. 3.
figure 3

Amplified fragments of image “Lena” distorted with \( \sigma^{2|} = 225 \) AWGN and processed by different filters (from left to right and top to bottom): original, NCSR, BM3D, proposed.

Table 1. Results of the filtering of the standard test images with different techniques in terms of PSNR. The best results are marked as bold.
Table 2. Results of the filtering of the standard test images with different techniques in terms of SSIM. The best results are marked as bold.

The simulation result analysis presented in Table 1 shows that in the majority of cases the very complex filter NCSR produced the best result, and the proposed filtering technique is competitive with the state-of-the-art filters in the PSNR sense. From the data presented in Table 2 it follows that in almost all cases the proposed filter is superior in terms of SSIM.

6 Conclusions

A filtering technique to process the images contaminated by additive white Gaussian noise has been presented. The algorithm uses discrete cosine transform and the groups of patches similar to the current image block, which are found using the proposed clustering algorithm and the search algorithm of the reduced complexity. The noisy components are rejected according to the Wiener filtering approach using Hadamard transform for thresholding and weighted aggregation. The obtained filtering results in comparison to the state-of-the-art filters, such as K-SVD, BM3D, NCSR show that the proposed algorithm is competitive in terms of signal-to-noise ratio and in almost all cases is superior in terms of structural similarity. Meanwhile, the considered filter results in terms of the visual quality are very similar, although the proposed filter preserves the image details better.