1 Introduction

Following several years of exploration since the pioneering work of Fergus et al. [1] on blind image deblurring, unnatural image models have come to dominate the blind deblurring literature. Along this line, the first inspiring attempt harnessed the normalized sparsity measure in [4], built on the idea that the image prior should favor a sharp image over its blurry counterpart. Nevertheless, the method fails to produce state-of-the-art performance on standard benchmark datasets, let alone on blurry images in the wild [2]. The normalized sparsity is in essence a mathematical approximation of the L0-norm, indicating that salient edges matter more than faint textures to the final success of blind deconvolution for natural images. In fact, unnatural image priors are not only required in the MAP framework but also advocated in the VB case, despite the latter's greater robustness in posterior inference. For example, earlier work by the authors of the present paper proposed to determine priors for blind image deblurring as a self-learning problem [6] in the VB framework. The results show that the learned model resembles, in a sense, the non-informative Jeffreys prior, whose negative logarithm is again a new approximation to the L0-based model. Instead of approximating the L0-norm with diverse strategies, a pure L0-based image prior was first proposed in [7] for blind deblurring. However, such priors are found not to generalize well to large blurs, especially in specific imaging scenarios, e.g., face, text, or low-illumination images. In [8], a new L0-norm-based intensity and gradient prior is presented for deblurring text images. Furthermore, an exemplar-driven method with L0-norm-based regularization on image gradients is proposed in [9] for face image deblurring.

In the blind deblurring field, it is encouraging that numerous algorithms have been put forward in the past decade, achieving ever better performance on various synthetic datasets. However, as claimed in [2], the performance of early methods on real blurred images is generally found inferior to that on the benchmark datasets. In other words, those methods are far from practical in terms of restoration quality. A real breakthrough for blind deblurring was made only recently in [3], which combines the L0-regularized sparsity of both the image gradient and the dark channel. The experimental results prove its performance superior to all the representative methods of the past decade studied in [2]. Note that, although the L0-based dark-channel prior is discriminative as desired, the whole composite sparse model of [3] is not necessarily so. Besides, it can be thought of as a smart generalization of [8] and is therefore not a pure gradient-based method.

In spite of the recent great progress in this field, this paper aims to formulate the blind problem from a simpler modeling perspective. More importantly, the newly proposed approach is expected to achieve comparable or even better performance on real blurred images. Specifically, the core innovation is a pure gradient-based discriminative prior for accurate and robust blur kernel estimation. Experimental results on both benchmark datasets and real-world images in various imaging scenarios, e.g., natural, manmade, low-illumination, text, or people, demonstrate the effectiveness and robustness of the proposed method.

2 A Plug-and-Play Approach to Gradient-Based Discriminative Blind Deconvolution

2.1 Gradient-Based Discriminative Prior

Our discussion begins with the first daring attempt at discriminative image modeling for blind image deconvolution, i.e., the normalized sparsity [4]. As indicated in [2, 3], however, its discriminativeness and effectiveness are questionable in both synthetic and practical experiments. Discriminativeness requires that the optimum not be the pair of the blurred image and the delta kernel, while effectiveness means that image details such as textures should be removed from the intermediate sharp image for accurate kernel estimation, as validated in existing methods [7, 10, 11].

Taking the above two factors into consideration, a new candidate prior for blind image deconvolution is presented as

$$ \mathcal{R}(u) = {\mathbf{\sum }}_{p} \varpi_{x,p} (u) \cdot \left| {\partial_{x} u_{p} } \right|^{\alpha } + {\mathbf{\sum }}_{p} \varpi_{y,p} (u) \cdot \left| {\partial_{y} u_{p} } \right|^{\alpha } , $$
(1)

where \( u \) is a sharp image, \( p \in \varOmega (u) \) a pixel index, \( \alpha \) a positive value far less than 1, \( \partial \) a derivative operator, and \( \varpi_{x,p} (u) \) a positive value depending on the pixel index and derivative direction. It is not hard to deduce that the core novelty of the prior \( \mathcal{R}(u) \) lies in the definition of \( \varpi \), which embodies the demanded discriminativeness and effectiveness for a plausible intermediate image update.

We find that the requested discriminativeness can be achieved simply by adapting the normalized sparsity [4], while the effectiveness of accurate intermediate image update can be further ensured by adapting the relative total variation [5]. Then \( \mathcal{R}(u) \) can be expressed as a gradient-based composite image prior. Specifically, \( \varpi_{o,p} (u),\,o \in \left\{ {x,y} \right\} \) is defined as

$$ \varpi_{o,p} (u) = \frac{1 - t}{{\left( {\mathcal{D}_{o} (u)} \right)^{\beta } + \varepsilon }} + \frac{t}{{\left( {\mathcal{S}_{o} (p)} \right)^{\beta } + \varepsilon }}, $$
(2)

where \( \beta \) is a positive power, t is a value between 0 and 1, \( \varepsilon \) is a small positive number to avoid division by zero, and \( \mathcal{D}_{o} (u) \) and \( \mathcal{S}_{o} (p) \) are expressed respectively as

$$ \mathcal{D}_{o} (u) = \left( {{\mathbf{\sum }}_{p \in \varOmega (u)} \left| {\partial_{o} u_{p} } \right|^{2} } \right)^{1/2} = \left\| {\partial_{o} u} \right\|_{2} , $$
(3)
$$ \mathcal{S}_{o} (p) = \left| {{\mathbf{\sum }}_{q \in \varOmega (p)} \phi_{p, \, q} \cdot \partial_{o} u_{q} } \right|, $$
(4)

where \( \varOmega (p) \) is the rectangular region centered at pixel p, and \( \phi_{p, \, q} \) is defined according to spatial affinity as a Gaussian function of distance, i.e.,

$$ \phi_{p, \, q} \propto \exp \left( { - \frac{{(x_{p} - x_{q} )^{2} + (y_{p} - y_{q} )^{2} }}{{2\sigma^{2} }}} \right), $$

where \( \sigma \) is a spatial scale to be specified in the implementation. We note that \( \mathcal{S}_{o} (p) \) was originally proposed in [5] for image filtering and manipulation; its value in a window containing only textures is found to be statistically smaller than that in a window also containing structural edges.
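
To make the definitions concrete, the following sketch evaluates \( \mathcal{R}(u) \) from Eqs. (1)–(4). This is our illustrative reading, not the authors' code: the windowed sum \( \mathcal{S}_{o}(p) \) with Gaussian affinity is approximated by `scipy.ndimage.gaussian_filter` applied to the signed gradients, and all parameter values (\( \alpha \), \( \beta \), t, \( \varepsilon \), \( \sigma \)) are merely indicative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradients(u):
    # Forward differences with circular boundary, one per direction.
    gx = np.roll(u, -1, axis=1) - u
    gy = np.roll(u, -1, axis=0) - u
    return gx, gy

def prior_R(u, alpha=0.1, beta=1.0, t=0.05, eps=1e-4, sigma=2.0):
    """Composite prior R(u) of Eq. (1) with weights of Eq. (2).
    D_o(u) is the global l2-norm of the gradient (Eq. 3); S_o(p) is
    approximated by a Gaussian-weighted sum of signed gradients (Eq. 4)."""
    R = 0.0
    for g in gradients(u):
        D = np.linalg.norm(g)                      # Eq. (3): ||d_o u||_2
        S = np.abs(gaussian_filter(g, sigma))      # Eq. (4), Gaussian window
        w = (1 - t) / (D**beta + eps) + t / (S**beta + eps)   # Eq. (2)
        R += np.sum(w * np.abs(g)**alpha)          # one direction of Eq. (1)
    return R
```

A small \( \alpha \) makes the per-pixel penalty close to an L0 count of nonzero gradients, which is the discriminative behavior argued for above.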

Let us examine (1) and (2) in more detail. One finding is that \( \sum_{p} |\partial_{x} u_{p}|^{\alpha} / (\mathcal{D}_{x}(u))^{\beta} + \sum_{p} |\partial_{y} u_{p}|^{\alpha} / (\mathcal{D}_{y}(u))^{\beta} \) plays the primary role in discriminating sharp images from blurred ones, given proper settings of \( \alpha, \beta \). Apparently, this regularization term degenerates to the normalized sparsity [4] when \( \alpha, \beta \) are equal to 1. Another finding is that its performance can be further boosted via \( \sum_{p} |\partial_{x} u_{p}|^{\alpha} / (\mathcal{S}_{x}(p))^{\beta} + \sum_{p} |\partial_{y} u_{p}|^{\alpha} / (\mathcal{S}_{y}(p))^{\beta} \), because this term removes the interfering textures while making the salient structures stand out more accurately in the intermediate sharp image. Such an additional amendment proves critical to high-quality blind deconvolution, even though its strength, governed by the parameter t, is relatively small. A large number of experiments demonstrate that setting t to 0.05 satisfactorily serves the plug-and-play algorithm deduced in the following subsection.
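
The degenerate case is easy to verify numerically: with \( \alpha = \beta = 1 \), \( t = 0 \), and \( \varepsilon = 0 \), the weighted sum collapses to the normalized sparsity \( \|\partial_{o} u\|_{1} / \|\partial_{o} u\|_{2} \) per direction. A small self-contained check (the random image is an arbitrary stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.random((32, 32))
gx = np.roll(u, -1, axis=1) - u          # horizontal forward differences

# Weighted form of the prior with alpha = beta = 1, t = 0, eps = 0:
D = np.linalg.norm(gx)                   # D_x(u) of Eq. (3)
weighted = np.sum(np.abs(gx) / D)

# Normalized sparsity of [4] for the same direction:
normalized_sparsity = np.sum(np.abs(gx)) / np.linalg.norm(gx)
# The two quantities coincide, confirming the degeneracy claim.
```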

2.2 A Plug-and-Play Numerical Scheme to Blind Deconvolution

As the blur is assumed spatially-invariant, the blurred image observation process can be described as

$$ g = k * u + n, $$
(5)

where u denotes the latent sharp image, g the captured blurry image, k the blur kernel corresponding to camera shake or out-of-focus blur, and n possible random noise. It is known that blind image deconvolution is mathematically ill-posed because there are infinitely many solution pairs (u, k) satisfying formulation (5). Therefore, appropriate regularization should be imposed on both the image u and the kernel k.
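
For illustration, the observation model (5) can be simulated in a few lines. This is a sketch assuming numpy/scipy; the box kernel and noise level are arbitrary stand-ins, not values from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

def blur_observe(u, k, noise_sigma=0.01):
    """Simulate g = k * u + n for a spatially-invariant kernel k (Eq. 5)."""
    g = fftconvolve(u, k, mode="same")                # convolution k * u
    g += noise_sigma * rng.standard_normal(g.shape)   # additive noise n
    return g

u = rng.random((64, 64))        # a stand-in "sharp" image
k = np.ones((5, 5)) / 25.0      # a normalized box kernel (sums to 1)
g = blur_observe(u, k)
```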

Harnessing the proposed model (1), a MAP-based objective function for blind deblurring can be expressed as

$$ \mathcal{J}(u,k) \triangleq \left\| {g - k * u} \right\|_{2}^{2} + \lambda \mathcal{R}(u) + \eta \left\| k \right\|_{2}^{2} , $$
(6)

where \( \lambda \) and \( \eta \) are two positive tuning parameters. The first quadratic term enforces image fidelity, while the third term is a Tikhonov regularization on the blur kernel k. Note that formulation (6) works free of any ad-hoc modeling tricks, e.g., continuation, or additional image-processing operations such as bilateral smoothing or shock filtering. Consequently, the blind deconvolution performance of the proposed algorithm is overwhelmingly determined by the discriminative image prior (1), considering that the Tikhonov penalty on the blur kernel is a standard configuration in the large majority of existing methods. This paper sets the tuning parameter \( \eta \) to 2.

Now the image and the kernel can be obtained by solving the joint minimization problem \( (\hat{u},\hat{k}) = \arg \min_{u,k} \mathcal{J}(u,k) \) in an alternating, iterative manner. Given the (i − 1)th iterate \( k^{(i - 1)} \), \( u^{(i)} \) and \( k^{(i)} \) are solved respectively by \( u^{(i)} = \arg \min_{u} \mathcal{J}(u,k^{(i - 1)} ) \) and \( k^{(i)} = \arg \min_{k} \mathcal{J}(u^{(i)} ,k) \).
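
The alternating scheme can be sketched generically as follows; the two inner solvers, derived in the remainder of this section, are passed in as callables, and the iteration count is illustrative.

```python
import numpy as np

def alternating_minimize(g, k_init, solve_u, solve_k, n_iters=30):
    """Generic alternating scheme for min_{u,k} J(u, k):
    u_i = argmin_u J(u, k_{i-1}), then k_i = argmin_k J(u_i, k).
    `solve_u` and `solve_k` stand for the two inner solvers."""
    k = k_init
    u = g                    # common choice: initialize u with the blurry image
    for _ in range(n_iters):
        u = solve_u(g, k)    # update the intermediate sharp image
        k = solve_k(g, u)    # update the blur kernel
    return u, k
```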

In this paper, the half-quadratic regularization strategy is used to estimate \( u^{(i)} \) by decomposing the original minimization problem into two simpler sub-problems. An auxiliary variable z corresponding to u is first introduced, i.e., letting \( u = z \), and a new objective function is obtained as

$$ \mathcal{J}(u,z,k^{(i - 1)} ) \triangleq \left\| {g - k^{(i - 1)} * u} \right\|_{2}^{2} + \lambda \mathcal{R}(z) + \rho \left\| {u - z} \right\|_{2}^{2} , $$

whose minimizing solution, i.e., the intermediate sharp image, approaches that of \( \mathcal{J}(u,k^{(i - 1)} ) \) as \( \rho \) approaches infinity. In each alternating minimization over u and z, \( u \) can be obtained efficiently in closed form via the fast Fourier transform (FFT). That is,

$$ u = \mathcal{F}^{ - 1} \left( {\frac{{\overline{{\mathcal{F}(k^{(i - 1)} )}} \mathcal{F}(g) \, + \, \rho \mathcal{F}(z)}}{{\overline{{\mathcal{F}(k^{(i - 1)} )}} \mathcal{F}(k^{(i - 1)} ) \, + \, \rho }}} \right), $$
(7)

where \( \mathcal{F} \) and \( \bar{\mathcal{F}} \) represent the FFT and its complex conjugate, respectively, and \( \mathcal{F}^{ - 1} \) represents the inverse FFT. As usual, \( z \) is initialized as a zero image. Given u, \( z \) is computed numerically by minimizing the sub-problem

$$ \left\| {u - z} \right\|_{2}^{2} + \frac{\lambda }{\rho }\mathcal{R}(z). $$
(8)

Apparently, solving (8) amounts to a corrective image-smoothing step regularized by (1), implemented via the reweighted least-squares approximation [5]. From this perspective, the intermediate sharp image estimation falls into the plug-and-play framework seminally proposed in [12, 13]. To the best of our knowledge, this paper is the first to apply the plug-and-play idea to blind image deconvolution via a specifically customized discriminative prior.
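
The two inner updates of the half-quadratic scheme can be sketched as follows. This is an illustrative reading, not the authors' implementation: circular boundaries are assumed for Eq. (7), and in the IRLS pass for (8) the discriminative weights of Eq. (2) are folded into a single frozen gradient-dependent coefficient per direction, with all parameter values indicative only.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def pad_to(k, shape):
    """Embed kernel k in a zero image of `shape`, centered at the origin."""
    K = np.zeros(shape)
    kh, kw = k.shape
    K[:kh, :kw] = k
    return np.roll(K, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def update_u(g, k, z, rho):
    """Closed-form u-update of Eq. (7), assuming circular boundaries."""
    K = np.fft.fft2(pad_to(k, g.shape))
    num = np.conj(K) * np.fft.fft2(g) + rho * np.fft.fft2(z)
    den = np.conj(K) * K + rho
    return np.real(np.fft.ifft2(num / den))

def dx(z):  return np.roll(z, -1, axis=1) - z
def dy(z):  return np.roll(z, -1, axis=0) - z
def dxT(z): return np.roll(z, 1, axis=1) - z   # adjoint of dx
def dyT(z): return np.roll(z, 1, axis=0) - z   # adjoint of dy

def update_z(u, z0, lam_over_rho, alpha=0.1, eps=1e-4):
    """One reweighted-least-squares pass for sub-problem (8): freeze the
    quadratic weights from z0, then solve the resulting linear system
    (I + (lam/rho)(Dx^T Cx Dx + Dy^T Cy Dy)) z = u  by conjugate gradients."""
    cx = (np.abs(dx(z0)) + eps) ** (alpha - 2)
    cy = (np.abs(dy(z0)) + eps) ** (alpha - 2)
    def matvec(v):
        z = v.reshape(u.shape)
        out = z + lam_over_rho * (dxT(cx * dx(z)) + dyT(cy * dy(z)))
        return out.ravel()
    A = LinearOperator((u.size, u.size), matvec=matvec, dtype=float)
    z, _ = cg(A, u.ravel(), x0=z0.ravel(), maxiter=50)
    return z.reshape(u.shape)
```

With a delta kernel, `update_u` returns the observation unchanged, which is a quick sanity check of the Fourier-domain algebra in (7).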

With an estimated intermediate image \( u^{ (i )} \), the blur kernel \( k^{(i)} \) can be produced by solving the Tikhonov-regularized energy functional \( k^{(i)} = \arg \min_{k} \mathcal{J}(u^{(i)} ,k) \). In practice, a slightly modified functional defined in the gradient domain, as commonly practiced in blind deblurring [3, 19], is used for better estimation. That is,

$$ k^{(i)} = \arg \min\nolimits_{k} \left\| {\nabla g - k * \nabla u^{ (i )} } \right\|_{2}^{2} + \eta \left\| k \right\|_{2}^{2} , $$
(9)

wherein \( k^{(i)} \) can be solved very efficiently in closed form via FFT, in exactly the same way as the image update in (7). One more point to note is that the blur kernel \( k^{(i)} \) should be projected onto the set \( \mathcal{C} = \{ k \ge 0,\; \sum\nolimits_{i} \sum\nolimits_{j} |k_{i,j}| = 1 \} \) in view of the physical properties of blur kernels.
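
A minimal sketch of this kernel step (again assuming numpy, circular boundaries, and an illustrative \( \eta \)): the FFT solve of (9) accumulates both gradient directions, then the result is cropped to the kernel support and projected onto \( \mathcal{C} \).

```python
import numpy as np

def update_kernel(gx_g, gy_g, gx_u, gy_u, ksize, eta):
    """Gradient-domain kernel update of Eq. (9) via FFT, followed by the
    projection onto C = {k >= 0, sum(k) = 1}. All gradient arrays share
    the image shape; gx_g/gy_g are gradients of g, gx_u/gy_u of u."""
    num = np.zeros(gx_g.shape, dtype=complex)
    den = np.zeros(gx_g.shape, dtype=complex)
    for dg, du in ((gx_g, gx_u), (gy_g, gy_u)):
        U = np.fft.fft2(du)
        num += np.conj(U) * np.fft.fft2(dg)
        den += np.conj(U) * U
    k_full = np.real(np.fft.ifft2(num / (den + eta)))
    # Crop the ksize-by-ksize support around the origin (circular indexing).
    h = ksize // 2
    k = np.roll(k_full, (h, h), axis=(0, 1))[:ksize, :ksize]
    # Project onto the physical constraint set C.
    k = np.maximum(k, 0)
    s = k.sum()
    return k / s if s > 0 else k
```

When the "blurry" gradients equal the sharp ones, the recovered kernel is (up to the Tikhonov bias) a delta centered in the support, which is a convenient sanity check.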

3 Experimental Results

This section validates the proposed approach on the datasets of Lai et al. [2], with comparisons against current representative blind deblurring algorithms: [1, 7, 10, 11, 14,15,16,17,18,19]. Besides PSNR, the SSIM [20] and the no-reference metric of [21] are also used for quantitative assessment of the different methods. Note that [21] is specifically proposed to evaluate motion deblurring quality and is consistent with human perception and ratings to a certain degree.

The datasets in Lai et al. [2] include a synthetic set, consisting of 100 blurred images generated from the 4 blur kernels shown in Fig. 1 and 25 ground-truth clear images divided into 5 categories, i.e., natural (N), manmade (M), text (T), people (P), and saturated (S), as well as a real set containing 100 blurred color images collected from previous deblurring works, from Flickr and Google Search, or captured by the authors themselves, which also fall into the above five categories.

Fig. 1.
figure 1

Blur kernels with different sizes used for generating the 100 synthetic blurry images in Lai et al. [2].

3.1 Synthetic Experiment Results

Tables 1, 2, 3, and 4 list the average statistics of the three metrics for the deblurred images corresponding to each of the blur kernels in Fig. 1. In every table, each row represents the average evaluation across the five image categories, i.e., N, M, T, P, S. The overall performance of our approach ranks first in almost all scenarios in terms of PSNR, SSIM, and the no-reference metric, demonstrating its effectiveness and robustness in dealing with various kinds of blurred images and kernel sizes.

Table 1. Average statistics of PSNR (dB), SSIM [20], and no-reference (no-ref.) metric [21] of the final deblurred images corresponding to each blind deblurring approach on the 25 blurred images generated by kernel01 (31 × 31) in the dataset of Lai et al. [2]. Red denotes the best, blue the second, and green the third.
Table 2. Average statistics of PSNR (dB), SSIM, and no-reference (no-ref.) metric of the final deblurred images corresponding to each blind deblurring approach on the 25 blurred images generated by kernel02 (51 × 51) in the dataset of Lai et al. [2]. Red denotes the best, blue the second, and green the third.
Table 3. Average statistics of PSNR (dB), SSIM [20], and no-reference (no-ref.) metric [21] of the final deblurred images corresponding to each blind deblurring approach on the 25 blurred images generated by kernel03 (55 × 55) in the dataset of Lai et al. [2]. Red denotes the best, blue the second, and green the third.
Table 4. Average statistics of PSNR (dB), SSIM [20], and no-reference (no-ref.) metric [21] of the final deblurred images corresponding to each blind deblurring approach on the 25 blurred images generated by kernel04 (75 × 75) in the dataset of Lai et al. [2]. Red denotes the best, blue the second, and green the third.

We note one exception: in terms of the no-reference metric [21], our approach seems to perform slightly worse than [18] and [16] when dealing with images convolved by kernel04, as shown in Table 4. However, this objective evaluation does not agree with practical visual perception, and the comparison should rely more on PSNR and SSIM in this situation. A notable instance can be observed in Fig. 2, where our approach produces a blur kernel of very high precision and therefore a reasonably good deblurred image, yet ranks last among the eight compared methods in terms of the no-reference metric. In fact, all the other approaches completely fail in this example. The visual comparisons thus show that, to some extent, [21] is not suitable for fairly measuring saturated image deblurring quality, and PSNR and SSIM should be relied upon instead for fair assessment of the algorithms. In brief, the comprehensive evaluation shows that our approach performs comparably or better in all five blur scenarios.

Fig. 2.
figure 2

Deblurring results of the blurred image text04-kernel04 in the dataset of Lai et al. [2] corresponding to the top eight approaches in terms of PSNR/SSIM. The value of the no-reference metric [21] is shown in each image for visual perception assessment. Their PSNR/SSIM values are respectively [17] (13.63 dB/0.5611), [18] (13.72 dB/0.5634), [15] (13.74 dB/0.5605), [14] (13.83 dB/0.5903), [11] (13.91 dB/0.5724), [7] ( ), [19] ( ), Ours ( ).

3.2 Realistic Experiment Results

Since there is no reliable quantitative metric for measuring deblurring quality on real images, comparisons are made solely by visual perception. In this part, the practical performance of the recent breakthrough work [3] is also tested.

The comprehensive assessment validates that the proposed method is more robust than most of the approaches compared in Subsect. 3.1, achieving comparable (N, M, P) or better (T, S) performance on the whole set of 100 real images. Meanwhile, our approach performs comparably to [3] and [8] on the five categories of blurred images, particularly on the text and saturated ones.

Considering the restricted paper space, we take just three challenging images as examples. Figures 3, 4, and 5 provide the deblurred results for visual inspection. The proposed method and [3, 8] produce plausible kernels in most cases. Most of the kernels differ only slightly, which naturally leads to visually similar and acceptable deblurred images. Nevertheless, the other approaches succeed only occasionally on these challenging examples.

Fig. 3.
figure 3

Results for the manmade (M) image ‘postcard’ corresponding to the methods [3, 7, 8, 19] and ours with reasonable kernels produced.

Fig. 4.
figure 4

Results for the text (T) image ‘text2’ corresponding to the methods [3, 7, 8, 16] and Ours with reasonable kernels produced.

Fig. 5.
figure 5

Results for the saturated (S) image ‘car5’ corresponding to the methods [3, 8] and ours with reasonable kernels produced.

In spite of the similar deblurring performance of [3, 8] and our approach on the above blurred images, on the two saturated images ‘garden’ and ‘sydney_opera’ both [3] and [8] largely fail, whereas our method succeeds in recovering a very plausible blur kernel for each image. Figure 6 provides the deblurring results for the three methods. Meanwhile, we observe several examples, including ‘car4’, ‘night1’, ‘night4’, ‘notredame’, ‘text1’, and ‘text12’, where either [8] or [3], or occasionally both, generate much less accurate kernels than our approach.

Fig. 6.
figure 6

Visual comparison among [3, 8] and the proposed approach on the two saturated (S) images ‘garden’ and ‘sydney_opera’, where both [3] and [8] largely fail while the proposed approach succeeds in producing very reasonable kernels that naturally lead to visually acceptable deblurred images.

4 Conclusion

Blind image deblurring, as a fundamental low-level vision problem, is far from solved, owing to the challenging blur processes of practical imaging, e.g., Gaussian-shaped kernels of varying sizes, ellipse-shaped kernels of varying orientations, and curvilinear kernels of varying trajectories. In distinction to previous methods, this paper is inspired by a working rule of Albert Einstein: out of clutter, find simplicity. It aims to exploit the full potential of gradient-based approaches with a simple, robust, yet discriminative prior for nonparametric blur kernel estimation. The new discriminative approach achieves decent performance on both synthetic and realistic blurry images, and could serve as a new starting point for developing more reliable, robust, effective, and efficient blind image deblurring approaches.