
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network

Published in: International Journal of Computer Vision

Abstract

Nonblind image deconvolution (NID) is about restoring the latent image with sharp details from a noisy blurred one using a known blur kernel. This paper presents a dataset-free deep learning approach for NID using untrained deep neural networks (DNNs), which does not require any external training data with ground-truth images. Based on a spatially-adaptive dropout scheme, the proposed approach learns a DNN with model uncertainty from the input blurred image, and the deconvolution result is obtained by aggregating the multiple predictions from the learned dropout DNN. It is shown that the solution approximates a minimum-mean-squared-error estimator in Bayesian inference. In addition, a self-supervised loss function for training is presented to efficiently handle the noise in blurred images. Extensive experiments show that the proposed approach not only performs noticeably better than existing non-learning-based methods and unsupervised learning-based methods, but also performs competitively against recent supervised learning-based methods.
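The aggregation step described in the abstract — sampling multiple dropout realizations of the learned network and averaging their outputs to approximate a minimum-mean-squared-error estimate — can be illustrated with a minimal numpy sketch. Everything below is an illustrative assumption, not the paper's architecture: the network is replaced by a linear map so the exact expectation is available for comparison, and `p_keep` is an arbitrary keep probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the trained dropout DNN f_{theta ⊙ b}: a linear map whose
# weights are multiplied elementwise by a Bernoulli dropout mask b.
theta = rng.standard_normal((8, 8))   # "learned" weights theta (hypothetical)
p_keep = 0.7                          # Bernoulli keep probability (hypothetical)
y = rng.standard_normal(8)            # observed blurred input, flattened

def predict_once(theta, y, rng, p_keep):
    """One stochastic forward pass with a fresh dropout mask b."""
    b = rng.binomial(1, p_keep, size=theta.shape)
    return (theta * b) @ y

# Aggregate T stochastic predictions; their mean approximates the
# posterior-mean (MMSE-style) estimate E_b[f_{theta ⊙ b}(y)].
T = 20000
preds = np.stack([predict_once(theta, y, rng, p_keep) for _ in range(T)])
mmse_estimate = preds.mean(axis=0)

# For this linear toy model the exact expectation is p_keep * theta @ y,
# so the Monte-Carlo aggregate can be checked against it.
exact = p_keep * theta @ y
print(np.max(np.abs(mmse_estimate - exact)))  # small Monte-Carlo error
```

The linear model is what makes the closed-form check possible; in the paper the network is nonlinear, so only the Monte-Carlo average over full restored images is available.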


Notes

  1. The multi-scale SSIM is used in the benchmark of Köhler et al.'s dataset. For simplicity, we also call it SSIM in the tables.

References

  • Anger, J., Facciolo, G., & Delbracio, M. (2018). Modeling realistic degradations in non-blind deconvolution. In: 2018 25th IEEE international conference on image processing (ICIP), IEEE, pp 978–982.

  • Anger, J., Delbracio, M., & Facciolo, G. (2019a). Efficient blind deblurring under high noise levels. In: International symposium on image and signal processing and analysis, IEEE, pp 123–128.

  • Anger, J., Facciolo, G., & Delbracio, M. (2019b). Blind image deblurring using the l0 gradient prior. Image Processing on Line, 9, 124–142.

  • Arridge, S., Maass, P., Öktem, O., & Schönlieb, C. B. (2019). Solving inverse problems using data-driven models. Acta Numerica, 28, 1–174.

  • Azzari, L., & Foi, A. (2016). Variance stabilization for noisy+estimate combination in iterative Poisson denoising. IEEE Signal Processing Letters, 23(8), 1086–1090.

  • Batson, J., & Royer, L. (2019). Noise2self: Blind denoising by self-supervision. In: Proceedings of the international conference on machine learning.

  • Bigdeli, SA., Zwicker, M., Favaro, P., & Jin, M. (2017). Deep mean-shift priors for image restoration. In: Proceedings of the international conference on neural information processing systems, pp 763–772.

  • Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.

  • Chen, G., Zhu, F., & Ann Heng, P. (2015). An efficient statistical method for image noise level estimation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 477–485.

  • Cho, SJ., Ji, SW., Hong, JP., Jung, SW., & Ko, SJ. (2021). Rethinking coarse-to-fine approach in single image deblurring. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4641–4650.

  • Danielyan, A., Katkovnik, V., & Egiazarian, K. (2011). BM3D frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4), 1715–1728.

  • Dong, J., Pan, J., Sun, D., Su, Z., & Yang, MH. (2018). Learning data terms for non-blind deblurring. In: Proceedings of the European conference on computer vision, pp 748–763.

  • Dong, J., Roth, S., & Schiele, B. (2021). Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: Proceedings of the international conference on neural information processing systems 33.

  • Dong, W., Wang, P., Yin, W., Shi, G., Wu, F., & Lu, X. (2019). Denoising prior driven deep neural network for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10), 2305–2318.

  • Eboli, T., Sun, J., & Ponce, J. (2020). End-to-end interpretable learning of non-blind image deblurring. In: Proceedings of the European conference on computer vision.

  • Ehret, T., Davy, A., Morel, JM., Facciolo, G., & Arias, P. (2019). Model-blind video denoising via frame-to-frame training. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11,369–11,378.

  • Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the international conference on machine learning, pp 1050–1059.

  • Gilton, D., Ongie, G., & Willett, R. (2020). Neumann networks for linear inverse problems in imaging. IEEE Transactions on Computational Imaging, 6, 328–343.

  • Gong, D., Zhang, Z., Shi, Q., van den Hengel, A., Shen, C., & Zhang, Y. (2020). Learning deep gradient descent optimization for image deconvolution. IEEE Transactions on Neural Networks and Learning Systems, 1–15.

  • Heckel, R. (2019). Regularizing linear inverse problems with convolutional neural networks. arXiv preprint arXiv:1907.03100v1.

  • Heckel, R., & Hand, P. (2019). Deep decoder: Concise image representations from untrained non-convolutional networks. In: Proceedings of the international conference on learning representations.

  • Hendriksen, A., Pelt, D. M., & Batenburg, K. J. (2020). Noise2inverse: Self-supervised deep convolutional denoising for linear inverse problems in imaging. IEEE Transactions on Computational Imaging, 6, 1320–1335.

  • Jin, M., Roth, S., & Favaro, P. (2017). Noise-blind image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, pp 3834–3842.

  • Kaufman, A., & Fattal, R. (2020). Deblurring using analysis-synthesis networks pair. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5811–5820.

  • Köhler, R., Hirsch, M., Mohler, B., Schölkopf, B., & Harmeling, S. (2012). Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database. In: Proceedings of the European conference on computer vision, Springer, pp 27–40.

  • Krishnan, D., & Fergus, R. (2009). Fast image deconvolution using hyper-Laplacian priors. In: Proceedings of the international conference on neural information processing systems, pp 1033–1041.

  • Krull, A., Buchholz, TO., & Jug, F. (2019). Noise2Void: Learning denoising from single noisy images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2129–2137.

  • Kruse, J., Rother, C., & Schmidt, U. (2017). Learning to push the limits of efficient FFT-based image deconvolution. In: Proceedings of the IEEE/CVF international conference on computer vision, IEEE, pp 4586–4594.

  • Lai, WS., Huang, JB., Hu, Z., Ahuja, N., & Yang, MH. (2016). A comparative study for single image blind deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1701–1709.

  • Laine, S., Lehtinen, J., & Aila, T. (2019). High-quality self-supervised deep image denoising. In: Proceedings of the international conference on neural information processing systems.

  • Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., & Aila, T. (2018). Noise2noise: Learning image restoration without clean data. In: Proceedings of the international conference on machine learning.

  • Levin, A., Weiss, Y., Durand, F., & Freeman, WT. (2011). Efficient marginal likelihood optimization in blind deconvolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, pp 2657–2664.

  • Li, J., Shen, Z., Yin, R., & Zhang, X. (2015). A reweighted \(l_2\) method for image restoration with Poisson and mixed Poisson-Gaussian noise. Inverse Problems and Imaging, 9(3), 875–894.

  • Meinhardt, T., Möller, M., Hazirbas, C., & Cremers, D. (2017). Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1799–1808.

  • Nan, Y., & Ji, H. (2020). Deep learning for handling kernel/model uncertainty in image deconvolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2388–2397.

  • Nan, Y., Quan, Y., & Ji, H. (2020). Variational-EM-based deep learning for noise-blind image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3626–3635.

  • Pan, J., Hu, Z., Su, Z., & Yang, MH. (2014). Deblurring text images via L0-regularized intensity and gradient prior. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2901–2908.

  • Pan, J., Sun, D., Pfister, H., & Yang, MH. (2016). Blind image deblurring using dark channel prior. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1628–1636.

  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). PyTorch documentation for MaxPool2d. https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.

  • Quan, Y., Chen, M., Pang, T., & Ji, H. (2020). Self2self with dropout: Learning self-supervised denoising from single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1890–1898.

  • Ren, D., Zhang, K., Wang, Q., Hu, Q., & Zuo, W. (2020). Neural blind deconvolution using deep priors. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3341–3350.

  • Ren, W., Zhang, J., Ma, L., Pan, J., Cao, X., Zuo, W., Liu, W., & Yang, MH. (2018). Deep non-blind deconvolution via generalized low-rank approximation. In: Proceedings of the international conference on neural information processing systems, pp 295–305.

  • Romano, Y., Elad, M., & Milanfar, P. (2017). The little engine that could: Regularization by denoising (red). SIAM Journal on Imaging Sciences, 10(4), 1804–1844.

  • Schmidt, U., & Roth, S. (2014). Shrinkage fields for effective image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2774–2781.

  • Schmidt, U., Schelten, K., & Roth, S. (2011). Bayesian deblurring with integrated noise estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2625–2632.

  • Schuler, CJ., Christopher Burger, H., Harmeling, S., & Scholkopf, B. (2013). A machine learning approach for non-blind image deconvolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1067–1074.

  • Soltanayev, S., & Chun, SY. (2018). Training deep learning based denoisers without ground truth data. In: Proceedings of the international conference on neural information processing systems, pp 3257–3267.

  • Son, H., & Lee, S. (2017). Fast non-blind deconvolution via regularized residual networks with long/short skip-connections. In: IEEE international conference on computational photography, IEEE, pp 1–10.

  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

  • Sun, L., Cho, S., Wang, J., & Hays, J. (2013). Edge-based blur kernel estimation using patch priors. In: IEEE international conference on computational photography, IEEE, pp 1–8.

  • Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2018). Deep image prior. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9446–9454.

  • Vasu, S., Maligireddy, VR., & Rajagopalan, AN. (2018). Non-blind deblurring: Handling kernel uncertainty with CNNs. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, pp 3272–3281.

  • Vedaldi, A., Lempitsky, V., & Ulyanov, D. (2020). Deep image prior. International Journal of Computer Vision, 128(7), 1867–1888.

  • Vonesch, C., & Unser, M. (2008). A fast thresholded landweber algorithm for wavelet-regularized multidimensional deconvolution. IEEE Transactions on Image Processing, 17, 539–549.

  • Wang, Z., Wang, Z., Li, Q., & Bilen, H. (2019). Image deconvolution with deep image and kernel priors. In: Proceedings of the IEEE/CVF international conference on computer vision workshop.

  • Xu, L., Ren, JSJ., Liu, C., & Jia, J. (2014). Deep convolutional neural network for image deconvolution. In: Proceedings of the international conference on neural information processing systems, pp 1790–1798.

  • Yang, L., & Ji, H. (2019). A variational EM framework with adaptive edge selection for blind motion deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10,159–10,168.

  • Zhang, J., Pan, J., Lai, WS., Lau, RWH., & Yang, MH. (2017a). Learning fully convolutional networks for iterative non-blind deconvolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, pp 6969–6977.

  • Zhang, K., Zuo, W., Chen, Y., Meng, D., & Zhang, L. (2017b). Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7), 3142–3155.

  • Zhang, K., Zuo, W., Gu, S., & Zhang, L. (2017c). Learning deep CNN denoiser prior for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3929–3938.

  • Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. In: Proceedings of the IEEE/CVF international conference on computer vision, IEEE, pp 479–486.

  • Zukerman, J., Tirer, T., & Giryes, R. (2020). BP-DIP: A backprojection based deep image prior. In: Proceedings of the European conference on computer vision workshop.

Author information

Correspondence to Yuhui Quan.

Additional information

Communicated by Jean-Michel Morel.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported in part by National Natural Science Foundation of China under Grant 61872151, in part by CCF-Tencent Open Fund 2020, and in part by MOE AcRF Tier 1 Research Grant R146000315114.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 168551 KB)

Appendices

Proof of Proposition 1

First, we rewrite the loss function as follows.

$$\begin{aligned} \begin{aligned}&\sum _{\ell } \left\| {{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell ) -{{\varvec{y}}}\right\| _{{{\varvec{m}}}_\ell }^2 \\&\quad =\sum _{\ell }\left\| {{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell )-{{\varvec{k}}}*{{\varvec{x}}}\right\| _{{{\varvec{m}}}_\ell }^2 + \sum _{\ell }\left\| {{\varvec{n}}}\right\| _{{{\varvec{m}}}_\ell }^2 \\&\qquad - 2{{\varvec{n}}}^{\top }\big (\sum _{\ell } ({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot ({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell )-{{\varvec{k}}}*{{\varvec{x}}})\big ). \end{aligned} \end{aligned}$$
(16)

The expectation of the second term is given by

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{{\varvec{n}}}}\big [\sum _{\ell }\left\| {{\varvec{n}}}\right\| _{{{\varvec{m}}}_\ell }^2\big ]= {\mathbb {E}}_{{{\varvec{n}}}}\big [\sum _{\ell }\left\| ({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot {{\varvec{n}}}\right\| _2^2\big ] \\&\quad =\sum _{\ell } \left\| ({{\varvec{1}}}-{{\varvec{m}}}_\ell ) \odot {{\varvec{\sigma }}}\right\| _2^2= \sum _{\ell } \left\| {{\varvec{\sigma }}}\right\| _{{{\varvec{m}}}_\ell }^2. \end{aligned} \end{aligned}$$
(17)

Regarding the last term, for simplicity we define

$$\begin{aligned} \begin{aligned} {{\varvec{r}}}&=\sum _{\ell }({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot ({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell )-{{\varvec{k}}}*{{\varvec{x}}})\\&=\sum _{\ell }({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot ({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({{\varvec{m}}}_\ell \odot ({{\varvec{k}}}*{{\varvec{x}}})+{{\varvec{m}}}_\ell \odot {{\varvec{n}}}\\&\qquad +({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot (\mathcal {A}\circ ({{\varvec{m}}}_\ell \odot {{\varvec{y}}})))-{{\varvec{k}}}*{{\varvec{x}}}). \end{aligned} \end{aligned}$$
(18)

It can be seen that \({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({{\varvec{m}}}_\ell \odot ({{\varvec{k}}}*{{\varvec{x}}})+{{\varvec{m}}}_\ell \odot {{\varvec{n}}}+({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot (\mathcal {A}\circ ({{\varvec{m}}}_\ell \odot {{\varvec{y}}})))\) contributes to \({{\varvec{r}}}(i)\) only if \({{\varvec{m}}}_\ell (i)=0\). But in this case, \({{\varvec{n}}}(i)\) is erased by \({{\varvec{m}}}_\ell (i)\), so \({{\varvec{n}}}(i)\) makes no contribution to \({{\varvec{r}}}(i)\). Together with the fact that \({{\varvec{n}}}(i)\) is independent of \({{\varvec{n}}}(j)\) for any \(i\ne j\), it follows that \({{\varvec{r}}}(i)\) is independent of \({{\varvec{n}}}(i)\) for all \(i\). Therefore, we have

$$\begin{aligned} {\mathbb {E}}_{{{\varvec{n}}}} \big [{{\varvec{n}}}^{\top }{{\varvec{r}}}\big ] = ({\mathbb {E}}_{{{\varvec{n}}}} \big [{{\varvec{n}}}\big ])^\top ({\mathbb {E}}_{{{\varvec{n}}}} \big [{{\varvec{r}}}\big ]) = 0. \end{aligned}$$
(19)

Combining (16), (17) and (19) gives that

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{{\varvec{n}}}}[\sum _{\ell } \Vert ({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot ({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell )-{{\varvec{y}}})\Vert _2^2]\\&\quad =\sum _{\ell }\left\| {{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell )-{{\varvec{k}}}*{{\varvec{x}}}\right\| _{{{\varvec{m}}}_\ell }^2 + \sum _{\ell } \left\| {{\varvec{\sigma }}}\right\| _{{{\varvec{m}}}_\ell }^2. \end{aligned} \end{aligned}$$
(20)

This completes the proof.
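The expectation identity established above can be sanity-checked numerically in a stripped-down 1-D setting. This is a sketch under strong simplifying assumptions: `pred` is a fixed vector standing in for \({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell )\) and is constructed to be independent of the noise, which in the paper follows from the masked input construction; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simplified 1-D check of the identity behind Eq. (20): if the prediction at
# the held-out pixels (where m = 0) does not depend on the noise n, then
#   E_n ||(1-m) ⊙ (pred - y)||^2
#     = ||(1-m) ⊙ (pred - clean)||^2 + sigma^2 * (number of held-out pixels).
n_pix, sigma = 64, 0.3
clean = rng.standard_normal(n_pix)               # plays the role of k * x
pred = clean + 0.1 * rng.standard_normal(n_pix)  # fixed, noise-independent
m = rng.binomial(1, 0.7, size=n_pix)             # mask (1 = kept pixel)

# Monte-Carlo average of the masked residual over fresh Gaussian noise.
trials = 50000
noise = sigma * rng.standard_normal((trials, n_pix))
residual = (1 - m) * (pred - clean - noise)      # (1-m) ⊙ (pred - y)
empirical = np.mean(np.sum(residual ** 2, axis=1))

# Closed-form right-hand side of the identity.
expected = np.sum(((1 - m) * (pred - clean)) ** 2) + sigma**2 * np.sum(1 - m)
print(empirical, expected)  # the two agree up to Monte-Carlo error
```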

MMSE Approximation

The derivation is based on the theory of variational inference; see Blei et al. (2017) for a comprehensive introduction. First, we take \(p({{\varvec{\beta }}})\) to be the uniform distribution on a bounded set \(\mathbb {S}=[-C/2,C/2]^d\), where \(C\) is a sufficiently large positive number and \(d\) is the dimensionality of \({{\varvec{\beta }}}\). Additionally, the entries of the noise \({{\varvec{n}}}\) are assumed to be i.i.d. Gaussian variables with mean zero and variance \(\sigma ^2\). The KL divergence between \(p({{\varvec{\beta }}}\vert \mathcal {D})\) and \(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\) is given by

$$\begin{aligned} \begin{aligned}&\mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}}\vert \mathcal {D})) \\&\quad =\mathbb {E}_{ q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})-\mathbb {E}_{q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log p({{\varvec{\beta }}}\vert \mathcal {D})\\&\quad =\mathbb {E}_{ q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})-\mathbb {E}_{q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \big ( \log p({{\varvec{\beta }}})\\&\qquad +\log p(\mathcal {D}\vert {{\varvec{\beta }}})-\log p(\mathcal {D}) \big )\\&\quad = \mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}})) - {\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})}\log p(\mathcal {D}\vert {{\varvec{\beta }}})\\&\qquad + {\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})}\log p(\mathcal {D}). \end{aligned} \end{aligned}$$
(21)

Since \(\log p(\mathcal {D})\) is irrelevant to the model parameter \({{\varvec{\beta }}}\), we have \({\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})}\log p(\mathcal {D}) = \log p(\mathcal {D})\) which is a constant. Therefore, we have

$$\begin{aligned} \begin{aligned}&\mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}}\vert \mathcal {D})) \\&\quad = \mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}})) \\&\qquad - {\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})}\log p(\mathcal {D}\vert {{\varvec{\beta }}}) + \text{ const }., \end{aligned} \end{aligned}$$
(22)

where the KL divergence between \(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\) and \(p({{\varvec{\beta }}})\) can be written as

$$\begin{aligned} \begin{aligned}&\mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}})) \\&\quad = \mathbb {E}_{q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})-\mathbb {E}_{q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log p({{\varvec{\beta }}}). \end{aligned} \end{aligned}$$
(23)

The first term on the right-hand side of the above equation is the negative entropy of \(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\). Recall that \(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\) is the distribution induced by Bernoulli dropout:

$$\begin{aligned} {{\varvec{\beta }}}= {{\varvec{\theta }}}\odot {{\varvec{b}}}, \text{ where } \ {{\varvec{b}}}(i) \sim \mathrm{Bernoulli}(p_i). \end{aligned}$$
(24)

Its entropy is given by

$$\begin{aligned} \begin{aligned}&- \mathbb {E}_{q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log q({{\varvec{\beta }}}\vert {{\varvec{\theta }}}) \\&\quad =-\sum _i \big ( p_i\log p_i+ (1-p_i)\log (1-p_i) \big ), \end{aligned} \end{aligned}$$
(25)

which is a constant irrelevant to the parameter \({{\varvec{\theta }}}\) and can be ignored during training. Since \(p({{\varvec{\beta }}})=0\) outside \(\mathbb {S}\), \(\log p({{\varvec{\beta }}}) = -\infty \) outside \(\mathbb {S}\). Then \(\int q({{\varvec{\beta }}}\vert {{\varvec{\theta }}}) \log p({{\varvec{\beta }}})=-\infty \) if \(\int _{\mathbb {R}^d\setminus \mathbb {S}} q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\ne 0\) (which means that \({{\varvec{\theta }}}\notin \mathbb {S}\)). When \({{\varvec{\theta }}}\in \mathbb {S}\), we have \(\int _{\mathbb {S}}q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})=1 \) and thus

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \log p({{\varvec{\beta }}}) \\&\quad = \int q({{\varvec{\beta }}}\vert {{\varvec{\theta }}}) \log p({{\varvec{\beta }}}) = \int _{\mathbb {S}} q({{\varvec{\beta }}}\vert {{\varvec{\theta }}}) \log \frac{1}{C^d} \\&\quad = \log \frac{1}{C^d}. \end{aligned} \end{aligned}$$
(26)

This yields

$$\begin{aligned} \int q({{\varvec{\beta }}}\vert {{\varvec{\theta }}}) \log p({{\varvec{\beta }}}) =\left\{ \begin{array}{ll} \log \frac{1}{C^d}, &{}{{\varvec{\theta }}}\in \mathbb {S}\\ -\infty , &{}{{\varvec{\theta }}}\notin \mathbb {S}. \end{array}\right. \end{aligned}$$
(27)

Finally, we obtain

$$\begin{aligned} \mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}})) = \delta _{\mathbb {S}}({{\varvec{\theta }}}) +\text{ const }, \end{aligned}$$
(28)

where

$$\begin{aligned} \delta _{\mathbb {S}}({{\varvec{\theta }}}) = \left\{ \begin{array}{ll} 0, &{}\text{ if } {{\varvec{\theta }}}\in \mathbb {S},\\ +\infty , &{}\text{ otherwise }. \end{array}\right. \end{aligned}$$
(29)

Recall that the samples in \(\mathcal {D}=\{{\widehat{{{\varvec{y}}}}}_\ell ,({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot {{\varvec{y}}}\}_\ell \) are related by

$$\begin{aligned} ({{\varvec{1}}}-{{\varvec{m}}}_\ell ) \odot {{\varvec{y}}}= ({{\varvec{1}}}-{{\varvec{m}}}_\ell ) \odot ({{\varvec{k}}}* f_{{{\varvec{\beta }}}}({\widehat{{{\varvec{y}}}}}_\ell ) + {{\varvec{n}}}). \end{aligned}$$
(30)

As an approximation, we assume these samples are independent of each other. Then we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})}\log p(\mathcal {D}\vert {{\varvec{\beta }}}) \\&\quad = {\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \sum _{{\widehat{{{\varvec{y}}}}}_\ell \sim \varvec{\Omega }} \log p(({{\varvec{1}}}-{{\varvec{m}}}_\ell )\odot {{\varvec{y}}}\vert {{\varvec{\beta }}},{\widehat{{{\varvec{y}}}}}_\ell ) \\&\qquad +{\mathbb {E}}_{{{\varvec{\beta }}}\sim q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})} \sum _{{\widehat{{{\varvec{y}}}}}_\ell \sim \varvec{\Omega }} \log p({\widehat{{{\varvec{y}}}}}_\ell \vert {{\varvec{\beta }}}) \\&\quad = -\frac{1}{2\sigma ^2}\mathbb {E}_{{{\varvec{b}}}}\sum _{{\widehat{{{\varvec{y}}}}}\sim \varvec{\Omega }}\ {{\mathcal {C}}}({{\varvec{y}}},{{\varvec{k}}}* f_{{{\varvec{\theta }}}\odot {{\varvec{b}}}}({{\widehat{{{\varvec{y}}}}}})) + \text{ const. } \end{aligned} \end{aligned}$$
(31)

Finally, we have

$$\begin{aligned} \begin{aligned}&\mathrm{KL}(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\Vert p({{\varvec{\beta }}}\vert \mathcal {D})) \\&\quad = \frac{1}{2\sigma ^2}\mathbb {E}_{{{\varvec{b}}}}\sum _{{\widehat{{{\varvec{y}}}}}\sim \varvec{\Omega }}\ {{\mathcal {C}}}({{\varvec{y}}},{{\varvec{k}}}* f_{{{\varvec{\theta }}}\odot {{\varvec{b}}}}({{\widehat{{{\varvec{y}}}}}})) + \delta _{\mathbb {S}}({{\varvec{\theta }}}) + \text{ const. } \end{aligned} \end{aligned}$$
(32)

Thus, minimizing the KL divergence between \(p({{\varvec{\beta }}}\vert \mathcal {D})\) and \(q({{\varvec{\beta }}}\vert {{\varvec{\theta }}})\) is equivalent to

$$\begin{aligned} \min _{{{\varvec{\theta }}}\in \mathbb {S}} \mathbb {E}_{{\widehat{{{\varvec{y}}}}}\sim \varvec{\Omega }}\mathbb {E}_{{{\varvec{b}}}}\ {{\mathcal {C}}}({{\varvec{y}}},{{\varvec{k}}}* f_{{{\varvec{\theta }}}\odot {{\varvec{b}}}}({{\widehat{{{\varvec{y}}}}}})). \end{aligned}$$
(33)

Since the feasible set \(\mathbb {S}\) is sufficiently large, the constraint \({{\varvec{\theta }}}\in \mathbb {S}\) can be omitted in practice, which results in our training loss in (5).
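Minimizing (33) in practice amounts to sampling masked inputs and Bernoulli dropout masks and averaging the data-fidelity cost. The sketch below estimates the objective for a toy 1-D problem; the linear `f`, the squared-error cost \({{\mathcal {C}}}\), the circular `blur`, and the way `y_hat` is formed by plain masking are all simplifying assumptions (the paper uses a convolutional network and a masked-and-filled input).

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte-Carlo estimate of the training objective in Eq. (33):
#   min_theta  E_{y_hat ~ Omega} E_b  C(y, k * f_{theta ⊙ b}(y_hat)).
n_pix = 32
k = np.array([0.25, 0.5, 0.25])              # known blur kernel (hypothetical)
y = rng.standard_normal(n_pix)               # observed blurred image (1-D toy)
theta = rng.standard_normal((n_pix, n_pix)) * 0.1  # toy network weights

def blur(x, k):
    """Circular convolution k * x via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n_pix)))

def loss_estimate(theta, y, samples=256, p_keep=0.7):
    """Sampled approximation of E_{y_hat} E_b C(y, k * f_{theta ⊙ b}(y_hat))."""
    total = 0.0
    for _ in range(samples):
        m = rng.binomial(1, p_keep, size=n_pix)        # pixel mask for y_hat
        y_hat = m * y                                   # masked input sample
        b = rng.binomial(1, p_keep, size=theta.shape)   # dropout mask b
        pred = (theta * b) @ y_hat                      # f_{theta ⊙ b}(y_hat)
        total += np.sum((y - blur(pred, k)) ** 2)       # C(y, k * f(...))
    return total / samples

est = loss_estimate(theta, y)
print(est)  # finite, nonnegative sampled objective
```

In the actual method this sampled objective would be minimized over `theta` with a gradient-based optimizer; the sketch only shows how the double expectation is estimated.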

Cite this article

Chen, M., Quan, Y., Pang, T. et al. Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network. Int J Comput Vis 130, 1770–1789 (2022). https://doi.org/10.1007/s11263-022-01621-9

