A CVAE-within-Gibbs sampler for Bayesian linear inverse problems with hyperparameters


Abstract

We propose a conditional variational auto-encoder within Gibbs sampling (CVAE-within-Gibbs) for Bayesian linear inverse problems in which the prior or the likelihood depends on uncertain hyperparameters. The method combines ideas from classical sampling theory with recent advances in deep generative modeling to approximate complicated probability distributions. Specifically, a CVAE trained on a large dataset learns the conditional density of the hyperparameters that appears in the original Gibbs sampler. The learned conditional provides more flexibility than classical Gibbs sampling because it avoids specifying hyperpriors and their hyperparameters manually or by trial and error. We demonstrate the performance of the proposed method on three linear inverse problems: image deblurring, signal denoising, and boundary heat flux identification in a heat conduction problem.
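To make the scheme concrete, the following is a minimal sketch (in PyTorch-style Python, not the authors' released code) of the two alternating Gibbs steps: an exact draw of \({\varvec{u}}\) from its linear-Gaussian conditional and an approximate draw of \(\varvec{\theta }\) from the trained CVAE decoder. The helpers sample_u_given_theta and decoder, and all dimensions, are hypothetical placeholders.

```python
import torch

def cvae_within_gibbs(y, theta0, sample_u_given_theta, decoder,
                      n_iter=5000, latent_dim=5):
    # Alternate the two conditional draws of the Gibbs sampler.
    theta = theta0
    chain = []
    for _ in range(n_iter):
        # u | theta, y: exact draw from the linear-Gaussian conditional.
        u = sample_u_given_theta(y, theta)
        # theta | u, y: approximate draw from the learned conditional,
        # obtained by pushing a standard-normal latent through the decoder.
        z = torch.randn(latent_dim)
        theta = decoder(z, torch.cat([u, y]))
        chain.append((u.detach().clone(), theta.detach().clone()))
    return chain
```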




Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant 12101614, the Natural Science Foundation of Hunan Province, China, under Grant 2021JJ40715, and the Postgraduate Scientific Research Innovation Project of Hunan Province, China (CX20220288). We are grateful to the High Performance Computing Center of Central South University for assistance with the computations.

Author information

Corresponding author

Correspondence to Qingping Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data availability

The datasets and code are available at https://github.com/YangJingya27/CVAE-within-Gibbs.

Additional information

Communicated by Vinicius Albani.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Datasets

The details of the datasets for the three experiments are summarized in Table 8. Each dataset contains the unknown parameters \({\varvec{u}}\), the measured data \({\varvec{y}}\), and the hyperparameters \(\varvec{\theta }\). In the image deblurring experiment, the variance of the prior distribution contains the hyperparameters \(\gamma \) and \(d\), as shown in Eq. (11), the mean of the prior distribution contains \(\mu \), and \(s\) enters the variance of the noise term. In the signal denoising and IHCP experiments, the only hyperparameter \(\sigma _\mathrm{{obs}}\) enters through the variance of the noise term \(\varvec{\Sigma }_\mathrm{{obs}}=\sigma ^2_\mathrm{{obs}}{\textbf{I}}\). The intervals in the second column of Table 8 give the empirically determined ranges of the hyperparameter values. To generate a dataset, \(P\) discrete points are uniformly selected from each interval (with \(P\) possibly differing across intervals) to form all combinations of hyperparameter values, and the corresponding pairs \(\{y^i,u^i\}, i=1,\ldots ,{\bar{N}}\), are generated for each combination. The last three columns of the table report the size of the synthetic dataset, the dimension of \({\varvec{u}}\), and the dimension of \({\varvec{y}}\). For example, in the first experiment there are \(200{,}000=50\times 20\times 20\times 10\) combinations of hyperparameters, and 10 samples are randomly generated for each combination, yielding 2,000,000 samples in total. A code sketch of this procedure is given after Table 8.

Table 8 Details of datasets
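The construction just described can be sketched as follows, assuming a generic linear forward map. The helpers sample_u, forward, and noise_std are hypothetical stand-ins for the problem-specific prior sampler, forward operator, and noise model.

```python
import itertools
import numpy as np

def build_dataset(grids, n_per_combo, sample_u, forward, noise_std, seed=0):
    """grids: one 1-D array of candidate values per hyperparameter;
    sample_u(theta) draws u from the prior given hyperparameters theta;
    forward(u) applies the linear forward operator;
    noise_std(theta) returns the observation noise level."""
    rng = np.random.default_rng(seed)
    data = []
    # All combinations of hyperparameter values (Cartesian product of grids).
    for theta in itertools.product(*grids):
        for _ in range(n_per_combo):
            u = sample_u(theta)
            clean = forward(u)
            # Additive Gaussian observation noise.
            y = clean + noise_std(theta) * rng.standard_normal(clean.shape)
            data.append((y, u, np.asarray(theta)))
    return data
```

For the deblurring case, four grids of sizes 50, 20, 20, and 10 give the \(200{,}000\) combinations noted above, and n_per_combo = 10 yields the 2,000,000 samples.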

Appendix B: Network architectures

1.1 Image deblurring

The CVAE model used for image deblurring consists of 2 fully connected hidden layers for the encoder and 3 for the decoder, with 5 Gaussian latent variables. We use a batch size of 128, train for 10 epochs, and optimize with Adam at a learning rate of \(1.7 \times 10^{-6}\). ReLU activations are used in both the encoder and the decoder. The number of neurons in each layer is given in Table 9; a sketch of this architecture follows.
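A PyTorch sketch of a CVAE with the stated depth (2 hidden encoder layers, 3 hidden decoder layers, a 5-dimensional Gaussian latent space, ReLU activations) is given below. The hidden width of 128 and the input dimensions are illustrative assumptions; the actual neuron counts are those of Table 9.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, theta_dim, cond_dim, hidden=128, latent_dim=5):
        super().__init__()
        # Encoder q(z | theta, c): 2 fully connected hidden layers.
        self.encoder = nn.Sequential(
            nn.Linear(theta_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Decoder p(theta | z, c): 3 fully connected hidden layers.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, theta_dim),
        )

    def forward(self, theta, c):
        h = self.encoder(torch.cat([theta, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(torch.cat([z, c], dim=-1)), mu, logvar
```

Training would then pair this model with torch.optim.Adam at the stated learning rate of \(1.7\times 10^{-6}\) and the usual reconstruction-plus-KL objective.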

1.2 Inverse heat conduction problem

In the IHCP experiment, we employ the reclassification strategy discussed in Sect. 5.1.1, i.e., a classification network for \({\varvec{y}}\) is added to the CVAE. The CVAE used for the IHCP consists of 3 fully connected hidden layers for the encoder, 5 for the re-classification network, and 3 for the decoder, with 5 Gaussian latent variables. We use Xavier initialization, a batch size of 128, and 40 epochs, optimizing with Adam at a learning rate of \(2\times 10^{-5}\) for the re-classification network and \(5.5\times 10^{-4}\) for the CVAE. Leaky ReLU activations are used in the encoder and decoder, and ReLU activations in the re-classification network. The number of neurons in each layer is given in Table 9; a sketch of the layer construction follows.
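To make the initialization and activation choices concrete, here is a hedged helper that builds Xavier-initialized fully connected stacks of the stated depths. All widths and the class count below are placeholders, not the values of Table 9.

```python
import torch.nn as nn

def make_mlp(sizes, act):
    """Fully connected stack with Xavier-initialized weights."""
    layers = []
    for i in range(len(sizes) - 1):
        lin = nn.Linear(sizes[i], sizes[i + 1])
        nn.init.xavier_uniform_(lin.weight)  # Xavier initialization
        nn.init.zeros_(lin.bias)
        layers.append(lin)
        if i < len(sizes) - 2:               # no activation after the output layer
            layers.append(act())
    return nn.Sequential(*layers)

# Hypothetical widths and class count; actual neuron counts are in Table 9.
classifier = make_mlp([60, 256, 256, 256, 256, 256, 8], nn.ReLU)    # 5 hidden layers
encoder = make_mlp([60 + 8, 128, 128, 128, 2 * 5], nn.LeakyReLU)    # 3 hidden layers
```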

1.3 Signal denoising

In the signal denoising experiment, we also employ the reclassification strategy discussed in Sect. 5.1.1. The CVAE model used for signal denoising consists of 5 fully connected hidden layers for the encoder and 7 for the decoder, with 5 Gaussian latent variables. The re-classification network is a ResNet with 3 residual blocks followed by a linear layer; each residual block has 2 fully connected layers with a dropout probability of \(p = 0.2\) (a sketch of this block follows). We use Xavier initialization and Leaky ReLU activations in all three networks, a batch size of 64, and 40 epochs, optimizing with RMSprop at a learning rate of \(1\times 10^{-5}\) for the re-classification network and with NAdam at a learning rate of \(1\times 10^{-4}\) for the CVAE. The number of neurons in each layer is given in Table 9.
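A minimal sketch of the residual block just described, with two fully connected layers, dropout \(p = 0.2\), and a skip connection; the width of 128 and the class count of 10 are assumptions for illustration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two fully connected layers with dropout and a skip connection."""
    def __init__(self, dim, p=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim), nn.LeakyReLU(), nn.Dropout(p),
            nn.Linear(dim, dim), nn.Dropout(p),
        )
        self.act = nn.LeakyReLU()

    def forward(self, x):
        return self.act(x + self.body(x))  # residual (skip) connection

# Re-classification network: 3 residual blocks followed by a linear layer.
classifier = nn.Sequential(
    ResidualBlock(128), ResidualBlock(128), ResidualBlock(128),
    nn.Linear(128, 10),
)
```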

Table 9 Details of network architecture

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, J., Niu, Y. & Zhou, Q. A CVAE-within-Gibbs sampler for Bayesian linear inverse problems with hyperparameters. Comp. Appl. Math. 42, 138 (2023). https://doi.org/10.1007/s40314-023-02279-w
