Abstract
We propose a conditional variational auto-encoder within Gibbs sampling (CVAE-within-Gibbs) for Bayesian linear inverse problems in which the prior or the likelihood function depends on uncertain hyperparameters. The method combines ideas from classical sampling theory with recent advances in deep generative models for approximating complicated probability distributions. Specifically, we train a CVAE model on a large dataset to learn the conditional density of the hyperparameters that appears in the original Gibbs sampler. The learned conditional density offers more flexibility than classical Gibbs sampling because it avoids specifying the hyperpriors and their hyperparameters manually or experimentally. We demonstrate the performance of the proposed method on three linear inverse problems: image deblurring, signal denoising, and boundary heat flux identification in a heat conduction problem.
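The alternating structure described above can be sketched in a few lines for a toy linear-Gaussian model \(y = Au + \text{noise}\). This is a minimal illustration, not the authors' implementation: the function `sample_theta_cvae` is a hypothetical stand-in for drawing from the trained CVAE decoder, and the closed-form Gaussian conditional for \(u\) is standard for linear-Gaussian models.

```python
# Minimal sketch of the CVAE-within-Gibbs idea for a linear-Gaussian model
# y = A u + noise. The toy decoder below is a hypothetical stand-in for the
# trained CVAE; it is NOT the paper's network.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6                      # dim(u), dim(y)
A = rng.standard_normal((m, n))  # forward operator
u_true = rng.standard_normal(n)
y = A @ u_true + 0.05 * rng.standard_normal(m)

def sample_u(theta, y):
    """Gibbs step 1: draw u | theta, y for a Gaussian prior N(0, theta^2 I)
    and noise N(0, s^2 I); this conditional posterior is Gaussian."""
    s = 0.05
    prec = A.T @ A / s**2 + np.eye(n) / theta**2   # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ (A.T @ y) / s**2
    return rng.multivariate_normal(mean, cov)

def sample_theta_cvae(u, y):
    """Gibbs step 2: in the paper this samples theta from the CVAE decoder
    p(theta | z, u, y) with latent z ~ N(0, I); here a toy surrogate keeps
    the sketch self-contained."""
    z = rng.standard_normal()                       # latent draw
    return np.exp(0.1 * z) * (0.5 + 0.5 * np.std(u))  # hypothetical decoder

theta = 1.0
samples = []
for it in range(500):
    u = sample_u(theta, y)
    theta = sample_theta_cvae(u, y)
    samples.append(u)
u_mean = np.mean(samples[100:], axis=0)  # posterior mean after burn-in
```

The key design choice is that the hyperparameter step, which in classical hierarchical Gibbs requires an explicit hyperprior, is replaced by a draw from a learned conditional density.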
Acknowledgements
The work is supported by the National Natural Science Foundation of China under Grant 12101614, the Natural Science Foundation of Hunan Province, China, under Grant 2021JJ40715 and the Postgraduate Scientific Research Innovation Project of Hunan Province, China (CX20220288). We are grateful to the High Performance Computing Center of Central South University for assistance with the computations.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Data availability
The datasets and code are available at https://github.com/YangJingya27/CVAE-within-Gibbs.
Additional information
Communicated by Vinicius Albani.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Datasets
The details of the datasets for the three experiments are summarized in Table 8. Each dataset contains the unknown parameters \({\varvec{u}}\), the measurable data \({\varvec{y}}\), and the hyperparameters \(\varvec{\theta }\). In the image deblurring experiment, the variance of the prior distribution contains the hyperparameters \(\gamma \) and d, as shown in Eq. (11); the mean of the prior distribution contains \(\mu \); and s appears in the variance of the noise term. In the signal denoising and IHCP experiments, the only hyperparameter \(\sigma _\mathrm{{obs}}\) enters through the variance of the noise term, \(\varvec{\Sigma }_\mathrm{{obs}}=\sigma ^2_\mathrm{{obs}}{\textbf{I}}\). The intervals in the second column of Table 8 are the empirically determined ranges of the hyperparameters. To generate a dataset, P points are selected uniformly from each interval to form all combinations of hyperparameter values, and the pairs \(\{y^i,u^i\},i=1,\ldots ,{\bar{N}}\) are generated from each combination. The last three columns of the table give the size of the synthetic dataset, the dimension of \({\varvec{u}}\), and the dimension of \({\varvec{y}}\). For example, in the first experiment there are \(200{,}000=50\times 20\times 20\times 10\) combinations of hyperparameters, and 10 samples are randomly generated from each combination, yielding a total of 2,000,000 samples.
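The grid-then-sample construction above can be sketched as follows for the single-hyperparameter (denoising-style) case. The interval endpoints, the signal dimension, and the identity forward map are hypothetical placeholders; the paper's actual ranges and forward operators are given in Table 8.

```python
# Sketch of dataset generation: P grid points per hyperparameter interval,
# a fixed number of (y, u) samples per hyperparameter combination.
# All concrete numbers (interval, dimension, forward map) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
P = 5                  # grid points in the hyperparameter interval
reps = 10              # samples generated per hyperparameter value
sigma_grid = np.linspace(0.01, 0.2, P)  # hypothetical range for sigma_obs

def forward(u):
    """Hypothetical identity forward map (denoising case): y = u + noise."""
    return u

dataset = []
for sigma_obs in sigma_grid:            # loop over the hyperparameter grid
    for _ in range(reps):
        u = rng.standard_normal(16)     # hypothetical signal, dimension 16
        y = forward(u) + sigma_obs * rng.standard_normal(16)
        dataset.append((y, u, sigma_obs))
# total size = (number of grid combinations) x reps
```

With several hyperparameters, the grid is the Cartesian product of the per-interval grids, which is how the \(50\times 20\times 20\times 10\) count in the deblurring example arises.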
Appendix B: Network architectures
1.1 Image deblurring
The CVAE model used for image deblurring consists of 2 fully connected hidden layers in the encoder and 3 in the decoder, with 5 Gaussian latent variables. We used a batch size of 128, 10 training epochs, and the Adam optimizer with a learning rate of \(1.7 \times 10^{-6}\). ReLU activation functions are used in both the encoder and the decoder. The number of neurons in each layer is given in Table 9.
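The encoder/decoder structure described above can be sketched at the shape level as follows. The layer widths and the condition dimension here are hypothetical (the paper's exact sizes are in Table 9), and NumPy is used to keep the sketch dependency-free, although the paper's implementation is in PyTorch.

```python
# Shape-level sketch of the deblurring CVAE: 2 encoder hidden layers,
# 3 decoder hidden layers, 5 Gaussian latent variables. Layer widths and
# the 64-dim condition are hypothetical placeholders for Table 9's values.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
latent_dim = 5

def dense(d_in, d_out):
    """A linear layer as a (weight, bias) pair with small random weights."""
    return rng.standard_normal((d_in, d_out)) * 0.1, np.zeros(d_out)

# encoder: condition (u, y) -> 2 hidden layers -> (mu, log_var) of z
enc1, enc2 = dense(64, 32), dense(32, 16)
enc_mu, enc_lv = dense(16, latent_dim), dense(16, latent_dim)
# decoder: (z, condition) -> 3 hidden layers -> hyperparameters theta
dec1 = dense(latent_dim + 64, 32)
dec2, dec3 = dense(32, 16), dense(16, 4)

def encode(cond):
    h = relu(cond @ enc1[0] + enc1[1])
    h = relu(h @ enc2[0] + enc2[1])
    return h @ enc_mu[0] + enc_mu[1], h @ enc_lv[0] + enc_lv[1]

def decode(z, cond):
    h = relu(np.concatenate([z, cond]) @ dec1[0] + dec1[1])
    h = relu(h @ dec2[0] + dec2[1])
    return h @ dec3[0] + dec3[1]

cond = rng.standard_normal(64)            # flattened condition (u, y)
mu, log_var = encode(cond)
z = mu + np.exp(0.5 * log_var) * rng.standard_normal(latent_dim)  # reparameterization
theta = decode(z, cond)                   # e.g. (gamma, d, mu, s) for deblurring
```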
1.2 Inverse heat conduction problem
In the IHCP experiment, we employed the re-classification strategy discussed in Sect. 5.1.1, i.e., a classification network for \({\varvec{y}}\) is added to the CVAE. The CVAE used for the IHCP consists of 3 fully connected hidden layers in the encoder, 5 in the re-classification network, and 3 in the decoder, with 5 Gaussian latent variables. We used Xavier initialization, a batch size of 128, 40 training epochs, and the Adam optimizer with a learning rate of \(2\times 10^{-5}\) for the re-classification network and \(5.5\times 10^{-4}\) for the CVAE. Leaky ReLU activation functions are used in the encoder and decoder, and ReLU activation functions are used in the re-classification network. The number of neurons in each layer is given in Table 9.
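The Xavier (Glorot) initialization mentioned above draws each weight matrix so that the variance of activations is roughly preserved across layers. A minimal sketch of the uniform variant, with hypothetical layer sizes:

```python
# Xavier (Glorot) uniform initialization: weights drawn from
# U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
# The layer sizes (128, 64) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(128, 64)
```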
1.3 Signal denoising
In the signal denoising experiment, we also employed the re-classification strategy discussed in Sect. 5.1.1. The CVAE model used for signal denoising consists of 5 fully connected hidden layers in the encoder and 7 in the decoder, with 5 Gaussian latent variables. For the re-classification network, we used a ResNet with 3 residual blocks and a linear layer; each residual block has 2 fully connected layers with a dropout probability of p = 0.2. We used Xavier initialization and Leaky ReLU activation functions in all three networks, a batch size of 64, 40 training epochs, the RMSprop optimizer with a learning rate of \(1\times 10^{-5}\) for the re-classification network, and the NAdam optimizer with a learning rate of \(1\times 10^{-4}\) for the CVAE. The number of neurons in each layer is given in Table 9.
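One residual block of the re-classification ResNet described above can be sketched as follows. The width is hypothetical, and inverted dropout is assumed as the standard training-time formulation.

```python
# Sketch of one fully connected residual block: two linear layers,
# Leaky ReLU, dropout p = 0.2, identity skip connection.
# The width d = 32 is a hypothetical placeholder for Table 9's sizes.
import numpy as np

rng = np.random.default_rng(0)
leaky_relu = lambda x, a=0.01: np.where(x > 0, x, a * x)

def residual_block(x, W1, b1, W2, b2, p=0.2, train=True):
    """out = x + Dropout(LeakyReLU(x W1 + b1)) W2 + b2."""
    h = leaky_relu(x @ W1 + b1)
    if train:
        mask = rng.random(h.shape) >= p    # inverted dropout: scale kept units
        h = h * mask / (1.0 - p)
    return x + h @ W2 + b2                 # identity skip connection

d = 32
W1, b1 = rng.standard_normal((d, d)) * 0.05, np.zeros(d)
W2, b2 = rng.standard_normal((d, d)) * 0.05, np.zeros(d)
x = rng.standard_normal(d)
out = residual_block(x, W1, b1, W2, b2)
```

The skip connection requires the block's input and output widths to match, which is why the two linear layers here share the width d.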
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, J., Niu, Y. & Zhou, Q. A CVAE-within-Gibbs sampler for Bayesian linear inverse problems with hyperparameters. Comp. Appl. Math. 42, 138 (2023). https://doi.org/10.1007/s40314-023-02279-w