Approximate posterior inference for Bayesian models: black-box expectation propagation

Regular Paper · Knowledge and Information Systems

Abstract

Expectation propagation (EP) is a widely successful method for approximating the posteriors of complex Bayesian models. However, it incurs expensive memory and time overheads, since it maintains a locally specific approximating message for every factor. A recent method, averaged EP (AEP), improves on EP by tracking the average effect of the messages on the posterior distribution, rather than the locally specific ones, thereby reducing both memory and time costs. In this paper, we extend AEP to a novel black-box expectation propagation (BBEP) algorithm, which can be applied directly to many Bayesian models without model-specific derivations. We leverage three ideas from black-box learning, leading to three versions of BBEP, referred to as BBEP\(^{m}\), BBEP\(^{g}\) and BBEP\(^{o}\), based on Monte Carlo moment matching, Monte Carlo gradients and the objective of AEP, respectively. For variance reduction, we employ importance sampling and discuss both the choice of proposal distribution and the high-dimensional setting. Furthermore, we develop online versions of BBEP that speed up optimization on large-scale data sets. We empirically compare BBEP against state-of-the-art black-box baseline algorithms on both synthetic and real-world data sets. Experimental results demonstrate that BBEP outperforms the baselines and is even on a par with analytical solutions in some settings.
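
To make the black-box moment-matching idea concrete, below is a minimal Python sketch of Monte Carlo moment matching with importance sampling in the spirit of BBEP\(^{m}\); it is an illustration under stated assumptions, not the paper's implementation. We assume a one-dimensional Gaussian approximating family and use the cavity distribution as the proposal, so the importance weights reduce to the likelihood factor; the function names (mc_moment_match, log_factor) are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def mc_moment_match(log_factor, cavity_mean, cavity_var,
                        n_samples=10000, seed=0):
        """Estimate the mean and variance of the tilted distribution
        tilted(theta) ~ N(theta | cavity_mean, cavity_var) * exp(log_factor(theta))
        by self-normalized importance sampling with the cavity as proposal."""
        rng = np.random.default_rng(seed)
        theta = rng.normal(cavity_mean, np.sqrt(cavity_var), size=n_samples)
        log_w = log_factor(theta)        # log-weights: the cavity terms cancel
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()                     # self-normalize the weights
        mean = np.sum(w * theta)                # first moment of the tilted dist.
        var = np.sum(w * (theta - mean) ** 2)   # second central moment
        return mean, var

    # Example: a probit likelihood factor p(y = 1 | theta) = Phi(theta)
    print(mc_moment_match(norm.logcdf, 0.0, 1.0))

In EP terms, the estimated moments define the updated Gaussian approximation; a poorly matched proposal inflates the variance of the weights, which is why the proposal selection and the high-dimensional case warrant the separate discussion mentioned above.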

Notes

  1. In Sect. 2, we will describe EP and the cavity distribution in more detail.

  2. Actually, the inference algorithms described here can also be applied to more general Bayesian models. The model family shown in Fig. 1 is only a running example.

  3. In this paper, we assume that the encoding distribution \(p({\varvec{\varepsilon}} \mid {\varvec{\ddot{\lambda}}})\) belongs to the same exponential family as the prior distribution \(p_0({\varvec{\theta}} \mid {\varvec{\lambda}}_0)\), i.e., \(p({\varvec{\varepsilon}} \mid {\varvec{\ddot{\lambda}}}) \propto \exp(s({\varvec{\varepsilon}})^{\top} {\varvec{\ddot{\lambda}}})\); see the sketch following these notes.

  4. http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.
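
Complementing Note 3, here is a minimal sketch, assuming a one-dimensional Gaussian family, of the average-message idea behind AEP and BBEP: the approximate posterior is composed in natural-parameter space as \(q(\theta) \propto p_0(\theta)\,\bar{f}(\theta)^N\), where \(\bar{f}\) is the single average message shared by all \(N\) factors. The function names and parameter values below are hypothetical.

    import numpy as np

    def to_natural(mean, var):
        # natural parameters of N(mean, var): (mean/var, -1/(2*var))
        return mean / var, -0.5 / var

    def from_natural(eta1, eta2):
        var = -0.5 / eta2
        return eta1 * var, var

    def aep_posterior(prior, avg_msg, n_factors):
        # q = prior + N * average message, combined in natural-parameter space
        p1, p2 = to_natural(*prior)
        m1, m2 = to_natural(*avg_msg)
        return from_natural(p1 + n_factors * m1, p2 + n_factors * m2)

    # A standard-normal prior combined with N = 100 copies of a weak message
    print(aep_posterior((0.0, 1.0), (0.05, 50.0), 100))

Because a single average message is stored instead of \(N\) locally specific ones, the number of stored message parameters drops from \(O(N)\) to \(O(1)\), which is the memory saving the abstract refers to.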

Acknowledgements

This research was supported by the National Natural Science Foundation of China (NSFC) [Nos. 61876071 and 62006094].

Author information

Correspondence to Ximing Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Li, X., Li, C., Chi, J. et al. Approximate posterior inference for Bayesian models: black-box expectation propagation. Knowl Inf Syst 64, 2361–2387 (2022). https://doi.org/10.1007/s10115-022-01705-5
