Approximate posterior inference for Bayesian models: black-box expectation propagation

Regular Paper · Knowledge and Information Systems

Abstract

Expectation propagation (EP) is a widely successful method for approximating the posteriors of complex Bayesian models. However, it incurs expensive memory and time overheads, since it maintains a locally specific approximating message for every factor. A recent method, averaged EP (AEP), improves on EP by tracking the average effect of the messages on the posterior distribution, rather than the locally specific ones, thereby reducing both memory and time costs. In this paper, we extend AEP to a novel black-box expectation propagation (BBEP) algorithm, which can be applied directly to many Bayesian models without model-specific derivations. We leverage three ideas from black-box learning, leading to three versions of BBEP, referred to as BBEP\(^{m}\), BBEP\(^{g}\) and BBEP\(^{o}\), based on Monte Carlo moment matching, Monte Carlo gradients and the objective of AEP, respectively. For variance reduction, we employ importance sampling and discuss both the choice of proposal distribution and the high-dimensional setting. Furthermore, we develop online versions of BBEP that speed up optimization on large-scale data sets. We empirically compare BBEP against state-of-the-art black-box baseline algorithms on both synthetic and real-world data sets. Experimental results demonstrate that BBEP outperforms the baselines and is even on a par with analytical solutions in some settings.
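
To make the black-box moment-matching idea concrete, below is a minimal Python sketch of Monte Carlo moment matching with importance sampling in the spirit of BBEP\(^{m}\); it is an illustration under stated assumptions, not the paper's implementation. We assume a one-dimensional Gaussian approximating family and use the cavity distribution as the proposal, so the importance weights reduce to the likelihood factor; the function names (mc_moment_match, log_factor) are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def mc_moment_match(log_factor, cavity_mean, cavity_var,
                        n_samples=10000, seed=0):
        """Estimate the mean and variance of the tilted distribution
        tilted(theta) ~ N(theta | cavity_mean, cavity_var) * exp(log_factor(theta))
        by self-normalized importance sampling with the cavity as proposal."""
        rng = np.random.default_rng(seed)
        theta = rng.normal(cavity_mean, np.sqrt(cavity_var), size=n_samples)
        log_w = log_factor(theta)        # log-weights: the cavity terms cancel
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()                     # self-normalize the weights
        mean = np.sum(w * theta)                # first moment of the tilted dist.
        var = np.sum(w * (theta - mean) ** 2)   # second central moment
        return mean, var

    # Example: a probit likelihood factor p(y = 1 | theta) = Phi(theta)
    print(mc_moment_match(norm.logcdf, 0.0, 1.0))

In EP terms, the estimated moments define the updated Gaussian approximation; a poorly matched proposal inflates the variance of the weights, which is why the proposal selection and the high-dimensional case warrant the separate discussion mentioned above.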

Notes

  1. In Sect. 2, we will describe EP and the cavity distribution in more detail.

  2. Actually, the inference algorithms described here can also be applied to more general Bayesian models. The model family shown in Fig. 1 is only a running example.

  3. In this paper, we assume that the encoding distribution \(p({\varvec{\varepsilon}} \mid {\varvec{\ddot{\lambda}}})\) belongs to the same exponential family as the prior distribution \(p_0({\varvec{\theta}} \mid {\varvec{\lambda}}_0)\), i.e., \(p({\varvec{\varepsilon}} \mid {\varvec{\ddot{\lambda}}}) \propto \exp(s({\varvec{\varepsilon}})^{\top} {\varvec{\ddot{\lambda}}})\); see the sketch following these notes.

  4. http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.
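
Complementing Note 3, here is a minimal sketch, assuming a one-dimensional Gaussian family, of the average-message idea behind AEP and BBEP: the approximate posterior is composed in natural-parameter space as \(q(\theta) \propto p_0(\theta)\,\bar{f}(\theta)^N\), where \(\bar{f}\) is the single average message shared by all \(N\) factors. The function names and parameter values below are hypothetical.

    import numpy as np

    def to_natural(mean, var):
        # natural parameters of N(mean, var): (mean/var, -1/(2*var))
        return mean / var, -0.5 / var

    def from_natural(eta1, eta2):
        var = -0.5 / eta2
        return eta1 * var, var

    def aep_posterior(prior, avg_msg, n_factors):
        # q = prior + N * average message, combined in natural-parameter space
        p1, p2 = to_natural(*prior)
        m1, m2 = to_natural(*avg_msg)
        return from_natural(p1 + n_factors * m1, p2 + n_factors * m2)

    # A standard-normal prior combined with N = 100 copies of a weak message
    print(aep_posterior((0.0, 1.0), (0.05, 50.0), 100))

Because a single average message is stored instead of \(N\) locally specific ones, the number of stored message parameters drops from \(O(N)\) to \(O(1)\), which is the memory saving the abstract refers to.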

Acknowledgements

This research was supported by the National Natural Science Foundation of China (NSFC) [Nos. 61876071 and 62006094].

Author information

Correspondence to Ximing Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Li, X., Li, C., Chi, J. et al. Approximate posterior inference for Bayesian models: black-box expectation propagation. Knowl Inf Syst 64, 2361–2387 (2022). https://doi.org/10.1007/s10115-022-01705-5
