Abstract
Expectation propagation (EP) is a widely used method for approximating the posteriors of complex Bayesian models. However, it incurs high memory and time costs, because it builds local approximations with locally specific messages. A recent method, averaged EP (AEP), improves on EP by using the average effect of the messages on the posterior distribution instead of the locally specific ones, thereby reducing both memory and time costs. In this paper, we extend AEP to a novel black-box expectation propagation (BBEP) algorithm, which can be applied directly to many Bayesian models without model-specific derivations. We draw on three ideas from black-box learning, leading to three versions of BBEP, referred to as BBEP\(^{{\varvec{m}}}\), BBEP\(^{{\varvec{g}}}\) and BBEP\(^{{\varvec{o}}}\), which use Monte Carlo moment matching, Monte Carlo gradients and the objective of AEP, respectively. For variance reduction, we use importance sampling and discuss the choice of proposal distribution as well as the high-dimensional setting. Furthermore, we develop online versions of BBEP to speed up optimization on large-scale data sets. We empirically compare BBEP against state-of-the-art black-box baseline algorithms on both synthetic and real-world data sets. Experimental results demonstrate that BBEP outperforms the baseline algorithms and is even on a par with analytical solutions in some settings.
Notes
In Sect. 2, we describe EP and the cavity distribution in more detail.
The inference algorithms described here can also be applied to more generic Bayesian models. The model family shown in Fig. 1 is only a running example.
In this paper, we assume that the encoding distribution \(p({\varvec{\varepsilon }}|{\varvec{\ddot{\lambda }}})\) is in the same exponential family as the prior distribution \(p_0({\varvec{\theta }}|{\varvec{\lambda }}_0)\), i.e., \(p({\varvec{\varepsilon }}|{\varvec{\ddot{\lambda }}}) \propto \exp (s({\varvec{\varepsilon }})^T {\varvec{\ddot{\lambda }}})\).
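As a rough illustration of the exponential-family form in the note above, the following sketch writes a one-dimensional Gaussian in natural parameters with sufficient statistics \(s(x) = (x, x^2)\) and estimates the moments of a tilted distribution by self-normalized importance sampling. It is not the BBEP algorithm itself: the Gaussian family, the function names (`to_natural`, `to_mean_var`, `match_moments`, `log_lik`), the sample size and the toy likelihood are all assumptions made for this example.

```python
import math
import random


def to_natural(mean, var):
    """Natural parameters of a 1-D Gaussian for s(x) = (x, x^2)."""
    return (mean / var, -0.5 / var)


def to_mean_var(lam):
    """Invert to_natural: recover (mean, var) from natural parameters."""
    var = -0.5 / lam[1]
    return (lam[0] * var, var)


def match_moments(lam, log_factor, n_samples=20000, seed=0):
    """Moment-match a Gaussian to the tilted density q(x) * exp(log_factor(x)).

    Samples are drawn from q itself (the proposal, an assumption of this
    sketch), so the importance weight of each sample is just the factor value.
    """
    rng = random.Random(seed)
    mean, var = to_mean_var(lam)
    std = math.sqrt(var)
    xs = [rng.gauss(mean, std) for _ in range(n_samples)]
    log_w = [log_factor(x) for x in xs]
    shift = max(log_w)  # subtract the max for numerical stability
    ws = [math.exp(lw - shift) for lw in log_w]
    total = sum(ws)
    # Self-normalized estimates of E[x] and E[x^2] under the tilted density.
    m1 = sum(w * x for w, x in zip(ws, xs)) / total
    m2 = sum(w * x * x for w, x in zip(ws, xs)) / total
    return to_natural(m1, m2 - m1 * m1)


def log_lik(x, y=1.0):
    """Toy factor: unit-variance Gaussian likelihood of one observation y."""
    return -0.5 * (y - x) ** 2


prior = to_natural(0.0, 1.0)
post_mean, post_var = to_mean_var(match_moments(prior, log_lik))
print(post_mean, post_var)
```

In this conjugate toy case the exact tilted distribution is Gaussian with mean 0.5 and variance 0.5, so the printed estimates should land close to those values, which gives a simple sanity check on the sampler. The factor is only ever evaluated pointwise, which is the sense in which such an estimator is "black-box".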
Acknowledgements
This research was supported by the National Natural Science Foundation of China (NSFC) [No. 61876071, No. 62006094].
Cite this article
Li, X., Li, C., Chi, J. et al. Approximate posterior inference for Bayesian models: black-box expectation propagation. Knowl Inf Syst 64, 2361–2387 (2022). https://doi.org/10.1007/s10115-022-01705-5
DOI: https://doi.org/10.1007/s10115-022-01705-5