Abstract
Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant speedups over alternative learning algorithms.
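The central idea is concrete enough to sketch. For a fully factorized Gaussian approximation q(θ) = N(μ, diag(σ²)), the Fisher information of q with respect to the means is diag(1/σ²), so the natural gradient of the objective with respect to μ is the ordinary gradient scaled elementwise by the variances. The sketch below is illustrative only, not the authors' implementation: the function names, the fixed step size, the omission of the variance update, and the simple Polak-Ribière rule without parallel transport of the previous direction are all assumptions made for brevity.

```python
import numpy as np

def natural_gradient(grad_mu, sigma2):
    # For q(theta) = N(mu, diag(sigma2)), the Fisher information of q with
    # respect to mu is diag(1 / sigma2), so the natural gradient is the
    # ordinary gradient scaled elementwise by the variances.
    return sigma2 * grad_mu

def natural_cg_step(mu, sigma2, grad_fn, state=None, step=1e-2):
    # One natural conjugate gradient update of the mean parameters.
    # grad_fn(mu) returns the gradient of the objective (e.g. the negative
    # variational lower bound) with respect to mu.
    g = grad_fn(mu)
    ng = natural_gradient(g, sigma2)
    if state is None:
        d = -ng                      # first step: plain natural gradient descent
    else:
        g_prev, ng_prev, d_prev = state
        # Polak-Ribiere coefficient computed in the Riemannian inner product;
        # the previous direction is reused without parallel transport.
        beta = max(0.0, ng @ (g - g_prev) / (ng_prev @ g_prev))
        d = -ng + beta * d_prev
    mu_new = mu + step * d           # a line search would normally set the step
    return mu_new, (g, ng, d)
```

Iterating such a step with a proper line search, together with a corresponding update for the variance parameters, gives a rough picture of how a conjugate gradient method can exploit the geometry of the approximating distribution.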
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Honkela, A., Tornio, M., Raiko, T., Karhunen, J. (2008). Natural Conjugate Gradient in Variational Inference. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_32
DOI: https://doi.org/10.1007/978-3-540-69162-4_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69159-4
Online ISBN: 978-3-540-69162-4