Abstract
Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant speedups over alternative learning algorithms.
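The central idea is concrete enough to sketch. For a fully factorized Gaussian approximation q(θ) = N(μ, diag(σ²)), the Fisher information of q with respect to the means is diag(1/σ²), so the natural gradient of the objective with respect to μ is the ordinary gradient scaled elementwise by the variances. The sketch below is illustrative only, not the authors' implementation: the function names, the fixed step size, the omission of the variance update, and the simple Polak-Ribière rule without parallel transport of the previous direction are all assumptions made for brevity.

```python
import numpy as np

def natural_gradient(grad_mu, sigma2):
    # For q(theta) = N(mu, diag(sigma2)), the Fisher information of q with
    # respect to mu is diag(1 / sigma2), so the natural gradient is the
    # ordinary gradient scaled elementwise by the variances.
    return sigma2 * grad_mu

def natural_cg_step(mu, sigma2, grad_fn, state=None, step=1e-2):
    # One natural conjugate gradient update of the mean parameters.
    # grad_fn(mu) returns the gradient of the objective (e.g. the negative
    # variational lower bound) with respect to mu.
    g = grad_fn(mu)
    ng = natural_gradient(g, sigma2)
    if state is None:
        d = -ng                      # first step: plain natural gradient descent
    else:
        g_prev, ng_prev, d_prev = state
        # Polak-Ribiere coefficient computed in the Riemannian inner product;
        # the previous direction is reused without parallel transport.
        beta = max(0.0, ng @ (g - g_prev) / (ng_prev @ g_prev))
        d = -ng + beta * d_prev
    mu_new = mu + step * d           # a line search would normally set the step
    return mu_new, (g, ng, d)
```

Iterating such a step with a proper line search, together with a corresponding update for the variance parameters, gives a rough picture of how a conjugate gradient method can exploit the geometry of the approximating distribution.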
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Honkela, A., Tornio, M., Raiko, T., Karhunen, J. (2008). Natural Conjugate Gradient in Variational Inference. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_32
DOI: https://doi.org/10.1007/978-3-540-69162-4_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69159-4
Online ISBN: 978-3-540-69162-4