Natural Conjugate Gradient in Variational Inference

  • Conference paper
Neural Information Processing (ICONIP 2007)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4985)

Abstract

Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For such models, variational Bayesian expectation maximization (VB EM) algorithms are not readily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose instead using the geometry of the variational approximating distribution to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant speedups over alternative learning algorithms.
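
To make the idea concrete, the sketch below shows what a natural conjugate gradient step can look like for a fully factorized Gaussian approximation q(θ) = N(μ, diag(σ²)), whose Fisher information is diagonal and therefore cheap to invert. This is an illustrative reconstruction based on the abstract, not the authors' code: the `objective_grad` callback, the fixed step size, and the toy `kl_grad` objective in the usage example are assumptions made to keep the example self-contained (the method described in the paper would use model-specific free-energy gradients and a line search).

```python
import numpy as np


def natural_gradient(grad_mu, grad_var, var):
    """Rescale ordinary gradients by the inverse Fisher information of
    a diagonal Gaussian q(theta) = N(mu, diag(var)).

    In the (mu, var) parameterization the Fisher information is
    diag(1 / var) for the means and diag(1 / (2 * var**2)) for the
    variances, so the natural gradient is an elementwise rescaling.
    """
    return var * grad_mu, 2.0 * var ** 2 * grad_var


def natural_conjugate_gradient(objective_grad, mu, var, n_iters=1000, step=1e-2):
    """Polak-Ribiere conjugate gradient with the ordinary gradient
    replaced by the natural gradient of the approximating distribution.

    `objective_grad(mu, var)` is a placeholder for a model-specific
    routine returning gradients of the variational objective.
    """
    g_mu, g_var = objective_grad(mu, var)
    ng_mu, ng_var = natural_gradient(g_mu, g_var, var)
    d_mu, d_var = -ng_mu, -ng_var                      # initial search direction
    prev_inner = g_mu @ ng_mu + g_var @ ng_var         # squared Riemannian gradient norm

    for _ in range(n_iters):
        # A fixed step is used here for brevity; a line search is the usual choice.
        mu = mu + step * d_mu
        var = np.maximum(var + step * d_var, 1e-8)     # keep variances positive

        g_mu, g_var = objective_grad(mu, var)
        ng_mu_new, ng_var_new = natural_gradient(g_mu, g_var, var)

        # Polak-Ribiere coefficient computed with natural gradients.
        inner = g_mu @ ng_mu_new + g_var @ ng_var_new
        cross = g_mu @ (ng_mu_new - ng_mu) + g_var @ (ng_var_new - ng_var)
        beta = max(cross / prev_inner, 0.0)

        d_mu = -ng_mu_new + beta * d_mu
        d_var = -ng_var_new + beta * d_var
        ng_mu, ng_var, prev_inner = ng_mu_new, ng_var_new, inner

    return mu, var


if __name__ == "__main__":
    # Toy check: fit q to a standard normal by minimizing KL(q || N(0, I)),
    # whose gradients w.r.t. mu and var are mu and 0.5 * (1 - 1 / var).
    def kl_grad(mu, var):
        return mu, 0.5 * (1.0 - 1.0 / var)

    mu, var = natural_conjugate_gradient(kl_grad, np.full(3, 2.0), np.full(3, 0.5))
    print(mu, var)   # mu should approach 0 and var should approach 1
```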

Editor information

Masumi Ishikawa, Kenji Doya, Hiroyuki Miyamoto, Takeshi Yamakawa

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Honkela, A., Tornio, M., Raiko, T., Karhunen, J. (2008). Natural Conjugate Gradient in Variational Inference. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_32

  • DOI: https://doi.org/10.1007/978-3-540-69162-4_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69159-4

  • Online ISBN: 978-3-540-69162-4
