Second-Order Optimization over the Multivariate Gaussian Distribution

  • Conference paper

Geometric Science of Information (GSI 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9389)


Abstract

We discuss the optimization of the stochastic relaxation of a real-valued function: the original search space is replaced by a statistical model, and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric, so the stochastic relaxation can be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distributions, in order to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which arise from the dually flat structure of an exponential family. This yields different Taylor formulæ, depending on the choice of the Hessian and of the geodesic used, and hence different approaches to the design of second-order methods such as the Newton method.
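
To make the stochastic relaxation concrete, the sketch below performs natural-gradient descent on F(μ) = E_{X∼N(μ, σ²I)}[f(X)], i.e., on the mean of a Gaussian with fixed covariance — the first-order baseline that the paper's second-order machinery refines. This is a minimal illustration, not the authors' algorithm: the sphere objective, the isotropic covariance, the step size, and the sample size are all assumptions chosen for the example. It relies on two standard facts: the log-likelihood trick gives grad_μ F = E[f(X) Σ⁻¹(X − μ)], and with the covariance held fixed the Fisher metric for the mean is Σ⁻¹, so the natural gradient simplifies to E[f(X)(X − μ)].

```python
import numpy as np

def f(x):
    # Illustrative objective (the sphere function); any real-valued f works.
    return np.sum(x ** 2, axis=-1)

def natural_gradient_step(mu, sigma2, lr, n_samples, rng):
    """One natural-gradient step on F(mu) = E_{N(mu, sigma2*I)}[f(X)].

    grad_mu F = E[f(X) (X - mu) / sigma2] by the log-likelihood trick;
    preconditioning by the inverse Fisher metric (sigma2 * I for the mean)
    gives the natural gradient E[f(X) (X - mu)].  All hyperparameters here
    are illustrative choices, not values from the paper.
    """
    d = mu.shape[0]
    x = mu + np.sqrt(sigma2) * rng.standard_normal((n_samples, d))
    fx = f(x)
    # Subtracting a baseline keeps the estimator unbiased (E[X - mu] = 0)
    # while reducing its variance.
    nat_grad = ((fx - fx.mean())[:, None] * (x - mu)).mean(axis=0)
    return mu - lr * nat_grad

rng = np.random.default_rng(0)
mu = np.array([3.0, -2.0])
for _ in range(100):
    mu = natural_gradient_step(mu, sigma2=0.5, lr=0.1, n_samples=200, rng=rng)
print(mu)  # approaches the minimizer [0, 0] of the sphere function
```

A second-order method in the spirit of the paper would replace the fixed step along the natural gradient with a Newton-type step, solving Hess F[d] = −grad F for the chosen Hessian (Riemannian, exponential, or mixture) and moving along the corresponding geodesic.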



Acknowledgements

Giovanni Pistone is supported by de Castro Statistics, Collegio Carlo Alberto, Moncalieri, and he is a member of GNAMPA-INDAM.

Author information

Correspondence to Luigi Malagò.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Malagò, L., Pistone, G. (2015). Second-Order Optimization over the Multivariate Gaussian Distribution. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science, vol 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_38

  • DOI: https://doi.org/10.1007/978-3-319-25040-3_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25039-7

  • Online ISBN: 978-3-319-25040-3

  • eBook Packages: Computer Science, Computer Science (R0)
