Second-Order Optimization over the Multivariate Gaussian Distribution

  • Conference paper

Geometric Science of Information (GSI 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9389)


Abstract

We discuss the optimization of the stochastic relaxation of a real-valued function: the original search space is replaced by a statistical model, and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric, so the stochastic relaxation can be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distributions, in order to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which arise from the dually flat structure of an exponential family. This yields different Taylor formulæ, depending on the choice of the Hessian and of the geodesic used, and hence different approaches to the design of second-order methods such as the Newton method.
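
To make the stochastic relaxation concrete, the sketch below performs natural-gradient descent on F(μ) = E_{X∼N(μ, σ²I)}[f(X)], i.e., on the mean of a Gaussian with fixed covariance — the first-order baseline that the paper's second-order machinery refines. This is a minimal illustration, not the authors' algorithm: the sphere objective, the isotropic covariance, the step size, and the sample size are all assumptions chosen for the example. It relies on two standard facts: the log-likelihood trick gives grad_μ F = E[f(X) Σ⁻¹(X − μ)], and with the covariance held fixed the Fisher metric for the mean is Σ⁻¹, so the natural gradient simplifies to E[f(X)(X − μ)].

```python
import numpy as np

def f(x):
    # Illustrative objective (the sphere function); any real-valued f works.
    return np.sum(x ** 2, axis=-1)

def natural_gradient_step(mu, sigma2, lr, n_samples, rng):
    """One natural-gradient step on F(mu) = E_{N(mu, sigma2*I)}[f(X)].

    grad_mu F = E[f(X) (X - mu) / sigma2] by the log-likelihood trick;
    preconditioning by the inverse Fisher metric (sigma2 * I for the mean)
    gives the natural gradient E[f(X) (X - mu)].  All hyperparameters here
    are illustrative choices, not values from the paper.
    """
    d = mu.shape[0]
    x = mu + np.sqrt(sigma2) * rng.standard_normal((n_samples, d))
    fx = f(x)
    # Subtracting a baseline keeps the estimator unbiased (E[X - mu] = 0)
    # while reducing its variance.
    nat_grad = ((fx - fx.mean())[:, None] * (x - mu)).mean(axis=0)
    return mu - lr * nat_grad

rng = np.random.default_rng(0)
mu = np.array([3.0, -2.0])
for _ in range(100):
    mu = natural_gradient_step(mu, sigma2=0.5, lr=0.1, n_samples=200, rng=rng)
print(mu)  # approaches the minimizer [0, 0] of the sphere function
```

A second-order method in the spirit of the paper would replace the fixed step along the natural gradient with a Newton-type step, solving Hess F[d] = −grad F for the chosen Hessian (Riemannian, exponential, or mixture) and moving along the corresponding geodesic.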



Acknowledgements

Giovanni Pistone is supported by de Castro Statistics, Collegio Carlo Alberto, Moncalieri, and he is a member of GNAMPA-INDAM.

Author information

Correspondence to Luigi Malagò.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Malagò, L., Pistone, G. (2015). Second-Order Optimization over the Multivariate Gaussian Distribution. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science, vol 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_38

  • DOI: https://doi.org/10.1007/978-3-319-25040-3_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25039-7

  • Online ISBN: 978-3-319-25040-3

  • eBook Packages: Computer Science, Computer Science (R0)
