Abstract
The covariance matrix adaptation evolution strategy (CMA-ES) evolves a multivariate Gaussian distribution for continuous optimization. The evolution path, which accumulates historical search directions over successive generations, plays a crucial role in the adaptation of the covariance matrix. In this paper, we investigate what the evolution path learns during the optimization procedure. We show that the evolution path accumulates the natural gradient with respect to the distribution mean and acts as a momentum term under stationary conditions. Experimental results suggest that the evolution path learns the relative scales of the eigenvectors of the inverse Hessian, expanded by the corresponding singular values along those eigenvectors. Further, we show that the outer product of the evolution path serves as a rank-one momentum term for the covariance matrix.
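For context, the evolution path p_c and its rank-one contribution to the covariance matrix follow the standard CMA-ES updates; a minimal form is sketched below, omitting step-size adaptation and the rank-μ update. Here m and m⁺ denote the distribution mean before and after a generation, σ the step size, and c_c, c_1, μ_eff the usual cumulation, learning-rate, and variance-effectiveness constants:

\[
p_c \leftarrow (1 - c_c)\, p_c \;+\; \sqrt{c_c (2 - c_c)\,\mu_{\mathrm{eff}}}\;\frac{m^{+} - m}{\sigma},
\qquad
C \leftarrow (1 - c_1)\, C \;+\; c_1\, p_c\, p_c^{\top}.
\]

The geometric decay (1 − c_c) applied to the accumulated mean shifts is what gives p_c its momentum-like character, and the outer product p_c p_cᵀ is the rank-one term referred to above.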
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Li, Z., Zhang, Q. (2016). What Does the Evolution Path Learn in CMA-ES? In: Handl, J., Hart, E., Lewis, P., López-Ibáñez, M., Ochoa, G., Paechter, B. (eds.) Parallel Problem Solving from Nature – PPSN XIV. PPSN 2016. Lecture Notes in Computer Science, vol. 9921. Springer, Cham. https://doi.org/10.1007/978-3-319-45823-6_70
DOI: https://doi.org/10.1007/978-3-319-45823-6_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45822-9
Online ISBN: 978-3-319-45823-6