
Deep Neuroevolution: Training Neural Networks Using a Matrix-Free Evolution Strategy

  • Conference paper
Neural Information Processing (ICONIP 2021)

Abstract

In this paper, we discuss an evolutionary method for training deep neural networks. The proposed solution is based on the Differential Evolution Strategy (DES), an algorithm that is a crossover between Differential Evolution (DE) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We combine this approach with Xavier-coefficient-based population initialization, batch processing, and gradient-based mutations; the resulting weight optimizer is called the neural Differential Evolution Strategy (nDES). Our algorithm yields results comparable to Adaptive Moment Estimation (ADAM) on a convolutional network training task (50K parameters) on the FashionMNIST dataset. We show that combining both methods results in better models than those obtained by training with either algorithm alone. Furthermore, nDES significantly outperforms ADAM on three classic toy recurrent neural network problems. The proposed solution is scalable in an embarrassingly parallel way. For reproducibility, we provide a reference implementation written in Python.
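This page does not reproduce the algorithm itself. As a rough, self-contained illustration of the recipe the abstract names (Xavier-coefficient population initialization, batch-wise fitness evaluation, embarrassingly parallel scoring), the following Python sketch trains a single linear layer with a generic (mu, lambda) evolution strategy. It is not the authors' nDES: the DES update rule and the gradient-based mutations are replaced here by plain Gaussian perturbations, and every function name and hyperparameter is an assumption made for illustration.

```python
# Hypothetical sketch of an evolution-strategy weight optimizer, in the
# spirit of the recipe the abstract outlines. NOT the authors' nDES; all
# names and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def init_population(pop_size, fan_in, fan_out):
    # Xavier-style scale, as in Glorot & Bengio: Var(w) = 2 / (fan_in + fan_out).
    scale = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, scale, size=(pop_size, fan_in * fan_out))

def fitness(weights, x_batch, y_batch, fan_in, fan_out):
    # Score one individual on one mini-batch (batch processing); higher is better.
    w = weights.reshape(fan_in, fan_out)
    pred = x_batch @ w
    return -float(np.mean((pred - y_batch) ** 2))

def es_step(population, fitnesses, elite_frac=0.25, noise=0.02):
    # Generic (mu, lambda)-style update: keep the best mu individuals and
    # refill the population with Gaussian perturbations of them. (DES would
    # use a DE/CMA-ES-derived update here instead.)
    pop_size = len(population)
    mu = max(1, int(pop_size * elite_frac))
    elite_idx = np.argsort(fitnesses)[::-1][:mu]
    parents = population[elite_idx][rng.integers(0, mu, size=pop_size)]
    return parents + rng.normal(0.0, noise, size=population.shape)

# Toy task: recover a fixed linear map from mini-batches of fresh data.
fan_in, fan_out, pop_size = 8, 4, 64
true_w = rng.normal(size=(fan_in, fan_out))
population = init_population(pop_size, fan_in, fan_out)

for generation in range(200):
    x = rng.normal(size=(32, fan_in))  # fresh mini-batch each generation
    y = x @ true_w
    # Each evaluation is independent, hence embarrassingly parallel.
    scores = np.array([fitness(ind, x, y, fan_in, fan_out) for ind in population])
    population = es_step(population, scores)

# Re-score the final population on the last batch and report the best MSE.
scores = np.array([fitness(ind, x, y, fan_in, fan_out) for ind in population])
print("best final-batch MSE:", -scores.max())
```

For the actual method and its DES update rule, consult the authors' Python reference implementation mentioned in the abstract.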

All authors contributed equally.


Author information


Corresponding author

Correspondence to Paweł Zawistowski.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Jagodziński, D., Neumann, Ł., Zawistowski, P. (2021). Deep Neuroevolution: Training Neural Networks Using a Matrix-Free Evolution Strategy. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol. 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_43

  • DOI: https://doi.org/10.1007/978-3-030-92185-9_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92184-2

  • Online ISBN: 978-3-030-92185-9

  • eBook Packages: Computer Science (R0)
