Abstract
In this paper, we discuss an evolutionary method for training deep neural networks. The proposed solution is based on the Differential Evolution Strategy (DES), an algorithm that is a crossover between Differential Evolution (DE) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We combine this approach with Xavier-coefficient-based population initialization, batch processing, and gradient-based mutations; the resulting weight optimizer is called the neural Differential Evolution Strategy (nDES). Our algorithm yields results comparable to Adaptive Moment Estimation (ADAM) on a convolutional network training task (50K parameters) on the FashionMNIST dataset. We show that combining both methods produces better models than those obtained by training with either algorithm alone. Furthermore, nDES significantly outperforms ADAM on three classic toy recurrent neural network problems. The proposed solution is scalable in an embarrassingly parallel way. For reproducibility, we provide a reference implementation written in Python.
All authors contributed equally.
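To make the population-initialization idea from the abstract concrete, the minimal Python sketch below is our own illustration, not the authors' reference implementation: each candidate solution is a flattened weight vector whose per-layer scale follows the Xavier/Glorot coefficient sqrt(2 / (fan_in + fan_out)). The helper name, layer shapes, and population size are assumptions made for the example.

```python
import math
import torch

def xavier_scaled_population(layer_shapes, popsize):
    """Illustrative sketch: draw a population of flattened weight vectors whose
    per-layer scale uses the Xavier/Glorot coefficient sqrt(2 / (fan_in + fan_out))."""
    individuals = []
    for _ in range(popsize):
        chunks = []
        for fan_out, fan_in in layer_shapes:  # weight matrix shape (out, in)
            scale = math.sqrt(2.0 / (fan_in + fan_out))
            chunks.append(scale * torch.randn(fan_out * fan_in))
        # each individual is one flattened parameter vector for the whole network
        individuals.append(torch.cat(chunks))
    return torch.stack(individuals)  # shape: (popsize, n_params)

# Example: 16 candidate weight vectors for a hypothetical 784-64-10 MLP
pop = xavier_scaled_population([(64, 784), (10, 64)], popsize=16)
```

In an evolution-strategy setting, each such vector would be reshaped back into layer weights and evaluated on a data batch to obtain its fitness; the Xavier scaling simply keeps the initial population in the same weight regime that gradient-based training would start from.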
Cite this paper
Jagodziński, D., Neumann, Ł., Zawistowski, P. (2021). Deep Neuroevolution: Training Neural Networks Using a Matrix-Free Evolution Strategy. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_43