ABSTRACT
Training neural networks with faster gradient methods drives them toward the edge of stability, and proximity to this edge improves their generalization capability. However, it is not clear how to approach the edge stably. We propose a new activation function that models the inner processes of neurons with single-species population dynamics. Through its growth and harvest rates, the function induces essential dynamics in neural networks and improves their generalization capability.
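As a minimal sketch of this idea (not the authors' exact formulation), the code below implements an activation based on the single-species logistic model with a constant harvest term, phi(x) = r x (1 - x) - h. The class name `GrowthHarvest` and the parameter values for `r` (growth rate) and `h` (harvest rate) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class GrowthHarvest(nn.Module):
    """Hypothetical activation derived from single-species logistic
    population dynamics with a constant harvest term:

        phi(x) = r * x * (1 - x) - h

    The hyperparameters `r` (growth rate) and `h` (harvest rate) are
    illustrative; the paper's actual parameterization may differ.
    """

    def __init__(self, r: float = 2.0, h: float = 0.1):
        super().__init__()
        self.r = r  # growth rate of the logistic model
        self.h = h  # constant harvest rate subtracted from the output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Logistic growth term minus harvest: a non-monotone map whose
        # dynamics depend on the growth/harvest balance.
        return self.r * x * (1.0 - x) - self.h
```

In use, the module would simply replace a standard nonlinearity, e.g. `nn.Sequential(nn.Linear(64, 64), GrowthHarvest())`; tuning `r` and `h` then controls how close the induced dynamics sit to the stable regime of the logistic family.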