Abstract
We study the behaviour at zero of the derivatives of the cost function used when training non-linear neural networks. It is shown that a fair number of first-, second- and higher-order derivatives vanish at zero, supporting the belief that 0 is a peculiar and potentially harmful location in weight space. These calculations are related to practical and theoretical aspects of neural network training.
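As a concrete illustration of the abstract's claim, the sketch below evaluates the first derivatives of a squared-error cost for a one-hidden-layer tanh network with all parameters set to zero. The architecture, cost, and data are illustrative assumptions, not a restatement of the paper's exact setting; the point is only that the derivatives with respect to both weight layers and the hidden biases vanish at the origin, while the output-bias derivative generally does not.

```python
import numpy as np

# Assumed setting (for illustration only): one hidden tanh layer,
# squared-error cost, all parameters initialised to zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 inputs, 3 features
t = rng.normal(size=20)               # 20 scalar targets
H = 5                                 # number of hidden units

W = np.zeros((H, 3))                  # input -> hidden weights
b = np.zeros(H)                       # hidden biases
v = np.zeros(H)                       # hidden -> output weights
c = 0.0                               # output bias

# Forward pass at the zero point.
h = np.tanh(X @ W.T + b)              # hidden activations: all zero, since tanh(0) = 0
err = (h @ v + c) - t                 # residuals: here simply -t

# Analytic first derivatives of E = 0.5 * sum(err**2).
grad_v = h.T @ err                                    # zero, because the hidden activations vanish
grad_c = err.sum()                                    # generally non-zero
grad_W = ((err[:, None] * v) * (1 - h ** 2)).T @ X    # zero, because v = 0
grad_b = ((err[:, None] * v) * (1 - h ** 2)).sum(0)   # zero, because v = 0

print(np.allclose(grad_v, 0), np.allclose(grad_W, 0), np.allclose(grad_b, 0))  # True True True
print(grad_c)  # the output-bias derivative is the one component that need not vanish
```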
Cite this article
Goutte, C. Behaviour in 0 of the Neural Networks Training Cost. Neural Processing Letters 8, 107–116 (1998). https://doi.org/10.1023/A:1009684310458