Abstract
Two backpropagation algorithms with momentum for feedforward neural networks with a single hidden layer are considered, in which the training samples are supplied to the network in either a cyclic or an almost-cyclic fashion. A restart strategy for the momentum is adopted: the momentum coefficient is set to zero at the beginning of each training cycle. Weak and strong convergence results are presented for the two algorithms, respectively. The convergence conditions on the learning rate, the momentum coefficient and the activation functions are considerably weaker than those required by existing results. Numerical examples support our theoretical results and demonstrate that the almost-cyclic algorithm (ACMFNN) clearly outperforms the cyclic one (CMFNN) in both convergence speed and generalization ability.
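As a rough illustration of the procedure described above (a minimal sketch, not the authors' exact formulation), the following Python code trains a single-hidden-layer sigmoid network by backpropagation with momentum, presents the training samples either in a fixed cyclic order or in a freshly permuted order on every cycle, and resets the momentum term to zero at the start of each cycle. The network size, learning rate, momentum coefficient, squared-error loss, and the omission of bias terms are all illustrative assumptions.

```python
import numpy as np

def train_momentum_bp(X, Y, n_hidden=10, eta=0.05, alpha=0.5,
                      n_cycles=100, almost_cyclic=True, restart=True, seed=0):
    """Cyclic/almost-cyclic BP with momentum for a single-hidden-layer network.

    restart=True zeroes the momentum term at the start of every training
    cycle, mimicking the restart strategy described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    V = rng.normal(scale=0.1, size=(n_in, n_hidden))   # input -> hidden weights
    W = rng.normal(scale=0.1, size=(n_hidden, n_out))  # hidden -> output weights
    dV = np.zeros_like(V)                              # momentum terms
    dW = np.zeros_like(W)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for cycle in range(n_cycles):
        if restart:                      # momentum restart at each cycle start
            dV = np.zeros_like(V)
            dW = np.zeros_like(W)
        order = (rng.permutation(len(X)) if almost_cyclic  # new order each cycle
                 else np.arange(len(X)))                   # fixed cyclic order
        for i in order:                  # one pass over all samples = one cycle
            x, y = X[i:i+1], Y[i:i+1]
            h = sigmoid(x @ V)           # hidden-layer output
            o = sigmoid(h @ W)           # network output
            # backpropagate the squared error 0.5 * ||o - y||^2
            delta_o = (o - y) * o * (1.0 - o)
            delta_h = (delta_o @ W.T) * h * (1.0 - h)
            dW = -eta * h.T @ delta_o + alpha * dW   # gradient step + momentum
            dV = -eta * x.T @ delta_h + alpha * dV
            W += dW
            V += dV
    return V, W
```

In this sketch, `train_momentum_bp(X, Y, almost_cyclic=True)` corresponds to the almost-cyclic variant, while `almost_cyclic=False` gives the purely cyclic one; the function name and its parameters are hypothetical and chosen only for illustration.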
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Wang, J., Wu, W., Zurada, J.M. (2012). Computational Properties of Cyclic and Almost-Cyclic Learning with Momentum for Feedforward Neural Networks. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds) Advances in Neural Networks – ISNN 2012. ISNN 2012. Lecture Notes in Computer Science, vol 7367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31346-2_61
DOI: https://doi.org/10.1007/978-3-642-31346-2_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31345-5
Online ISBN: 978-3-642-31346-2