An online gradient method with momentum for two-layer feedforward neural networks

https://doi.org/10.1016/j.amc.2009.02.038

Abstract

An online gradient method with momentum for two-layer feedforward neural networks is considered. The momentum coefficient is chosen in an adaptive manner to accelerate and stabilize the learning procedure of the network weights. Corresponding convergence results are proved: a weak convergence result is established under a uniform boundedness assumption on the activation function and its derivatives, and, if the stationary point set of the error function contains only finitely many elements, a strong convergence result holds as well.

Introduction

Feedforward neural networks (FNN) are widely used in applications and are often trained by the gradient method [3], [4], [6], [7], [17], [18]; as a simple example, the convergence of the gradient method for two-layer feedforward neural networks is discussed in [6], [8], [15], [16]. To speed up and stabilize the training procedure of the gradient method, a momentum term [12], [13] is often added to the weight increment formula, so that the present weight updating increment is a combination of the present gradient of the error function and the previous weight updating increment. Many researchers have developed the theory of momentum and extended its applications; see, e.g., [1], [2], [5], [9], [10], [11], [14], [20], [23].

In [21], some convergence results are given for a two-layer feedforward neural network trained in batch mode. These results are of a global nature in that they are valid for arbitrarily given initial values of the weights. The key to the convergence analysis is the monotonicity of the error function during the learning procedure, which is proved under a uniform boundedness assumption on the activation function and its derivatives. In [22], we consider an online gradient method with momentum (OGM for short) for a two-layer feedforward neural network and obtain both weak and strong convergence results. However, in [21], [22], in order to obtain strong convergence we assume that the error function is uniformly convex, which is a rather restrictive assumption. Moreover, in [22] we always assume that the training examples are linearly independent. The linear independence assumption is satisfied in some practical models, but it requires the dimension n of the training examples to be greater than the number J of training examples. If J is very large, e.g. greater than n, the linear independence assumption cannot be satisfied.

In this paper, we consider an OGM for a two-layer feedforward neural network without assuming that the training examples are linearly independent. We also discuss strong convergence of the OGM without assuming that the error function is uniformly convex.

The rest of the paper is organized as follows. In Section 2 we introduce the online gradient method with momentum, discuss its convergence conditions, and state the corresponding weak and strong convergence results. Section 3 is devoted to proving Theorem 2.4, the main result of this paper. Finally, some conclusions are drawn in Section 4.

Section snippets

OGM and its convergence

For a given set of training examples $\{\xi^j, O^j\}_{j=1}^{J} \subset \mathbb{R}^n \times \mathbb{R}$, we describe the neural network approximation problem as follows. Let $g: \mathbb{R} \to \mathbb{R}$ be a given smooth activation function. For a choice of the weight vector $w \in \mathbb{R}^n$, the actual output of the neural network is
$$\zeta^j = g(w \cdot \xi^j), \quad j = 1, \ldots, J,$$
where $w \cdot \xi^j$ denotes the inner product. Our task is to choose the weight $w$ such that the difference $|O^j - \zeta^j|$ is as small as possible. A simple and popular approach is to minimize the quadratic error function
$$E(w) := \frac{1}{2} \sum_{j=1}^{J} \bigl(O^j - \zeta^j\bigr)^2 = \sum_{j=1}^{J} g_j(w \cdot \xi^j), \qquad g_j(t) := \frac{1}{2}\,\bigl(O^j - g(t)\bigr)^2, \quad \ldots$$
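To make the setting concrete, the following is a minimal numerical sketch of the error function $E(w)$ and one online pass with a momentum term. It is not the method analyzed in this paper: the adaptive momentum coefficients $\tau_{m,k}$ are replaced by a fixed illustrative value `tau`, and the `tanh` activation, learning rate and synthetic data are assumptions made only for demonstration.

```python
# Minimal sketch of an online gradient pass with a momentum term.
# The fixed momentum coefficient tau, the tanh activation, and the
# synthetic data are illustrative assumptions, not the paper's scheme.
import numpy as np

rng = np.random.default_rng(0)

def g(t):                       # activation function (smooth, bounded derivatives)
    return np.tanh(t)

def g_prime(t):
    return 1.0 - np.tanh(t) ** 2

# Training examples {(xi_j, O_j)}, j = 1..J, in R^n x R (synthetic).
n, J = 5, 20
xi = rng.standard_normal((J, n))
O = g(xi @ rng.standard_normal(n))          # targets from a hidden "true" weight

def E(w):                        # E(w) = 1/2 * sum_j (O_j - g(w . xi_j))^2
    return 0.5 * np.sum((O - g(xi @ w)) ** 2)

eta, tau = 0.05, 0.3             # learning rate and (illustrative) momentum coefficient
w = rng.standard_normal(n)
delta_w = np.zeros(n)

for epoch in range(200):         # one epoch = one online pass through all J examples
    for j in range(J):
        grad_j = -(O[j] - g(w @ xi[j])) * g_prime(w @ xi[j]) * xi[j]
        delta_w = -eta * grad_j + tau * delta_w    # gradient step plus momentum term
        w = w + delta_w

print("final error E(w) =", E(w))
```

The weights are updated example by example (online learning) rather than once per epoch (batch learning), which is exactly the distinction drawn between [21] and the present paper.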

Proof of Theorem 2.4

To prove Theorem 2.4, as well as Lemma 2.3, we also need the following preliminary lemmas.

Using Taylor’s formula, we expand $g_j(w^{(m+1)J} \cdot \xi^j)$ at $w^{mJ} \cdot \xi^j$ as
$$
g_j(w^{(m+1)J} \cdot \xi^j) = g_j(w^{mJ} \cdot \xi^j) + g_j'(w^{mJ} \cdot \xi^j)\,(w^{(m+1)J} - w^{mJ}) \cdot \xi^j + \frac{1}{2}\, g_j''(t_{m,j})\,\bigl[(w^{(m+1)J} - w^{mJ}) \cdot \xi^j\bigr]^2
= g_j(w^{mJ} \cdot \xi^j) + g_j'(w^{mJ} \cdot \xi^j)\,(w^{(m+1)J} - w^{mJ}) \cdot \xi^j + \rho_{m,j},
$$
where $t_{m,j}$ lies between $w^{mJ} \cdot \xi^j$ and $w^{(m+1)J} \cdot \xi^j$, and
$$
\rho_{m,j} = \frac{1}{2}\, g_j''(t_{m,j})\,\bigl[(w^{(m+1)J} - w^{mJ}) \cdot \xi^j\bigr]^2.
$$
From (2.6), (3.1) we get
$$
E(w^{(m+1)J}) - E(w^{mJ}) = \sum_{j=1}^{J} g_j'(w^{mJ} \cdot \xi^j)\,(w^{(m+1)J} - w^{mJ}) \cdot \xi^j + \sum_{j=1}^{J} \rho_{m,j}.
$$
Noticing
$$
w^{(m+1)J} - w^{mJ} = \sum_{k=1}^{J} \Delta w^{mJ+k} = \sum_{k=1}^{J} \bigl(\tau_{m,k}\, \Delta w \cdots
$$
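As a complement to the argument (and not as part of the proof), the following sketch numerically checks the Taylor decomposition used above for a single term: the increment $g_j(b) - g_j(a)$ equals $g_j'(a)(b - a)$ plus a remainder $\rho$ bounded by $\tfrac{1}{2}\sup|g_j''|\,(b - a)^2$, which is the kind of estimate the convergence analysis relies on. The concrete activation $g = \tanh$, the target value, and the interval endpoints are illustrative assumptions.

```python
# Numerical sanity check of the Taylor remainder bound for g_j(t) = 1/2*(O_j - g(t))^2.
# The activation, target O_j, and the points a, b are illustrative assumptions.
import numpy as np

def g(t):  return np.tanh(t)
def gp(t): return 1.0 - np.tanh(t) ** 2

O_j = 0.7
def g_j(t):   return 0.5 * (O_j - g(t)) ** 2
def g_j_p(t): return -(O_j - g(t)) * gp(t)          # first derivative of g_j

a, b = 0.4, 0.45                                    # stand-ins for w^{mJ}.xi_j and w^{(m+1)J}.xi_j
rho = g_j(b) - g_j(a) - g_j_p(a) * (b - a)          # exact Taylor remainder

# Crude upper bound for sup|g_j''| on [a, b] by sampling (enough for a sanity check).
ts = np.linspace(a, b, 1000)
h = 1e-5
g_j_pp = (g_j_p(ts + h) - g_j_p(ts - h)) / (2 * h)  # numerical second derivative
bound = 0.5 * np.max(np.abs(g_j_pp)) * (b - a) ** 2

print(f"remainder rho = {rho:.3e},  bound = {bound:.3e},  |rho| <= bound: {abs(rho) <= bound}")
```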

Conclusions

In this paper, we consider an online gradient method with momentum for two-layer feedforward neural networks. The momentum coefficient is chosen in an adaptive manner to accelerate and stabilize the learning procedure of the network weights. We do not require the training examples to be linearly independent, and we give up the assumption that the error function is uniformly convex, which is rather restrictive in the literature. With the assumption that the activation function and its derivatives |g(t) …

Acknowledgement

The author would like to thank Professor Wei Wu for his many valuable suggestions on the topic of this paper, and also thanks the anonymous referees for their valuable comments and suggestions on the revision of this paper.

References (23)

  • Z. Luo, On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks, Neural Computation (1991).

    This work is supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. Y606009.
