Neural Networks

Volume 89, May 2017, Pages 19-30

Fractional-order gradient descent learning of BP neural networks with Caputo derivative

https://doi.org/10.1016/j.neunet.2017.02.007

Abstract

Fractional calculus has been found to be a promising area of research for information processing and the modeling of some physical systems. In this paper, we propose a fractional gradient descent method for the backpropagation (BP) training of neural networks. In particular, the Caputo derivative is employed to evaluate the fractional-order gradient of the error, defined as the traditional quadratic energy function. The monotonicity and the weak (strong) convergence of the proposed approach are proved in detail. Two simulations have been carried out to illustrate the performance of the presented fractional-order BP algorithm on three small datasets and one large dataset. The numerical simulations also verify the theoretical results of this paper.

Introduction

Fractional differential calculus has been a classical notion in mathematics for several hundred years. It is based on differentiation and integration of arbitrary fractional order and is thus a generalization of the familiar integer-order calculus. Yet only recently has it been applied to the successful modeling of certain physical phenomena. A number of papers in the literature (Kvitsinskii, 1993, Love, 1971, McBride, 1986, Miller, 1995, Nishimoto, 1989, Oldham and Spanier, 1974) have reported that fractional differential calculus describes certain selected natural phenomena much better than conventional integer-order calculus. The viscosity of liquids, the diffusion of electromagnetic waves and fractional kinetics are a few examples of system dynamics that can be successfully expressed in fractional form. A good literature review in support of this statement is given in Wang, Yu, Wen, Zhang, and Yu (2015).

As a consequence, the study of dynamics based on fractional-order differential systems has attracted considerable research interest. Novel results include solutions for chaos and stability analysis in fractional-order systems (Delavari et al., 2012, Deng and Li, 2005, Wu et al., 2008). In Wu et al. (2008), fractional multipoles, fractional solutions of the Helmholtz equation, and fractional image processing methods were studied, with promising results in electromagnetics. Using Laplace transform theory, chaos synchronization of the fractional Lü system under suitable conditions was rigorously proved (Deng & Li, 2005). In Delavari et al. (2012), the Lyapunov direct method was extended to Caputo-type fractional-order nonlinear systems, and a comparison theorem for these systems was proposed using Bihari's and Bellman–Gronwall's inequalities.

Fractional differential calculus has also been successfully adopted in the field of neural networks. Remarkable research on fractional-order neural networks has been presented in Chen (2013), Chen and Chen (2016), Rakkiyappan et al., 2015, Rakkiyappan et al., 2016 and Zhang, Yu, and Wang (2015). In Chen (2013), fractional calculus was applied to the backpropagation (BP) (Rumelhart, Hinton, & Williams, 1986) algorithm for feedforward neural networks (FNNs). The simulation results demonstrated that fractional-order FNNs converge much faster than integer-order FNNs. By extending the second method of Lyapunov, Mittag-Leffler stability analysis was performed for fractional-order Hopfield neural networks (Zhang et al., 2015). In Zhang et al. (2015), the stability conditions were used to achieve complete and quasi synchronization in the coupled case of these networks with constant or time-dependent external inputs. The global Mittag-Leffler stability and global asymptotic periodicity of fractional-order non-autonomous neural networks were investigated in Chen and Chen (2016) using a fractional-order differential and integral inequality technique. For fractional-order complex-valued neural networks, existence and stability analyses were studied in detail in Rakkiyappan et al., 2015, Rakkiyappan et al., 2016. In Xiao, Zheng, Jiang, and Cao (2015), a fractional-order recurrent neural network model with commensurate or incommensurate orders was studied to characterize its dynamic behaviors. The simulation results demonstrate that the dynamics of fractional-order systems are not invariant, in contrast to integer-order systems.

However, most research findings on fractional-order systems have been limited to fully coupled recurrent networks of Hopfield type (Chen and Chen, 2016, Rakkiyappan et al., 2015, Rakkiyappan et al., 2016, Wang et al., 2014, Wu and Zen, 2013, Wu et al., 2011a, Xiao et al., 2015, Zhang et al., 2015). The vast majority of papers focus on the properties of fixed points of the non-integer-order differential equations that describe such networks. The studied networks vary in their properties: some include a delay in the feedback loop, while other extensions provide generalizations to complex-valued neurons. In contrast, this work concerns fractional-order error BP in FNNs.

The gradient descent method is commonly used to train FNNs by minimizing an error function, defined as the norm of the difference between the actual network output and the desired output. Other methods exist for implementing the BP algorithm for FNNs, such as conjugate gradient, Gauss–Newton and Levenberg–Marquardt. We note that all of the above optimization methods are typically employed to train integer-order FNNs.
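As a point of reference before the fractional-order variant, the integer-order update that BP performs on the squared error can be written in a few lines. This is a generic sketch, not code from the paper; `grad_E` is a placeholder for whatever backpropagated gradient the network produces.

```python
# Generic integer-order gradient descent step on E(w) = 0.5 * ||y(w) - O||^2.
# Illustrative sketch only; `grad_E` stands for the backpropagated gradient dE/dw.
import numpy as np

def gradient_descent_step(w: np.ndarray, grad_E: np.ndarray, eta: float = 0.1) -> np.ndarray:
    # w_{k+1} = w_k - eta * dE/dw evaluated at w_k
    return w - eta * grad_E
```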

To the best of our knowledge, there is very limited literature on the convergence analysis of fractional-order FNNs. The existing convergence results are mainly concentrated on integer-order gradient-based FNNs. For batch-mode training, the BP algorithm for FNNs with a penalty term was proven to be deterministically convergent (Wu, Shao, & Li, 2006); in addition, the weight sequence is uniformly bounded owing to the penalty term. For incremental training, weak and strong convergence results have been obtained with or without penalty terms, under different assumptions on the activation function, the learning rates and the stationary points of the objective function (Shao et al., 2010, Wu et al., 2011b, Xu et al., 2009).

Only very recently have fractional neural networks been evaluated in the broader context of training and the minimization of an objective function (Pu et al., 2015). That work offered a detailed analysis of fractional gradient descent learning conditions, supported by initial convergence studies of the minimization of a quadratic energy norm and by numerical illustrations of the search for extreme points during Fractional Adaptive Learning (FAL).

Inspired by Chen (2013) and Pu et al. (2015), we apply the fractional steepest descent algorithm to train FNNs. In particular, we employ the Caputo derivative formula to compute the fractional-order gradient of the error function with respect to the weights and establish deterministic convergence. The main contributions of this paper are as follows:

  • (1) Under suitable conditions on the activation functions and the learning rate, the monotonic decrease of the error function is guaranteed. For the activation functions of the hidden and output layers, we assume that their first and second derivatives are uniformly bounded. This condition is easily satisfied, since the most common sigmoid activation functions are uniformly bounded on $\mathbb{R}$ and infinitely differentiable.

  • (2) The deterministic convergence of the proposed fractional-order training algorithm is rigorously proved, which rules out divergent behavior from a theoretical point of view. Weak convergence means that the fractional-order gradient of the error function approaches zero as the iteration number tends to infinity, while strong convergence means that the weight sequence converges to a fixed point (see the sketch after this list).

  • (3) Numerical simulations are reported to illustrate the effectiveness of the proposed fractional-order neural networks and to support the theoretical results of this paper.
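As referenced in contribution (2), the two convergence notions can be monitored numerically during training. The sketch below is illustrative only; the helper names and tolerances are assumptions, not quantities defined in the paper.

```python
# Hedged sketch of numerical checks for the two convergence notions in (2).
# `fractional_grad` and `weights_history` are illustrative placeholders.
import numpy as np

def weak_convergence_reached(fractional_grad: np.ndarray, tol: float = 1e-6) -> bool:
    # Weak convergence: the norm of the alpha-order gradient tends to zero.
    return float(np.linalg.norm(fractional_grad)) < tol

def strong_convergence_reached(weights_history: list, tol: float = 1e-8) -> bool:
    # Strong convergence: consecutive weight iterates stop moving,
    # i.e. the weight sequence approaches a fixed point.
    if len(weights_history) < 2:
        return False
    return float(np.linalg.norm(weights_history[-1] - weights_history[-2])) < tol
```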

Selected benchmark UCI datasets have been used to compare the performance of fractional-order and integer-order FNNs. The simulations demonstrate that the training and testing accuracies of fractional-order FNNs are better than those of first-order gradient-based FNNs. In addition, the monotonicity of the error function and the weak convergence have been demonstrated on the MNIST dataset.

We note that the error function in Pu et al. (2015) is quite different from the one used in this paper. In Pu et al. (2015), the error function is defined as the sum of the first-order extreme value of the error function and the quadratic norm of the difference between the parameters and their first-order extreme value. In this work, we take the error function to be the squared error between the actual output and the desired output. Another difference between Pu et al. (2015) and this paper lies in the notion of convergence itself. Pu et al. (2015) focus on the convergence rate of their algorithm, namely exponential convergence. In this paper, we analyze the convergence behavior of the fractional-order BP algorithm: the norm of the gradient of the error function approaches zero (weak convergence), and we also provide a strong convergence proof.

The structure of the paper is as follows. In Section 2, the definitions of three commonly used fractional-order derivatives are introduced. The traditional BP algorithm and our novel training algorithm for fractional-order BP neural networks based on the Caputo derivative are presented in Section 3. In Section 4, the convergence results are presented; the detailed proofs of the main results are given in the Appendix. Numerical simulations illustrating the effectiveness of our results are presented in Section 5. Finally, the paper is concluded in the last section.

Section snippets

Fractional-order derivative

Unlike the situation with integer-order derivatives, several definitions are in use for fractional-order derivatives. The three most common definitions are due to Grünwald–Letnikov (GL), Riemann–Liouville (RL), and Caputo (Love, 1971, McBride, 1986, Nishimoto, 1989, Oldham and Spanier, 1974).

Definition 2.1 GL Fractional-Order Derivative

The GL derivative with order $\alpha$ of function $f(t)$ is defined as
$${}_{a}^{GL}D_{t}^{\alpha}f(t)=\sum_{k=0}^{n}\frac{f^{(k)}(a)\,(t-a)^{-\alpha+k}}{\Gamma(-\alpha+k+1)}+\frac{1}{\Gamma(n-\alpha+1)}\int_{a}^{t}(t-\tau)^{n-\alpha}f^{(n+1)}(\tau)\,\mathrm{d}\tau,$$
where ${}_{a}^{GL}D_{t}^{\alpha}$ is the GL fractional derivative operator, $\alpha>0$, $n-1<\alpha<n$, $n\in\mathbb{N}$,
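As a sanity check of Definition 2.1, the series-plus-integral form can be evaluated numerically and compared against the known fractional derivative of a power function. The snippet below is a sketch assuming $0<\alpha<1$ (so $n=1$), $a=0$ and $f(t)=t^{2}$; the function names are illustrative and do not come from the paper.

```python
# Numerical check of the series-plus-integral form in Definition 2.1
# for f(t) = t**2, a = 0 and 0 < alpha < 1 (hence n = 1).
# Helper names are illustrative; they do not come from the paper.
from scipy.special import gamma
from scipy.integrate import quad

def gl_derivative(f_derivs, a, t, alpha, n):
    """sum_{k=0}^{n} f^(k)(a) (t-a)^(k-alpha) / Gamma(k-alpha+1)
       + 1/Gamma(n-alpha+1) * int_a^t (t-tau)^(n-alpha) f^(n+1)(tau) dtau.
       f_derivs[k] is the k-th derivative of f as a callable, k = 0..n+1."""
    series = sum(f_derivs[k](a) * (t - a) ** (k - alpha) / gamma(k - alpha + 1)
                 for k in range(n + 1))
    integral, _ = quad(lambda tau: (t - tau) ** (n - alpha) * f_derivs[n + 1](tau), a, t)
    return series + integral / gamma(n - alpha + 1)

# f(t) = t^2 and its first two derivatives.
f_derivs = [lambda t: t ** 2, lambda t: 2.0 * t, lambda t: 2.0]
alpha, t = 0.5, 1.5
numeric = gl_derivative(f_derivs, a=0.0, t=t, alpha=alpha, n=1)
closed_form = gamma(3.0) / gamma(3.0 - alpha) * t ** (2.0 - alpha)  # D^alpha t^2 = Gamma(3)/Gamma(3-alpha) t^(2-alpha)
print(numeric, closed_form)  # the two values should agree to quadrature accuracy
```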

Algorithm description

Let us begin by introducing a network with three layers. The numbers of neurons in the input, hidden and output layers are $p$, $n$ and $1$, respectively. Suppose that the training sample set is $\{x^{j},O^{j}\}_{j=1}^{J}\subset\mathbb{R}^{p}\times\mathbb{R}$, where $x^{j}$ and $O^{j}$ are the input and the corresponding ideal output of the $j$th sample. Let $g,f:\mathbb{R}\rightarrow\mathbb{R}$ be given activation functions for the hidden and output layers, respectively. Let $V=(v_{ij})_{n\times p}$ be the weight matrix connecting the input and hidden layers, and write $v_{i}=(v_{i1},v_{i2},\ldots$
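To make this setup concrete, the sketch below implements a forward pass for such a three-layer network ($p$ inputs, $n$ hidden units, one output, hidden weight matrix $V$, activations $g$ and $f$) together with one fractionally scaled weight update. The output weight vector `w0`, the sigmoid choice for $g$ and $f$, and the one-term Caputo approximation $D_{w}^{\alpha}E \approx (\partial E/\partial w)\,|w-a|^{1-\alpha}/\Gamma(2-\alpha)$ are assumptions made for illustration only; they are not the paper's exact formulas.

```python
# Hedged sketch of one fractional-order BP step for a three-layer network.
# Assumptions (not from the paper): sigmoid activations for g and f, an output
# weight vector w0, and the one-term Caputo approximation
#   D^alpha_w E ~ (dE/dw) * |w - a|**(1 - alpha) / Gamma(2 - alpha).
import numpy as np
from scipy.special import gamma

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(V, w0, x):
    h = sigmoid(V @ x)      # hidden-layer output g(Vx), shape (n,)
    y = sigmoid(w0 @ h)     # scalar network output f(w0 . h)
    return h, y

def caputo_factor(w, a, alpha):
    # Assumed one-term Caputo scaling, applied elementwise to the weights.
    return np.abs(w - a) ** (1.0 - alpha) / gamma(2.0 - alpha)

def fractional_bp_step(V, w0, x, O, eta=0.1, alpha=0.9, a=0.0):
    """One descent step on E = 0.5 * (y - O)**2 with fractionally scaled gradients."""
    h, y = forward(V, w0, x)
    d_out = (y - O) * sigmoid_prime(w0 @ h)                    # scalar
    grad_w0 = d_out * h                                        # dE/dw0, shape (n,)
    grad_V = np.outer(d_out * w0 * sigmoid_prime(V @ x), x)    # dE/dV, shape (n, p)
    w0_new = w0 - eta * grad_w0 * caputo_factor(w0, a, alpha)
    V_new = V - eta * grad_V * caputo_factor(V, a, alpha)
    return V_new, w0_new
```

For example, with p = 3 inputs and n = 5 hidden units, `V = np.random.randn(5, 3)` and `w0 = np.random.randn(5)` can be updated by calling `fractional_bp_step` repeatedly over the training samples.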

Main results

In this section, the convergent behavior of the proposed Caputo fractional-order BP training algorithm is described. For any $x=(x_{1},x_{2},\ldots,x_{p})\in\mathbb{R}^{p}$, we write $\|x\|=\sqrt{\sum_{m=1}^{p}x_{m}^{2}}$, where $\|\cdot\|$ stands for the Euclidean norm in $\mathbb{R}^{p}$. Let $\hat{\Phi}=\{w\in\Phi:{}^{C}D_{w}^{\alpha}E(w)=0\}$ be the $\alpha$-order stationary point set of the error function $E(w)$, where $\Phi\subset\mathbb{R}^{n(p+1)}$ is a bounded open set and $0<\alpha<1$.

Actually, we assume that the activation functions $g$ and $f$ are bounded and infinitely differentiable on $\mathbb{R}$; furthermore, all of their corresponding

Experiments

To demonstrate the performance and verify the theoretical results of the presented algorithm, we carried out the following two simulations. The first simulation focuses on comparing the three different fractional-order BP algorithms based on the Caputo, RL and GL derivatives of Section 2. The second one presents the observations of the proposed algorithm based on the Caputo derivative. In addition, the two curves of the error function and the norm of the gradient illustrate the theoretical conclusions

Conclusions

In this paper, we have extended the fractional steepest descent approach to the BP training of FNNs. The proposed method and its theoretical analysis are distinct from existing results. We have analyzed the convergence of a Caputo fractional-order two-layer neural network trained with the BP algorithm. From a theoretical point of view, we have proven the monotonicity of the error function and the weak (strong) convergence of the proposed Caputo fractional-order BP algorithm. The numerical results support the

Acknowledgments

The authors would like to express their gratitude to the editors and anonymous reviewers for their valuable comments and suggestions, which improved the quality of this paper.

References (27)

  • X. Chen, Application of fractional calculus in BP neural networks (2013).

  • H. Delavari et al., Stability analysis of Caputo fractional-order nonlinear systems revisited, Nonlinear Dynamics (2012).

  • R. Gorenflo et al., Fractional calculus: integral and differential equations of fractional order, Mathematics (2008).
This work was supported in part by the National Natural Science Foundation of China (No. 61305075), the China Postdoctoral Science Foundation (No. 2012M520624), the Natural Science Foundation of Shandong Province (Nos. ZR2013FQ004, ZR2013DM015), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20130133120014) and the Fundamental Research Funds for the Central Universities (Nos. 13CX05016A, 14CX05042A, 15CX05053A, 15CX08011A).
