Fractional-order gradient descent learning of BP neural networks with Caputo derivative☆
Introduction
Fractional differential calculus has been a classical notion in mathematics for several hundred years. It is based on differentiation and integration of arbitrary fractional order, and as such it generalizes the familiar integer-order calculus. Yet only recently has it been applied to the successful modeling of certain physical phenomena. A number of papers in the literature (Kvitsinskii, 1993, Love, 1971, McBride, 1986, Miller, 1995, Nishimoto, 1989, Oldham and Spanier, 1974) have reported that fractional differential calculus describes certain selected natural phenomena much better than conventional integer-order calculus. The viscosity of liquids, the diffusion of electromagnetic waves, and fractional kinetics are a few examples of system dynamics that can be successfully expressed in fractional form. A good literature review in support of this statement can be found in Wang, Yu, Wen, Zhang, and Yu (2015).
As a consequence, the study of dynamics based on fractional-order differential systems has attracted considerable research interest. Novel results include solutions for chaos and stability analysis in fractional-order systems (Delavari et al., 2012, Deng and Li, 2005, Wu et al., 2008). In Wu et al. (2008), fractional multipoles, fractional solutions of the Helmholtz equation, and fractional image processing methods were studied, producing promising results in electromagnetics. Using Laplace transform theory, chaos synchronization of the fractional Lü system was proved under suitable conditions (Deng & Li, 2005). In Delavari et al. (2012), the Lyapunov direct method was extended to Caputo-type fractional-order nonlinear systems, and a comparison theorem for these systems was established using Bihari’s and Bellman–Gronwall’s inequalities.
Fractional differential calculus has also been successfully adopted in the field of neural networks. Notable research on fractional-order neural networks has been presented in Chen (2013), Chen and Chen (2016), Rakkiyappan et al., 2015, Rakkiyappan et al., 2016 and Zhang, Yu, and Wang (2015). In Chen (2013), fractional calculus was applied to the Backpropagation (BP) (Rumelhart, Hinton, & Williams, 1986) algorithm for feedforward neural networks (FNNs). The simulation results demonstrated that fractional-order FNNs converge much faster than integer-order FNNs. By extending the second method of Lyapunov, a Mittag-Leffler stability analysis was performed for fractional-order Hopfield neural networks (Zhang et al., 2015); the resulting stability conditions were used to achieve complete and quasi synchronization when such networks are coupled, with constant or time-dependent external inputs. The global Mittag-Leffler stability and global asymptotical periodicity of fractional-order non-autonomous neural networks were investigated in Chen and Chen (2016) by means of a fractional-order differential and integral inequality technique. For fractional-order complex-valued neural networks, existence and stability analyses were studied in detail in Rakkiyappan et al., 2015, Rakkiyappan et al., 2016. In Xiao, Zheng, Jiang, and Cao (2015), a fractional-order recurrent neural network model with commensurate or incommensurate orders was studied to exhibit its dynamic behaviors. The simulation results demonstrate that, in contrast to integer-order systems, the dynamics of fractional-order systems are not invariant.
However, most research findings for fractional-order systems have been limited to studies of fully coupled recurrent networks of Hopfield type (Chen and Chen, 2016, Rakkiyappan et al., 2015, Rakkiyappan et al., 2016, Wang et al., 2014, Wu and Zeng, 2013, Wu et al., 2011a, Xiao et al., 2015, Zhang et al., 2015). The vast majority of papers have focused on the properties of fixed points of the non-integer-order differential equations that describe such networks. The studied networks vary in their properties: some include delays in the feedback loop, while other extensions provide generalizations to complex-valued neurons. In contrast, this work concerns fractional-order error BP in FNNs.
The gradient descent method is commonly used to train FNNs by minimizing an error function, defined as the norm of the distance between the actual network output and the desired output. Other methods exist to implement the BP algorithm for FNNs, such as conjugate gradient, Gauss–Newton and Levenberg–Marquardt. We note that all of the above methods are typically employed to train integer-order FNNs.
To the best of our knowledge, there is very limited literature on the convergence analysis of fractional-order FNNs. The existing convergence results are mainly concentrated on integer-order gradient-based FNNs. For batch-mode training, the BP algorithm for FNNs with a penalty term was proven to be deterministically convergent (Wu, Shao, & Li, 2006); in addition, the weight sequence is uniformly bounded due to the influence of the penalty term. For incremental training, weak and strong convergence results have been obtained with or without penalty terms, under different assumptions on the activation function, the learning rates and the stationary points of the objective function (Shao et al., 2010, Wu et al., 2011b, Xu et al., 2009).
Only very recently have fractional neural networks been evaluated in the broader context of training and the minimization of an objective function (Pu et al., 2015). That paper offered a detailed analysis of the conditions for fractional gradient descent learning, supported by initial convergence studies of the minimization of a quadratic energy norm and by numerical illustrations of the search for extreme points during Fractional Adaptive Learning (FAL).
Inspired by Chen (2013) and Pu et al. (2015), we apply the fractional steepest descent algorithm to train FNNs. In particular, we employ the Caputo derivative formula to compute the fractional-order gradient of the error function with respect to the weights, and we obtain deterministic convergence. The main contributions of this paper are as follows:
- (1)
Under suitable conditions on the activation functions and the learning rate, the error function is guaranteed to decrease monotonically.
For the activation functions of the hidden and output layers, we assume that their first and second derivatives are uniformly bounded. This condition is easily satisfied, since the most common sigmoid activation functions are uniformly bounded and infinitely differentiable on the real line.
- (2)
The deterministic convergence of the proposed fractional-order training algorithm is rigorously proved, which rules out divergent behavior from a theoretical point of view. Weak convergence means that the fractional-order gradient of the error function approaches zero as the iteration number tends to infinity, while strong convergence means that the weight sequence converges to a fixed point.
- (3)
Numerical simulations are reported to illustrate the effectiveness of the proposed fractional-order neural networks and to support the theoretical results of this paper.
Selected benchmark UCI datasets are used to compare the performance of fractional-order and integer-order FNNs. The simulations demonstrate that the training and testing accuracies of fractional-order FNNs are better than those of first-order gradient-based FNNs. In addition, the monotonicity of the error function and the weak convergence are demonstrated on the MNIST dataset.
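The two theoretical properties above can be illustrated on a toy problem. The following is a minimal Python sketch of our own (not the paper's code): it minimizes an assumed quadratic error E(w) = (w - 2)^2 / 2 with a fractional gradient step, using the common first-order truncation of the Caputo derivative, D^α E ≈ E'(w)·|w - c|^(1-α)/Γ(2-α), where the lower terminal c, the order α and the step size are all illustrative choices. It tracks the error values and fractional gradient magnitudes over the iterations.

```python
from math import gamma

alpha, eta, c = 0.8, 0.1, -1.0        # fractional order, step size, lower terminal
w = 0.0                               # initial weight
E = lambda w: 0.5 * (w - 2.0) ** 2    # toy quadratic error, minimum at w = 2

errors, grads = [], []
for _ in range(200):
    # first-order truncation of the Caputo fractional gradient of E at w
    g = (w - 2.0) * abs(w - c) ** (1.0 - alpha) / gamma(2.0 - alpha)
    errors.append(E(w))
    grads.append(abs(g))
    w -= eta * g                      # fractional gradient descent step

print(errors[0], errors[-1], grads[-1])
```

On this example the recorded error sequence is non-increasing (the monotonicity property) and the fractional gradient magnitude decays toward zero (the weak convergence property).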
We note that the error function in Pu et al. (2015) is quite different from the one in this paper. In Pu et al. (2015), the error function is defined as the sum of the first-order extreme values of the error function and the quadratic norm of the parameters and its first-order extreme value. In this work, the error function is the squared error, i.e., the squared difference between the actual output and the desired output. Another difference lies in the notion of convergence itself: Pu et al. (2015) focuses on the regular convergence rate of the proposed algorithm, that is, on exponential convergence, whereas this paper evaluates the convergence behavior of the fractional-order BP algorithm, namely that the norm of the gradient of the error function approaches zero (weak convergence). We also provide a strong convergence proof.
The structure of the paper is as follows: in Section 2, the definitions of the three most commonly used fractional-order derivatives are introduced. The traditional BP algorithm and our novel training algorithm for fractional-order BP neural networks based on the Caputo derivative are presented in Section 3. In Section 4, the convergence results are presented; the detailed proofs of the main convergence results are given in the Appendix. Numerical simulations illustrating the effectiveness of our results are presented in Section 5. Finally, the paper is concluded in the last section.
Section snippets
Fractional-order derivative
Unlike the situation with integer-order derivatives, several definitions are in use for fractional-order derivatives. The three most common definitions are due to Grunwald–Letnikov (GL), Riemann–Liouville (RL), and Caputo (Love, 1971, McBride, 1986, Nishimoto, 1989, Oldham and Spanier, 1974). Definition 2.1 (GL Fractional-Order Derivative). The GL derivative of order α of a function f(t) is defined as
  aD_t^α f(t) = lim_{h→0} h^(−α) Σ_{j=0}^{[(t−a)/h]} (−1)^j C(α, j) f(t − jh),
where aD_t^α is the GL fractional derivative operator, a is the lower terminal, [·] denotes the integer part, and C(α, j) = Γ(α+1)/(Γ(j+1) Γ(α−j+1)) is the generalized binomial coefficient.
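The GL definition lends itself directly to numerical evaluation: truncate the limit at a finite step size h and accumulate the weighted function samples. The snippet below is a minimal Python sketch under that assumption (the function name and defaults are ours, not from the paper); the coefficients (−1)^j·C(α, j) are generated by the standard recurrence w_{j+1} = w_j·(j − α)/(j + 1).

```python
import math

def gl_derivative(f, t, alpha, a=0.0, h=1e-3):
    """Truncated Grunwald-Letnikov fractional derivative of order
    `alpha` of `f` at `t`, with lower terminal `a` and step `h`."""
    n = int((t - a) / h)
    acc, w = 0.0, 1.0                   # w_0 = 1
    for j in range(n + 1):
        acc += w * f(t - j * h)
        w *= (j - alpha) / (j + 1)      # w_{j+1} = w_j * (j - alpha) / (j + 1)
    return acc / h ** alpha

# Sanity check against a classical closed form: the 0.5-order
# derivative of f(t) = t on [0, t] is 2*sqrt(t)/sqrt(pi).
val = gl_derivative(lambda t: t, 1.0, 0.5)
print(val)  # close to 2/sqrt(pi) ~= 1.1284
```

The truncated sum converges to the RL derivative (which here coincides with the Caputo derivative, since f(0) = 0) as h → 0, with first-order accuracy in h.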
Algorithm description
Let us begin with an introduction of a network with three layers. The numbers of neurons in the input, hidden and output layers are denoted by p, n and 1, respectively. Suppose that the training sample set is {x^j, O^j : j = 1, …, J}, where x^j and O^j are the input and the corresponding ideal output of the j-th sample. Let g and f be the given activation functions for the hidden and the output layers, respectively. Let V be the weight matrix connecting the input and the hidden layers, and write
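To make the three-layer setup concrete, here is a minimal Python sketch of our own (an illustration, not the paper's exact algorithm) of BP training with a fractional-order gradient step. It replaces the integer gradient with the common first-order truncation of the Caputo derivative of the error E with respect to a weight w, D^α E ≈ E'(w)·|w − c|^(1−α)/Γ(2−α); the lower terminals c, the network sizes, the toy AND dataset and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from math import gamma

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def caputo_scale(w, c, alpha):
    """First-order truncation of the Caputo fractional gradient:
    the integer gradient is rescaled elementwise by
    |w - c|^(1 - alpha) / Gamma(2 - alpha)."""
    return np.abs(w - c) ** (1.0 - alpha) / gamma(2.0 - alpha)

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs
y = np.array([[0.], [0.], [0.], [1.]])                   # AND targets

V = rng.normal(size=(2, 3))      # input -> hidden weights
W = rng.normal(size=(3, 1))      # hidden -> output weights
cV, cW = V - 1.0, W - 1.0        # lower terminals, offset from the weights
alpha, eta = 0.9, 0.5            # fractional order and learning rate

for epoch in range(5000):
    H = sigmoid(X @ V)                           # hidden activations
    out = sigmoid(H @ W)                         # network output
    delta_o = (out - y) * out * (1 - out)        # output-layer error signal
    delta_h = (delta_o @ W.T) * H * (1 - H)      # hidden-layer error signal
    gW = H.T @ delta_o                           # integer-order gradients
    gV = X.T @ delta_h
    W -= eta * gW * caputo_scale(W, cW, alpha)   # fractional updates
    V -= eta * gV * caputo_scale(V, cV, alpha)

mse = float(np.mean((sigmoid(sigmoid(X @ V) @ W) - y) ** 2))
print(mse)
```

With α = 1 the scale factor reduces to 1 and the update becomes ordinary first-order BP, which is why the fractional scheme is a strict generalization of the classical algorithm.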
Main results
In this section, the convergence behavior of the proposed Caputo fractional-order BP training algorithm is described. For any w, we write ||w|| for its norm, where ||·|| stands for the Euclidean norm. Let Ω denote the α-order stationary point set of the error function E, where Φ is a bounded open set and
We assume that the activation functions of the hidden and output layers are bounded and infinitely differentiable on the real line; furthermore, all of their corresponding
Experiments
To demonstrate the performance and the theoretical results of the presented algorithm, we carried out the following two simulations. The first simulation compares the three different fractional-order BP algorithms of Section 2: Caputo, RL and GL. The second one reports observations of the proposed algorithm, which is based on the Caputo derivative. In addition, the two curves of the error function and of the norm of the gradient illustrate the theoretical conclusions
Conclusions
In this paper, we have extended the fractional steepest descent approach to BP training of FNNs. The proposed method and its theoretical analyses are distinct from the existing results. We have analyzed the convergence of a Caputo fractional-order two-layer neural network trained with the BP algorithm. From a theoretical point of view, we have proven the monotonicity of the error function and the weak (strong) convergence of the proposed Caputo fractional-order BP algorithm. The numerical results support the
Acknowledgments
The authors would like to express their gratitude to the editors and anonymous reviewers for their valuable comments and suggestions which improve the quality of this paper.
References (27)
- et al.
Global stability and global asymptotical periodicity for non-autonomous fractional-order neural networks with time-varying delays
Neural Networks
(2016) - et al.
Chaos synchronization of the fractional Lü system
Physica A. Statistical Mechanics and its Applications
(2005) - et al.
Analysis of global stability and global asymptotical periodicity for a class of fractional-order complex-valued neural networks with time varying delays
Neural Networks
(2016) - et al.
Stability analysis of fractional-order Hopfield neural networks with time delays
Neural Networks
(2014) - et al.
Global stability analysis of fractional-order Hopfield neural networks with time delay
Neurocomputing
(2015) - et al.
Chaos in the fractional order unified system and its synchronization
Journal of the Franklin Institute
(2008) - et al.
Convergence analysis of online gradient method for BP neural networks
Neural Networks
(2011) - et al.
Anti-synchronization control of a class of memristive recurrent neural networks
Communications in Nonlinear Science and Numerical Simulation
(2013) - et al.
Dynamic behaviors of a class of memristor-based Hopfield networks
Physics Letters A
(2011) - et al.
Mittag-Leffler stability of fractional-order Hopfield neural networks
Nonlinear Analysis. Hybrid Systems
(2015)
Application of fractional calculus in BP neural networks
Stability analysis of Caputo fractional-order nonlinear systems revisited
Nonlinear Dynamics
Fractional calculus: integral and differential equations of fractional order
Mathematics
☆ This work was supported in part by the National Natural Science Foundation of China (No. 61305075), the China Postdoctoral Science Foundation (No. 2012M520624), the Natural Science Foundation of Shandong Province (Nos. ZR2013FQ004, ZR2013DM015), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20130133120014) and the Fundamental Research Funds for the Central Universities (Nos. 13CX05016A, 14CX05042A, 15CX05053A, 15CX08011A).