Neural Networks

Volume 89, May 2017, Pages 19-30

Fractional-order gradient descent learning of BP neural networks with Caputo derivative

https://doi.org/10.1016/j.neunet.2017.02.007

Abstract

Fractional calculus has been found to be a promising area of research for information processing and the modeling of some physical systems. In this paper, we propose a fractional gradient descent method for the backpropagation (BP) training of neural networks. In particular, the Caputo derivative is employed to evaluate the fractional-order gradient of the error, defined as the traditional quadratic energy function. The monotonicity and the weak (strong) convergence of the proposed approach are proved in detail. Two simulations have been carried out to illustrate the performance of the presented fractional-order BP algorithm on three small datasets and one large dataset. The numerical simulations also verify the theoretical results of this paper.

Introduction

Fractional differential calculus has been a classical notion in mathematics for several hundred years. It is based on differentiation and integration of arbitrary fractional order and is thus a generalization of the familiar integer-order calculus. Yet only recently has it been applied to the successful modeling of certain physical phenomena. A number of papers in the literature (Kvitsinskii, 1993, Love, 1971, McBride, 1986, Miller, 1995, Nishimoto, 1989, Oldham and Spanier, 1974) have reported that fractional differential calculus describes certain selected natural phenomena much better than conventional integer-order calculus. The viscosity of liquids, the diffusion of electromagnetic waves and fractional kinetics are a few examples of system dynamics that can be successfully expressed in fractional form. A good literature review in support of this statement is given in Wang, Yu, Wen, Zhang, and Yu (2015).

As a consequence, the study of dynamics based on fractional-order differential systems has attracted considerable research interest. Novel results include solutions for chaos and stability analysis in fractional-order systems (Delavari et al., 2012, Deng and Li, 2005, Wu et al., 2008). In Wu et al. (2008), fractional multipoles, fractional solutions of the Helmholtz equation, and fractional image processing methods were studied, with promising results in electromagnetics. Using Laplace transform theory, chaos synchronization of the fractional Lü system under suitable conditions was rigorously proved (Deng & Li, 2005). In Delavari et al. (2012), the Lyapunov direct method was extended to Caputo-type fractional-order nonlinear systems, and a comparison theorem for these systems was proposed using Bihari's and Bellman–Gronwall's inequalities.

Fractional differential calculus has also been successfully adopted in the field of neural networks. Remarkable research on fractional-order neural networks has been presented in Chen (2013), Chen and Chen (2016), Rakkiyappan et al., 2015, Rakkiyappan et al., 2016 and Zhang, Yu, and Wang (2015). In Chen (2013), fractional calculus was applied to the backpropagation (BP) (Rumelhart, Hinton, & Williams, 1986) algorithm for feedforward neural networks (FNNs). The simulation results demonstrated that fractional-order FNNs converge much faster than integer-order FNNs. By extending the second method of Lyapunov, Mittag-Leffler stability analysis was performed for fractional-order Hopfield neural networks (Zhang et al., 2015). In Zhang et al. (2015), the stability conditions were used to achieve complete and quasi synchronization in the coupled case of these networks with constant or time-dependent external inputs. The global Mittag-Leffler stability and global asymptotic periodicity of fractional-order non-autonomous neural networks were investigated in Chen and Chen (2016) using a fractional-order differential and integral inequality technique. For fractional-order complex-valued neural networks, existence and stability analyses were studied in detail in Rakkiyappan et al., 2015, Rakkiyappan et al., 2016. In Xiao, Zheng, Jiang, and Cao (2015), a fractional-order recurrent neural network model with commensurate or incommensurate orders was studied to characterize its dynamic behaviors. The simulation results demonstrate that the dynamics of fractional-order systems are not invariant, in contrast to integer-order systems.

However, most research findings on fractional-order systems have been limited to fully coupled recurrent networks of Hopfield type (Chen and Chen, 2016, Rakkiyappan et al., 2015, Rakkiyappan et al., 2016, Wang et al., 2014, Wu and Zen, 2013, Wu et al., 2011a, Xiao et al., 2015, Zhang et al., 2015). The vast majority of papers focus on the properties of fixed points of the non-integer-order differential equations that describe such networks. The studied networks vary in their properties: some include a delay in the feedback loop, while other extensions provide generalizations to complex-valued neurons. In contrast, this work concerns fractional-order error BP in FNNs.

The gradient descent method is commonly used to train FNNs by minimizing an error function, defined as the norm of the difference between the actual network output and the desired output. Other methods exist for implementing the BP algorithm for FNNs, such as conjugate gradient, Gauss–Newton and Levenberg–Marquardt. We note that all of the above optimization methods are typically employed to train integer-order FNNs.
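As a point of reference before the fractional-order variant, the integer-order update that BP performs on the squared error can be written in a few lines. This is a generic sketch, not code from the paper; `grad_E` is a placeholder for whatever backpropagated gradient the network produces.

```python
# Generic integer-order gradient descent step on E(w) = 0.5 * ||y(w) - O||^2.
# Illustrative sketch only; `grad_E` stands for the backpropagated gradient dE/dw.
import numpy as np

def gradient_descent_step(w: np.ndarray, grad_E: np.ndarray, eta: float = 0.1) -> np.ndarray:
    # w_{k+1} = w_k - eta * dE/dw evaluated at w_k
    return w - eta * grad_E
```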

To the best of our knowledge, there is very limited literature on the convergence analysis of fractional-order FNNs. The existing convergence results are mainly concentrated on integer-order gradient-based FNNs. For batch-mode training, the BP algorithm for FNNs with a penalty term was proven to be deterministically convergent (Wu, Shao, & Li, 2006); in addition, the weight sequence is uniformly bounded owing to the penalty term. For incremental training, weak and strong convergence results have been obtained with or without penalty terms, under different assumptions on the activation function, the learning rates and the stationary points of the objective function (Shao et al., 2010, Wu et al., 2011b, Xu et al., 2009).

Only very recently have fractional neural networks been evaluated in the broader context of training and the minimization of an objective function (Pu et al., 2015). That work offered a detailed analysis of fractional gradient descent learning conditions, supported by initial convergence studies of the minimization of a quadratic energy norm and by numerical illustrations of the search for extreme points during Fractional Adaptive Learning (FAL).

Inspired by Chen (2013) and Pu et al. (2015), we apply the fractional steepest descent algorithm to train FNNs. In particular, we employ the Caputo derivative formula to compute the fractional-order gradient of the error function with respect to the weights and establish deterministic convergence. The main contributions of this paper are as follows:

  • (1) Under suitable conditions on the activation functions and the learning rate, the monotonic decrease of the error function is guaranteed. For the activation functions of the hidden and output layers, we assume that their first and second derivatives are uniformly bounded. This condition is easily satisfied, since the most common sigmoid activation functions are uniformly bounded on $\mathbb{R}$ and infinitely differentiable.

  • (2) The deterministic convergence of the proposed fractional-order training algorithm is rigorously proved, which rules out divergent behavior from a theoretical point of view. Weak convergence means that the fractional-order gradient of the error function approaches zero as the iteration number tends to infinity, while strong convergence means that the weight sequence converges to a fixed point (see the sketch after this list).

  • (3) Numerical simulations are reported to illustrate the effectiveness of the proposed fractional-order neural networks and to support the theoretical results of this paper.
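As referenced in contribution (2), the two convergence notions can be monitored numerically during training. The sketch below is illustrative only; the helper names and tolerances are assumptions, not quantities defined in the paper.

```python
# Hedged sketch of numerical checks for the two convergence notions in (2).
# `fractional_grad` and `weights_history` are illustrative placeholders.
import numpy as np

def weak_convergence_reached(fractional_grad: np.ndarray, tol: float = 1e-6) -> bool:
    # Weak convergence: the norm of the alpha-order gradient tends to zero.
    return float(np.linalg.norm(fractional_grad)) < tol

def strong_convergence_reached(weights_history: list, tol: float = 1e-8) -> bool:
    # Strong convergence: consecutive weight iterates stop moving,
    # i.e. the weight sequence approaches a fixed point.
    if len(weights_history) < 2:
        return False
    return float(np.linalg.norm(weights_history[-1] - weights_history[-2])) < tol
```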

Selected benchmark UCI datasets have been used to compare the performance of fractional-order and integer-order FNNs. The simulations demonstrate that the training and testing accuracies of fractional-order FNNs are better than those of first-order gradient-based FNNs. In addition, the monotonicity of the error function and the weak convergence have been demonstrated on the MNIST dataset.

We note that the error function in Pu et al. (2015) is quite different from the one used in this paper. In Pu et al. (2015), the error function is defined as the sum of the first-order extreme value of the error function and the quadratic norm of the difference between the parameters and their first-order extreme value. In this work, we take the error function to be the squared error between the actual output and the desired output. Another difference between Pu et al. (2015) and this paper lies in the notion of convergence itself. Pu et al. (2015) focus on the convergence rate of their algorithm, namely exponential convergence. In this paper, we analyze the convergence behavior of the fractional-order BP algorithm: the norm of the gradient of the error function approaches zero (weak convergence), and we also provide a strong convergence proof.

The structure of the paper is as follows. In Section 2, the definitions of three commonly used fractional-order derivatives are introduced. The traditional BP algorithm and our novel training algorithm for fractional-order BP neural networks based on the Caputo derivative are presented in Section 3. In Section 4, the convergence results are presented; the detailed proofs of the main results are given in the Appendix. Numerical simulations illustrating the effectiveness of our results are presented in Section 5. Finally, the paper is concluded in the last section.

Section snippets

Fractional-order derivative

Unlike the situation with integer-order derivatives, several definitions are in use for fractional-order derivatives. The three most common definitions are due to Grünwald–Letnikov (GL), Riemann–Liouville (RL), and Caputo (Love, 1971, McBride, 1986, Nishimoto, 1989, Oldham and Spanier, 1974).

Definition 2.1 GL Fractional-Order Derivative

The GL derivative with order $\alpha$ of function $f(t)$ is defined as
$${}_{a}^{GL}D_{t}^{\alpha}f(t)=\sum_{k=0}^{n}\frac{f^{(k)}(a)\,(t-a)^{-\alpha+k}}{\Gamma(-\alpha+k+1)}+\frac{1}{\Gamma(n-\alpha+1)}\int_{a}^{t}(t-\tau)^{n-\alpha}f^{(n+1)}(\tau)\,\mathrm{d}\tau,$$
where ${}_{a}^{GL}D_{t}^{\alpha}$ is the GL fractional derivative operator, $\alpha>0$, $n-1<\alpha<n$, $n\in\mathbb{N}$,
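As a sanity check of Definition 2.1, the series-plus-integral form can be evaluated numerically and compared against the known fractional derivative of a power function. The snippet below is a sketch assuming $0<\alpha<1$ (so $n=1$), $a=0$ and $f(t)=t^{2}$; the function names are illustrative and do not come from the paper.

```python
# Numerical check of the series-plus-integral form in Definition 2.1
# for f(t) = t**2, a = 0 and 0 < alpha < 1 (hence n = 1).
# Helper names are illustrative; they do not come from the paper.
from scipy.special import gamma
from scipy.integrate import quad

def gl_derivative(f_derivs, a, t, alpha, n):
    """sum_{k=0}^{n} f^(k)(a) (t-a)^(k-alpha) / Gamma(k-alpha+1)
       + 1/Gamma(n-alpha+1) * int_a^t (t-tau)^(n-alpha) f^(n+1)(tau) dtau.
       f_derivs[k] is the k-th derivative of f as a callable, k = 0..n+1."""
    series = sum(f_derivs[k](a) * (t - a) ** (k - alpha) / gamma(k - alpha + 1)
                 for k in range(n + 1))
    integral, _ = quad(lambda tau: (t - tau) ** (n - alpha) * f_derivs[n + 1](tau), a, t)
    return series + integral / gamma(n - alpha + 1)

# f(t) = t^2 and its first two derivatives.
f_derivs = [lambda t: t ** 2, lambda t: 2.0 * t, lambda t: 2.0]
alpha, t = 0.5, 1.5
numeric = gl_derivative(f_derivs, a=0.0, t=t, alpha=alpha, n=1)
closed_form = gamma(3.0) / gamma(3.0 - alpha) * t ** (2.0 - alpha)  # D^alpha t^2 = Gamma(3)/Gamma(3-alpha) t^(2-alpha)
print(numeric, closed_form)  # the two values should agree to quadrature accuracy
```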

Algorithm description

Let us begin by introducing a network with three layers. The numbers of neurons in the input, hidden and output layers are $p$, $n$ and $1$, respectively. Suppose that the training sample set is $\{x^{j},O^{j}\}_{j=1}^{J}\subset\mathbb{R}^{p}\times\mathbb{R}$, where $x^{j}$ and $O^{j}$ are the input and the corresponding ideal output of the $j$th sample. Let $g,f:\mathbb{R}\rightarrow\mathbb{R}$ be given activation functions for the hidden and output layers, respectively. Let $V=(v_{ij})_{n\times p}$ be the weight matrix connecting the input and hidden layers, and write $v_{i}=(v_{i1},v_{i2},\ldots$
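To make this setup concrete, the sketch below implements a forward pass for such a three-layer network ($p$ inputs, $n$ hidden units, one output, hidden weight matrix $V$, activations $g$ and $f$) together with one fractionally scaled weight update. The output weight vector `w0`, the sigmoid choice for $g$ and $f$, and the one-term Caputo approximation $D_{w}^{\alpha}E \approx (\partial E/\partial w)\,|w-a|^{1-\alpha}/\Gamma(2-\alpha)$ are assumptions made for illustration only; they are not the paper's exact formulas.

```python
# Hedged sketch of one fractional-order BP step for a three-layer network.
# Assumptions (not from the paper): sigmoid activations for g and f, an output
# weight vector w0, and the one-term Caputo approximation
#   D^alpha_w E ~ (dE/dw) * |w - a|**(1 - alpha) / Gamma(2 - alpha).
import numpy as np
from scipy.special import gamma

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(V, w0, x):
    h = sigmoid(V @ x)      # hidden-layer output g(Vx), shape (n,)
    y = sigmoid(w0 @ h)     # scalar network output f(w0 . h)
    return h, y

def caputo_factor(w, a, alpha):
    # Assumed one-term Caputo scaling, applied elementwise to the weights.
    return np.abs(w - a) ** (1.0 - alpha) / gamma(2.0 - alpha)

def fractional_bp_step(V, w0, x, O, eta=0.1, alpha=0.9, a=0.0):
    """One descent step on E = 0.5 * (y - O)**2 with fractionally scaled gradients."""
    h, y = forward(V, w0, x)
    d_out = (y - O) * sigmoid_prime(w0 @ h)                    # scalar
    grad_w0 = d_out * h                                        # dE/dw0, shape (n,)
    grad_V = np.outer(d_out * w0 * sigmoid_prime(V @ x), x)    # dE/dV, shape (n, p)
    w0_new = w0 - eta * grad_w0 * caputo_factor(w0, a, alpha)
    V_new = V - eta * grad_V * caputo_factor(V, a, alpha)
    return V_new, w0_new
```

For example, with p = 3 inputs and n = 5 hidden units, `V = np.random.randn(5, 3)` and `w0 = np.random.randn(5)` can be updated by calling `fractional_bp_step` repeatedly over the training samples.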

Main results

In this section, the convergent behavior of the proposed Caputo fractional-order BP training algorithm is described. For any $x=(x_{1},x_{2},\ldots,x_{p})\in\mathbb{R}^{p}$, we write $\|x\|=\sqrt{\sum_{m=1}^{p}x_{m}^{2}}$, where $\|\cdot\|$ stands for the Euclidean norm in $\mathbb{R}^{p}$. Let $\hat{\Phi}=\{w\in\Phi:{}^{C}D_{w}^{\alpha}E(w)=0\}$ be the $\alpha$-order stationary point set of the error function $E(w)$, where $\Phi\subset\mathbb{R}^{n(p+1)}$ is a bounded open set and $0<\alpha<1$.

Actually, we assume that the activation functions $g$ and $f$ are bounded and infinitely differentiable on $\mathbb{R}$; furthermore, all of their corresponding

Experiments

To demonstrate the performance and verify the theoretical results of the presented algorithm, we carried out the following two simulations. The first simulation focuses on comparing the three different fractional-order BP algorithms based on the Caputo, RL and GL derivatives of Section 2. The second one presents the observations of the proposed algorithm based on the Caputo derivative. In addition, the two curves of the error function and the norm of the gradient illustrate the theoretical conclusions

Conclusions

In this paper, we have extended the fractional steepest descent approach to the BP training of FNNs. The proposed method and its theoretical analysis are distinct from existing results. We have analyzed the convergence of a Caputo fractional-order two-layer neural network trained with the BP algorithm. From a theoretical point of view, we have proven the monotonicity of the error function and the weak (strong) convergence of the proposed Caputo fractional-order BP algorithm. The numerical results support the

Acknowledgments

The authors would like to express their gratitude to the editors and anonymous reviewers for their valuable comments and suggestions, which improved the quality of this paper.

References (27)

  • X. Chen, Application of fractional calculus in BP neural networks (2013).

  • H. Delavari et al., Stability analysis of Caputo fractional-order nonlinear systems revisited, Nonlinear Dynamics (2012).

  • R. Gorenflo et al., Fractional calculus: integral and differential equations of fractional order, Mathematics (2008).
This work was supported in part by the National Natural Science Foundation of China (No. 61305075), the China Postdoctoral Science Foundation (No. 2012M520624), the Natural Science Foundation of Shandong Province (Nos. ZR2013FQ004, ZR2013DM015), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20130133120014) and the Fundamental Research Funds for the Central Universities (Nos. 13CX05016A, 14CX05042A, 15CX05053A, 15CX08011A).
