Elsevier

Neurocomputing

Volume 41, Issues 1–4, October 2001, Pages 125-143

Learning efficiency improvement of back-propagation algorithm by error saturation prevention method

https://doi.org/10.1016/S0925-2312(00)00352-0

Abstract

The back-propagation (BP) algorithm is currently the most widely used learning algorithm in artificial neural networks. With proper selection of the feed-forward neural network architecture, it can approximate most problems with high accuracy and good generalization ability. However, slow convergence is a serious problem in many applications of this well-known BP learning algorithm, and many researchers have proposed various enhancements to improve its learning efficiency. In this work, we argue that the error saturation (ES) condition, which is caused by the use of the gradient descent method, greatly slows down the learning speed of the BP algorithm. We therefore analyze the causes of the ES condition in the output layer and propose an error saturation prevention (ESP) function to keep the output-layer nodes out of the ES condition. We also apply this method to the nodes in the hidden layers to adjust their learning terms. The proposed methods not only improve learning efficiency by preventing the ES condition but also preserve the semantic meaning of the energy function. Finally, simulations are given to show the effectiveness of the proposed method.

Introduction

The back-propagation (BP) algorithm [25] is a widely used learning algorithm in artificial neural networks [11], [12], [14], [18]. It works well for many problems (e.g., classification and pattern recognition) [2], [3], [9], [25]. However, it suffers from two critical drawbacks of the gradient-descent method: the learning process often becomes trapped in a local minimum, and the learning speed is slow. As a result, much research has addressed these problems [4], [5], [15], especially improving learning efficiency by preventing the premature saturation (PS) phenomenon [16], [30]. PS means that the outputs of the network become temporarily trapped at a high error level during the early learning stage. Researchers have proposed many modifications to overcome this phenomenon [7], [17], [31]; however, these methods rely on many assumptions and tend to be complex (as detailed in the next subsection). In this study, the error saturation (ES) condition [17] is considered the main cause of the PS phenomenon. The ES condition means that nodes in a layer of the network produce outputs near the extreme values 0 or 1 while the differences between the desired and actual outputs are still large. Consequently, the learning term (signal) that drives the increments of the weights and other parameters becomes very small [23]. We therefore use an error saturation prevention (ESP) method to overcome the PS phenomenon, which speeds up learning convergence and relieves the local minimum problem. In addition, we preserve the semantic meaning of the mean-square-error (MSE) function so that the error criterion remains a meaningful evaluation measure.
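
To make the ES condition concrete, consider a sigmoid output node trained under the mean-square-error criterion. The standard output-layer learning term (textbook BP notation, not the exact symbols used later in this paper) is

    \delta_k = (d_k - y_k)\, y_k (1 - y_k), \qquad y_k = \frac{1}{1 + e^{-net_k}},

so even when the error d_k - y_k is near its maximum (e.g., d_k = 1 while y_k \approx 0), the factor y_k(1 - y_k) drives \delta_k toward zero and the weight updates nearly stop; this is exactly the ES condition described above.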

The slow learning speed of the conventional BP algorithm for training feed-forward multi-layer neural network models is due to the fact that back propagation is a gradient descent method [25]. Many researchers have worked on improving the efficiency of the BP algorithm; we summarize their work below.

Lee et al. [16], [17] analyzed the slow learning convergence caused by the PS phenomenon. In this situation, the learning term is too small to drive the adaptation of the weights [17], [30], [31]. They found that the PS phenomenon during the early epochs of learning is caused by inappropriate initial weights, and that the probability of PS can be derived from the maximum value of the initial weights, the number of nodes in each layer, and the maximum slope of the sigmoid activation function. Having derived this PS probability function, they avoid the PS phenomenon by setting the initial weights properly. However, the derivation assumes that the initial weights and thresholds are both uniformly distributed, and some limitations must be imposed during learning. In real-world applications, this uniform-distribution requirement and these limitations on the learning parameters may not be easy to satisfy.
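
As a loose illustration of this idea only, the sketch below draws the initial weights from a small symmetric range so that the initial pre-activations stay in the high-slope region of the sigmoid; the 1/sqrt(n_in) bound is a common heuristic of our own choosing, not the bound derived by Lee et al.

    import numpy as np

    def init_weights(n_in, n_out, rng):
        # Hypothetical bound: keep |net| small at the start of training so that
        # the sigmoid derivative y*(1-y) stays close to its maximum of 0.25.
        # (Illustrative heuristic only; Lee et al. derive their own bound from
        # the PS probability, the layer sizes and the maximum sigmoid slope.)
        bound = 1.0 / np.sqrt(n_in)
        return rng.uniform(-bound, bound, size=(n_in, n_out))

    rng = np.random.default_rng(0)
    W_hidden = init_weights(2, 4, rng)   # e.g., a 2-4-1 network
    W_output = init_weights(4, 1, rng)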

Vitela and Reifman [30], [31] analyzed the causes of PS by dividing the learning process into three stages: the beginning of saturation, the saturation plateau, and complete recovery. They proposed four necessary conditions for the occurrence of the PS phenomenon, expressed in terms of the weight gradients, the momentum terms, and an extreme error condition. After constructing this PS mechanism and its necessary conditions, they avoid slow learning by preventing any one of the four conditions from being satisfied. In their view, the momentum term plays the leading role in the occurrence of the PS phenomenon, so they improve convergence speed by setting the momentum term properly during learning. This method efficiently prevents the PS phenomenon in the early learning stage, but it also has limitations. To construct the mechanism of the PS phenomenon, they must design a scenario that describes the relationship between the weight summation value and the momentum term; the scenario assumes that the weight summation value is too small to contribute to learning because the momentum term is large in the early learning stage. From this scenario they constructed the mechanism and necessary conditions of the PS phenomenon.
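
For reference, the weight update with momentum that they analyze has the standard form (textbook notation with learning rate \eta and momentum coefficient \alpha, not the authors' exact symbols):

    \Delta w_{ij}(t) = -\eta\, \frac{\partial E}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(t-1).

When the gradient term is very small under saturation, the history term \alpha\, \Delta w_{ij}(t-1) dominates the update and can keep pushing the node outputs toward the extremes, which is the scenario used to derive the PS mechanism.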

Ng et al. [22], Oh [23], and Ooyen and Nienhuis [24] analyzed the causes of the slow-learning PS phenomenon through the gradient of the activation function. Oh [23] found that if the actual values of the output nodes are under the ES condition, the learning term is too small to improve the weights, and he proposed a modified energy function to overcome this drawback; the modified energy function keeps the learning term reasonable regardless of the distribution of the actual output values. Similarly, Ng et al. [22] modified the energy function to scale up the partial derivatives of the activation function and proposed a new weight evolution algorithm based on it. Ooyen and Nienhuis [24] improved learning convergence with a new energy function based on the cross-entropy (CE) criterion. These methods can prevent the occurrence of the PS phenomenon, but the energy functions they use may no longer carry the semantic meaning of the original error measure.
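
The effect of such modified energy functions on the output-layer learning term can be shown with a small numerical sketch (standard sigmoid and cross-entropy algebra, not code from the cited papers): under cross entropy the factor y(1 - y) cancels, so the learning term stays large even for a saturated, wrong output.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    # A saturated, wrong output: desired d = 1 but actual y is close to 0.
    d, net = 1.0, -6.0
    y = sigmoid(net)

    delta_mse = (d - y) * y * (1.0 - y)   # MSE: damped by the y*(1-y) factor
    delta_ce  = (d - y)                   # cross entropy: y*(1-y) cancels out

    print(f"y = {y:.4f}, delta_mse = {delta_mse:.4f}, delta_ce = {delta_ce:.4f}")
    # roughly: y = 0.0025, delta_mse = 0.0025, delta_ce = 0.9975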

Section snippets

Error saturation prevention method

In this section, we first describe what the ES problem is, then analyze the causes of the ES condition and its influence on the learning efficiency of the BP algorithm. Finally, we propose an ESP function (e.g., a parabolic function) to keep the output nodes out of the ES condition, and we apply the ESP method to the nodes in the hidden layers with proper modifications.
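
As a rough sketch of the idea only, the output-layer learning term can be augmented with a parabolic term that is largest exactly where y(1 - y) vanishes; the particular form esp_term below, including the scale factor c, is a placeholder of our own choosing rather than the ESP function defined in this paper.

    def esp_delta(d, y, c=0.1):
        # Standard MSE learning term, which vanishes as y -> 0 or y -> 1.
        base = (d - y) * y * (1.0 - y)
        # Hypothetical parabolic ESP term: (2y - 1)^2 is largest at the output
        # extremes (where the ES condition occurs) and is scaled by the error,
        # so it disappears once the output matches the desired value.
        esp_term = c * (d - y) * (2.0 * y - 1.0) ** 2
        return base + esp_term

    print(esp_delta(1.0, 0.01))   # saturated and wrong: the signal stays alive
    print(esp_delta(1.0, 0.99))   # saturated but correct: the update stays small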

Experimental results

To illustrate the workings of the proposed method, a modified XOR problem and two pattern classification problems of different complexity are used to verify the proposed ESP method.
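
The sketch below is not a reproduction of these experiments; it is only a self-contained toy setup, on plain XOR data with an assumed 2-4-1 network, that shows where an ESP-augmented learning term such as the one sketched above would plug into batch BP training.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    D = np.array([[0.], [1.], [1.], [0.]])

    def train(use_esp, epochs=2000, eta=0.5, c=0.1):
        rng = np.random.default_rng(1)
        W1 = rng.uniform(-0.5, 0.5, (2, 4)); b1 = np.zeros(4)
        W2 = rng.uniform(-0.5, 0.5, (4, 1)); b2 = np.zeros(1)
        for _ in range(epochs):
            H = sigmoid(X @ W1 + b1)            # hidden-layer outputs
            Y = sigmoid(H @ W2 + b2)            # output-layer outputs
            err = D - Y
            delta2 = err * Y * (1 - Y)          # standard MSE learning term
            if use_esp:                         # hypothetical ESP term (see above)
                delta2 = delta2 + c * err * (2 * Y - 1) ** 2
            delta1 = (delta2 @ W2.T) * H * (1 - H)
            W2 += eta * H.T @ delta2; b2 += eta * delta2.sum(axis=0)
            W1 += eta * X.T @ delta1; b1 += eta * delta1.sum(axis=0)
        return float(np.mean((D - Y) ** 2))

    print("final MSE, plain BP :", train(use_esp=False))
    print("final MSE, BP + ESP :", train(use_esp=True))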

Discussion

Although the proposed ESP method is effective in improving the learning efficiency of neural network models, some issues still need to be investigated:

(1) More applications of the proposed method. In this paper, we focus on ES prevention for classification problems whose desired outputs are limited (or close) to either [0,1] or [−1,1]. However, many other problems [1] (e.g., function approximation) may suffer from the ES condition as well. As a result, we should extend our ESP method for

Conclusions

According to the previous analysis and simulations, applying our ESP method offers the following advantages:

(1) The ESP method is a simple and intuitive way to prevent the ES condition during the learning process.
(2) The ESP method speeds up learning in the problems we examined.
(3) The Distance_Entropy measure explains the accelerated learning, and the ESP method preserves the semantic meaning of the energy function.


References (33)

  • C. Charalambous, Conjugate gradient algorithm for efficient training of artificial neural networks, IEE Proc.-G (1992)
  • Y. LeCun, Back-propagation applied to handwritten zip code recognition, Neural Comput. (1989)
  • R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics (1936)
  • H. Ishibuchi et al., Neural networks that learn from fuzzy if-then rules, IEEE Trans. Fuzzy Syst. (1993)
  • A.K. Jain, J. Mao, K.M. Mohiuddin, Artificial neural networks: a tutorial, IEEE Comput. Mag. (1996)
  • S. Kollias et al., An adaptive least squares algorithm for the efficient training of artificial neural networks, IEEE Trans. Circuits Syst. (1989)

Hahn-Ming Lee is currently a Professor in the Department of Electronic Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. He received the B.S. and Ph.D. degrees from the Department of Computer Science and Information Engineering at National Taiwan University in 1984 and 1991, respectively. His research interests include neural networks, fuzzy computing, intelligent systems on the web, and machine learning. He is a member of IEEE, CFSA, and IICM.

Chih-Ming Chen was born in Nantou, Taiwan, in 1969. He received the B.S. and M.S. degrees from the Department of Industrial Education at National Taiwan Normal University in 1992 and 1997, respectively. Presently he is a Ph.D. candidate in the Institute of Electronic Engineering at National Taiwan University of Science and Technology. His research interests include neural networks, cerebellar model arithmetic computers, fuzzy set theory, grey theory, and intelligent agents on the web.

Tzong-Ching Huang was born in Chia-Yi, Taiwan, in 1971. He received the B.S. degree from the Department of Computer Science and Information Engineering at National Chiao Tung University in 1994, and the M.S. degree from the Department of Electronic Engineering at National Taiwan University of Science and Technology. Presently he is a software engineer at VIA Technology Corporation (w3.via.com.tw). His research interests include neural networks and fuzzy set theory.
