Elsevier

Neurocomputing

Volume 41, Issues 1–4, October 2001, Pages 125-143

Learning efficiency improvement of back-propagation algorithm by error saturation prevention method

https://doi.org/10.1016/S0925-2312(00)00352-0

Abstract

The back-propagation (BP) algorithm is currently the most widely used learning algorithm in artificial neural networks. With proper selection of the feed-forward neural network architecture, it can approximate most problems with high accuracy and good generalization ability. However, slow convergence is a serious problem in many applications of this well-known BP learning algorithm, and many researchers have proposed various enhancements to improve its learning efficiency. In this work, we argue that the error saturation (ES) condition, which is caused by the use of the gradient descent method, greatly slows down the learning speed of the BP algorithm. We therefore analyze the causes of the ES condition in the output layer and propose an error saturation prevention (ESP) function to keep the output-layer nodes out of the ES condition. We also apply this method to the nodes in the hidden layers to adjust their learning terms. The proposed methods not only improve learning efficiency by preventing the ES condition but also preserve the semantic meaning of the energy function. Finally, simulations are given to show the effectiveness of the proposed method.

Introduction

The back-propagation (BP) algorithm [25] is a widely used learning algorithm in artificial neural networks [11], [12], [14], [18]. It works well for many problems (e.g., classification and pattern recognition) [2], [3], [9], [25]. However, it suffers from two critical drawbacks of the gradient-descent method: the learning process often becomes trapped in a local minimum, and the learning speed is slow. As a result, much research has addressed these problems [4], [5], [15], especially improving learning efficiency by preventing the premature saturation (PS) phenomenon [16], [30]. PS means that the outputs of the network become temporarily trapped at a high error level during the early learning stage. Researchers have proposed many modifications to overcome this phenomenon [7], [17], [31]; however, these methods rely on many assumptions and tend to be complex (as detailed in the next subsection). In this study, the error saturation (ES) condition [17] is considered the main cause of the PS phenomenon. The ES condition means that nodes in a layer of the network produce outputs near the extreme values 0 or 1 while the differences between the desired and actual outputs are still large. Consequently, the learning term (signal) that drives the increments of the weights and other parameters becomes very small [23]. We therefore use an error saturation prevention (ESP) method to overcome the PS phenomenon, which speeds up learning convergence and relieves the local minimum problem. In addition, we preserve the semantic meaning of the mean-square-error (MSE) function so that the error criterion remains a meaningful evaluation measure.
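
To make the ES condition concrete, consider a sigmoid output node trained under the mean-square-error criterion. The standard output-layer learning term (textbook BP notation, not the exact symbols used later in this paper) is

    \delta_k = (d_k - y_k)\, y_k (1 - y_k), \qquad y_k = \frac{1}{1 + e^{-net_k}},

so even when the error d_k - y_k is near its maximum (e.g., d_k = 1 while y_k \approx 0), the factor y_k(1 - y_k) drives \delta_k toward zero and the weight updates nearly stop; this is exactly the ES condition described above.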

The slow learning speed of the conventional BP algorithm for training feed-forward multi-layer neural network models is due to the fact that back propagation is a gradient descent method [25]. Many researchers have worked on improving the efficiency of the BP algorithm; we summarize their work below.

Lee et al. [16], [17] analyzed the slow learning convergence caused by the PS phenomenon. In this situation, the learning term is too small to drive the adaptation of the weights [17], [30], [31]. They found that the PS phenomenon during the early epochs of learning is caused by inappropriate initial weights, and that the probability of PS can be derived from the maximum value of the initial weights, the number of nodes in each layer, and the maximum slope of the sigmoid activation function. Having derived this PS probability function, they avoid the PS phenomenon by setting the initial weights properly. However, the derivation assumes that the initial weights and thresholds are both uniformly distributed, and some limitations must be imposed during learning. In real-world applications, this uniform-distribution requirement and these limitations on the learning parameters may not be easy to satisfy.
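
As a loose illustration of this idea only, the sketch below draws the initial weights from a small symmetric range so that the initial pre-activations stay in the high-slope region of the sigmoid; the 1/sqrt(n_in) bound is a common heuristic of our own choosing, not the bound derived by Lee et al.

    import numpy as np

    def init_weights(n_in, n_out, rng):
        # Hypothetical bound: keep |net| small at the start of training so that
        # the sigmoid derivative y*(1-y) stays close to its maximum of 0.25.
        # (Illustrative heuristic only; Lee et al. derive their own bound from
        # the PS probability, the layer sizes and the maximum sigmoid slope.)
        bound = 1.0 / np.sqrt(n_in)
        return rng.uniform(-bound, bound, size=(n_in, n_out))

    rng = np.random.default_rng(0)
    W_hidden = init_weights(2, 4, rng)   # e.g., a 2-4-1 network
    W_output = init_weights(4, 1, rng)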

Vitela and Reifman [30], [31] analyzed the causes of PS by dividing the learning process into three stages: the beginning of saturation, the saturation plateau, and complete recovery. They proposed four necessary conditions for the occurrence of the PS phenomenon, expressed in terms of the weight gradients, the momentum terms, and an extreme error condition. After constructing this PS mechanism and its necessary conditions, they avoid slow learning by preventing any one of the four conditions from being satisfied. In their view, the momentum term plays the leading role in the occurrence of the PS phenomenon, so they improve convergence speed by setting the momentum term properly during learning. This method efficiently prevents the PS phenomenon in the early learning stage, but it also has limitations. To construct the mechanism of the PS phenomenon, they must design a scenario that describes the relationship between the weight summation value and the momentum term; the scenario assumes that the weight summation value is too small to contribute to learning because the momentum term is large in the early learning stage. From this scenario they constructed the mechanism and necessary conditions of the PS phenomenon.
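
For reference, the weight update with momentum that they analyze has the standard form (textbook notation with learning rate \eta and momentum coefficient \alpha, not the authors' exact symbols):

    \Delta w_{ij}(t) = -\eta\, \frac{\partial E}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(t-1).

When the gradient term is very small under saturation, the history term \alpha\, \Delta w_{ij}(t-1) dominates the update and can keep pushing the node outputs toward the extremes, which is the scenario used to derive the PS mechanism.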

Ng et al. [22], Oh [23], and Ooyen and Nienhuis [24] analyzed the causes of the slow-learning PS phenomenon through the gradient of the activation function. Oh [23] found that if the actual values of the output nodes are under the ES condition, the learning term is too small to improve the weights, and he proposed a modified energy function to overcome this drawback; the modified energy function keeps the learning term reasonable regardless of the distribution of the actual output values. Similarly, Ng et al. [22] modified the energy function to scale up the partial derivatives of the activation function and proposed a new weight evolution algorithm based on it. Ooyen and Nienhuis [24] improved learning convergence with a new energy function based on the cross-entropy (CE) criterion. These methods can prevent the occurrence of the PS phenomenon, but the energy functions they use may no longer carry the semantic meaning of the original error measure.
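
The effect of such modified energy functions on the output-layer learning term can be shown with a small numerical sketch (standard sigmoid and cross-entropy algebra, not code from the cited papers): under cross entropy the factor y(1 - y) cancels, so the learning term stays large even for a saturated, wrong output.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    # A saturated, wrong output: desired d = 1 but actual y is close to 0.
    d, net = 1.0, -6.0
    y = sigmoid(net)

    delta_mse = (d - y) * y * (1.0 - y)   # MSE: damped by the y*(1-y) factor
    delta_ce  = (d - y)                   # cross entropy: y*(1-y) cancels out

    print(f"y = {y:.4f}, delta_mse = {delta_mse:.4f}, delta_ce = {delta_ce:.4f}")
    # roughly: y = 0.0025, delta_mse = 0.0025, delta_ce = 0.9975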

Section snippets

Error saturation prevention method

In this section, we first describe what the ES problem is, then analyze the causes of the ES condition and its influence on the learning efficiency of the BP algorithm. Finally, we propose an ESP function (e.g., a parabolic function) to keep the output nodes out of the ES condition, and we apply the ESP method to the nodes in the hidden layers with proper modifications.
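
As a rough sketch of the idea only, the output-layer learning term can be augmented with a parabolic term that is largest exactly where y(1 - y) vanishes; the particular form esp_term below, including the scale factor c, is a placeholder of our own choosing rather than the ESP function defined in this paper.

    def esp_delta(d, y, c=0.1):
        # Standard MSE learning term, which vanishes as y -> 0 or y -> 1.
        base = (d - y) * y * (1.0 - y)
        # Hypothetical parabolic ESP term: (2y - 1)^2 is largest at the output
        # extremes (where the ES condition occurs) and is scaled by the error,
        # so it disappears once the output matches the desired value.
        esp_term = c * (d - y) * (2.0 * y - 1.0) ** 2
        return base + esp_term

    print(esp_delta(1.0, 0.01))   # saturated and wrong: the signal stays alive
    print(esp_delta(1.0, 0.99))   # saturated but correct: the update stays small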

Experimental results

To illustrate the workings of the proposed method, a modified XOR problem and two pattern classification problems of different complexity are used to verify the proposed ESP method.
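
The sketch below is not a reproduction of these experiments; it is only a self-contained toy setup, on plain XOR data with an assumed 2-4-1 network, that shows where an ESP-augmented learning term such as the one sketched above would plug into batch BP training.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    D = np.array([[0.], [1.], [1.], [0.]])

    def train(use_esp, epochs=2000, eta=0.5, c=0.1):
        rng = np.random.default_rng(1)
        W1 = rng.uniform(-0.5, 0.5, (2, 4)); b1 = np.zeros(4)
        W2 = rng.uniform(-0.5, 0.5, (4, 1)); b2 = np.zeros(1)
        for _ in range(epochs):
            H = sigmoid(X @ W1 + b1)            # hidden-layer outputs
            Y = sigmoid(H @ W2 + b2)            # output-layer outputs
            err = D - Y
            delta2 = err * Y * (1 - Y)          # standard MSE learning term
            if use_esp:                         # hypothetical ESP term (see above)
                delta2 = delta2 + c * err * (2 * Y - 1) ** 2
            delta1 = (delta2 @ W2.T) * H * (1 - H)
            W2 += eta * H.T @ delta2; b2 += eta * delta2.sum(axis=0)
            W1 += eta * X.T @ delta1; b1 += eta * delta1.sum(axis=0)
        return float(np.mean((D - Y) ** 2))

    print("final MSE, plain BP :", train(use_esp=False))
    print("final MSE, BP + ESP :", train(use_esp=True))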

Discussion

Although the proposed ESP method is effective in improving the learning efficiency of neural network models, some issues still need to be investigated:

(1) More applications of the proposed method. In this paper, we focus on ES prevention for classification problems whose desired outputs are limited (or close) to either [0,1] or [−1,1]. However, many other problems [1] (e.g., function approximation) may suffer from the ES condition as well. As a result, we should extend our ESP method for

Conclusions

According to the previous analysis and simulations, applying our ESP method offers the following advantages:

(1) The ESP method is a simple and intuitive way to prevent the ES condition during the learning process.
(2) The ESP method speeds up learning in the problems we examined.
(3) The Distance_Entropy measure explains the accelerated learning, and the ESP method preserves the semantic meaning of the energy function.


References (33)

  • C. Charalambous, Conjugate gradient algorithm for efficient training of artificial neural networks, IEE Proc.-G (1992)
  • Y. LeCun, Back-propagation applied to handwritten zip code recognition, Neural Comput. (1989)
  • R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics (1936)
  • H. Ishibuchi et al., Neural networks that learn from fuzzy if-then rules, IEEE Trans. Fuzzy Syst. (1993)
  • A.K. Jain, J. Mao, K.M. Mohiuddin, Artificial neural networks: a tutorial, IEEE Comput. Mag. (1996)
  • S. Kollias et al., An adaptive least squares algorithm for the efficient training of artificial neural networks, IEEE Trans. Circuits Syst. (1989)

Hahn-Ming Lee is currently a Professor in the Department of Electronic Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. He received the B.S. and Ph.D. degrees from the Department of Computer Science and Information Engineering at National Taiwan University in 1984 and 1991, respectively. His research interests include neural networks, fuzzy computing, intelligent systems on the web, and machine learning. He is a member of IEEE, CFSA, and IICM.

Chih-Ming Chen was born in Nantou, Taiwan, in 1969. He received the B.S. and M.S. degrees from the Department of Industrial Education at National Taiwan Normal University in 1992 and 1997, respectively. Presently he is a Ph.D. candidate in the Institute of Electronic Engineering at National Taiwan University of Science and Technology. His research interests include neural networks, cerebellar model arithmetic computers, fuzzy set theory, grey theory, and intelligent agents on the web.

Tzong-Ching Huang was born in Chia-Yi, Taiwan, in 1971. He received the B.S. degree from the Department of Computer Science and Information Engineering at National Chiao Tung University in 1994, and the M.S. degree from the Department of Electronic Engineering at National Taiwan University of Science and Technology. Presently he is a software engineer at VIA Technology Corporation (w3.via.com.tw). His research interests include neural networks and fuzzy set theory.
