Gaussian-PSO with fuzzy reasoning based on structural learning for training a Neural Network
Introduction
The Artificial Neural Network (ANN) is widely used in applications such as data processing, classification, regression analysis, time series prediction and pattern recognition. The ANN is a statistical learning method inspired by the learning capability of the brain's neuronal system. Comprising a three-layer structure (input, hidden and output layers), the ANN is normally used to approximate non-linear functions. Back Propagation (BP) is the most widely used learning algorithm for the Neural Network (NN) [1]. BP is a supervised learning method that uses gradient descent to minimize the error between the actual output and the target output. BP learning generally performs well, but it depends on the design of the network: the number of inputs (based on the data features), hidden units (determined empirically) and output units (given by the function output) is fixed. After the weights are randomly initialized, learning adjusts the weights to minimize the output error. Other error-minimization techniques have been developed for BP, such as gradient descent [2], resilient backpropagation, BFGS quasi-Newton, one-step secant, Levenberg–Marquardt [3] and Bayesian regularization. These techniques suffer from slow convergence and are easily trapped in local minima, as a result of poor network structure specification and the need to tune their parameters [4]. Even when the fixed structure is a small network, reaching the optimal weight values during training takes time; in a larger network the computational time grows exponentially with the network size, since the number of redundant connections is larger [5], [6]. Because BP performance depends on the parameter settings (momentum and learning rate) as well as on the network structure, its ability to process information is limited.
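The gradient-descent weight update that BP performs can be illustrated with a minimal sketch. The toy data, the 2-3-1 network size and the learning rate below are assumptions for illustration, not the paper's settings; the output unit is kept linear for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 2-3-1 network: BP adjusts the weights along the negative error gradient.
X = rng.uniform(-1, 1, (20, 2))       # 20 samples, 2 features
t = X[:, :1] * X[:, 1:2]              # target: a simple non-linear function
W1 = rng.uniform(-0.5, 0.5, (2, 3))   # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (3, 1))   # hidden -> output weights
eta = 0.5                             # learning rate

mse0 = float(np.mean((sigmoid(X @ W1) @ W2 - t) ** 2))  # error before training

for epoch in range(2000):
    h = sigmoid(X @ W1)               # forward pass: hidden activations
    y = h @ W2                        # linear output unit
    err = y - t                       # output error
    # Backward pass: gradients via the chain rule
    gW2 = h.T @ err / len(X)
    gW1 = X.T @ ((err @ W2.T) * h * (1 - h)) / len(X)
    W2 -= eta * gW2                   # gradient-descent step
    W1 -= eta * gW1

mse = float(np.mean((sigmoid(X @ W1) @ W2 - t) ** 2))   # error after training
```

The loop makes the slow-convergence problem discussed above concrete: every weight is updated every epoch, so cost grows with the number of connections.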
Many researchers have focused on improving the energy function used in the learning process [7], [8] and on BP parameter settings [9], [10]. However, the convergence of the BP algorithm slows as the network structure becomes increasingly complex, and the problem of becoming trapped in local optima is not mitigated. Additionally, prior information about the network structure is normally absent, and its setting depends not only on the size and characteristics of the training data but also on the empirical knowledge of the designer [2], [3].
Recently, building on the advantages of evolutionary algorithms (EA), new optimization methods have been proposed as alternatives for machine-learning problems. The Genetic Algorithm (GA) is one of the most widely used EAs for optimization problems, and it has also been proposed as a training technique for the NN [11], [12]. The GA is a meta-heuristic algorithm whose principal evolutionary operators, selection, crossover and mutation, are inspired by natural evolution. The network structure design can be formulated as an optimization problem in which the GA searches for the optimal solution [13], [14], although the necessary encoding and decoding of the solution influences the search. For small network structures the GA achieves better results than BP, but for more complex structures its accuracy deteriorates, and the larger the training data, the slower its convergence becomes. To ameliorate these disadvantages, new approaches were proposed. In ANNA ELEANOR [15], a new encoding method called granularity was introduced to decide the optimal length, together with a GA-simplex that modified three individuals of the population to better explore the fitness landscape. The GA-simplex performed well but converged prematurely, even when applied multiple times. In GNARL [16], the mutation operator is used for both structural and weight learning: the value of each hidden node and weight is selected randomly from a predetermined range, half of the population is designated as parents, and the new population is generated with two mutation operators, parametric mutation (Gaussian noise) and structural mutation (deletion or addition of units or weights). The algorithm uses a fitness landscape function, but there is no improvement in the evolutionary process.
In EPNet [6], several methods (modified back-propagation, simulated annealing and adaptive learning rules) are adopted so that the offspring resemble their parents. The algorithm shows the importance of using population information, but several non-linear methods must be combined to improve results. Since the PSO [17] is a population-based algorithm that imitates swarm behavior for cooperative learning, it has the advantage of producing better results in the global search with faster convergence. Therefore, the PSO has been proposed as a training algorithm for the NN [18]. A PSO-trained NN for forecasting pollution levels obtained good results, faster convergence and better performance, and later a hybrid algorithm that used a PSO to enhance GA performance was proposed with good results [19]. The related works indicate that the GA or the PSO as a training algorithm yields good results depending on the size of the network [20], [21]; comparing the two, the PSO was demonstrated to perform better on smaller network structures. On the other hand, the PSO can become trapped in local minima. To avoid this, the use of Gaussian random variables instead of uniformly distributed variables is proposed to improve convergence and incorporate learning into the algorithm [22], since the size of the particle influences the computational time and the convergence to the optimal solution. Structural learning methods [23] improve the computational time by modifying the network structure and the number of units in the hidden layer. Therefore, in this paper a GPSO for training the NN, combined with structural learning and fuzzy reasoning to identify the optimal network structure, is proposed. The use of Gaussian random variables in the PSO improves the search performance and the information-sharing capability among the particles, providing more stability during the search for the optimal solution.
A fuzzy reasoning based on structural learning is then proposed to reduce the computational time and find a more compact network. In contrast with other methods that use only the PSO to find the optimal weight values, the GPSO avoids parameter-tuning problems, while learning the search space with a Gaussian probability distribution enables escape from local minima; selecting the proper PSO parameters by trial and error is otherwise time-consuming. The proposed encoding strategy for the weights allows a graphical representation of the network. Moreover, the fuzzy reasoning based on structural learning supports a more compact network by finding the weak weights and the optimal number of units. The "goodness" of each weight is determined with a goodness factor, and a membership function prevents deletion in the early stages of the algorithm, so that the most important weights and units in the NN structure are identified. The proposed algorithm is applied to train a three-layer neural network and is tested on the Iris data set classification problem. This paper is organized as follows: Section 1 gives an introduction to the PSO. Section 2 explains the use of Gaussian random variables and the algorithm used for sampling them. Sections 3 (Feed forward neural network) and 4 (Solution encoding and PSO parameters) explain the basic structure of the neural network as well as the weight encoding strategy and the fitness factor for the GPSO. The structural learning algorithm and the fuzzy reasoning [24] are presented in Section 5. Section 6 provides a numerical example to test the performance of the proposed algorithm. Finally, Section 7 concludes this paper.
Section snippets
Gaussian Particle Swarm Optimization
The PSO is a meta-heuristic algorithm that was proposed by Eberhart and Kennedy in 1995 [25], [26]. The PSO imitates the behavior of birds in a flock by mimicking the collective learning of the group. The birds, represented by particles, share information among themselves to reach the optimal solution. The particles are initialized in a uniform random search space N. The velocities and positions of all the particles are updated iteratively, based on the best position and the neighbors'
Feed forward neural network
The NN is a mathematical model that is widely used for various applications such as forecasting, classification and pattern recognition. The NN uses a structure that emulates the connections of the neurons in the brain. The training algorithms in the NN play a pivotal role in the learning process and they are applied to improve the learning in many applications. The NN structure is mainly composed of three different layers: input layer, hidden layer and output layer as shown in Fig. 5. These
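The three-layer structure can be sketched as a forward pass. The 4-6-3 sizing below mirrors the Iris problem used later (4 features, 3 classes) but the hidden-layer size and sigmoid activations are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass through the input -> hidden -> output layers."""
    h = sigmoid(x @ W1 + b1)      # hidden layer with sigmoid activation
    return sigmoid(h @ W2 + b2)   # output layer

rng = np.random.default_rng(2)
W1 = rng.uniform(-1, 1, (4, 6)); b1 = np.zeros(6)   # 4 inputs -> 6 hidden units
W2 = rng.uniform(-1, 1, (6, 3)); b2 = np.zeros(3)   # 6 hidden -> 3 outputs
y = forward(rng.uniform(0, 1, (5, 4)), W1, b1, W2, b2)  # batch of 5 samples
```

In the GPSO setting, training means searching for the entries of `W1` and `W2` rather than computing gradients through this pass.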
Encoding strategy
The most commonly used encoding strategy for the PSO is the vector arrangement. This strategy cannot represent the distributed structure of the NN. Therefore, a matrix representation is used for easier understanding and decoding of the solution. For example, a 2–3–2 network structure is shown in Fig. 6. Normally it would be decoded into two different matrices: the weights from the input layer to the hidden layer and the weights from the hidden layer to the output layer. The first matrix in the
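How a 2–3–2 network's two weight matrices map to and from a flat particle can be sketched as below. The helper names (`encode`, `decode`) are hypothetical; the paper's own matrix representation keeps the two matrices separate, and this sketch only shows the correspondence with the vector arrangement.

```python
import numpy as np

# A 2-3-2 network stores its weights as two matrices:
# W1 (2x3): input -> hidden, W2 (3x2): hidden -> output.
rng = np.random.default_rng(3)
W1 = rng.uniform(-1, 1, (2, 3))
W2 = rng.uniform(-1, 1, (3, 2))

def encode(W1, W2):
    """Flatten both weight matrices into a single particle vector."""
    return np.concatenate([W1.ravel(), W2.ravel()])

def decode(p, shapes=((2, 3), (3, 2))):
    """Rebuild the weight matrices from a particle vector."""
    mats, i = [], 0
    for r, c in shapes:
        mats.append(p[i:i + r * c].reshape(r, c))
        i += r * c
    return mats

p = encode(W1, W2)        # particle of length 2*3 + 3*2 = 12
W1r, W2r = decode(p)      # round-trip recovers the original matrices
```

The decode step is the per-evaluation cost mentioned in the structural learning section: every fitness evaluation must rebuild the matrices before the forward pass.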
Structural learning
Structural learning is a method to reduce the computational time and ameliorate the difficulty of specifying the network structure. Its objective is the deletion of hidden units with no activity or with almost no contribution to the Mean Square Error (MSE). Even though the GPSO is used for training the network, the same problem arises when the network is too large, as the computational time increases due to the time taken to decode the solution. Therefore a fuzzy reasoning with structural
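The pruning idea, scoring each hidden unit by a goodness factor and letting a membership function block deletion in early iterations, can be sketched as follows. The goodness measure (mean absolute connection weight), the linear membership function and the threshold are all assumptions for illustration; the paper's exact fuzzy reasoning is only excerpted here.

```python
import numpy as np

def goodness(W_in, W_out):
    """A simple 'goodness' score per hidden unit: the mean absolute weight
    over the unit's incoming and outgoing connections (assumed measure)."""
    return (np.abs(W_in).mean(axis=0) + np.abs(W_out).mean(axis=1)) / 2.0

def deletion_membership(iteration, max_iter):
    """Membership in 'safe to prune' grows with the iteration count, so
    weak units are never deleted in the early stages of the search."""
    return min(1.0, iteration / (0.5 * max_iter))

def prune(W_in, W_out, iteration, max_iter, threshold=0.05):
    g = goodness(W_in, W_out)
    mu = deletion_membership(iteration, max_iter)
    keep = g >= threshold * mu        # weak units removed only once mu is high
    return W_in[:, keep], W_out[keep, :]

rng = np.random.default_rng(4)
W1 = rng.uniform(-1, 1, (4, 6))
W1[:, 2] *= 0.01                      # make hidden unit 2 deliberately weak
W2 = rng.uniform(-1, 1, (6, 3))
W2[2, :] *= 0.01

W1e, W2e = prune(W1, W2, iteration=0, max_iter=1000)    # early: nothing pruned
W1p, W2p = prune(W1, W2, iteration=900, max_iter=1000)  # late: weak unit removed
```

Removing a unit deletes a whole row and column of weights at once, which is where the computational savings over weight-by-weight pruning come from.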
Numerical examples
To test the proposed training algorithm, the Iris data classification problem is used. The performance is compared with the BP algorithm, which is widely used for training the NN. The initial parameter values are as follows: the weights are generated in the range , the biases are set to 0, the inertial weight q for the GPSO is set to 1.8, and the initial population to 25 particles.
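The stated settings translate into the following initialization sketch. The weight range is not given in this excerpt, so the [-1, 1] interval and the hidden-layer size below are assumptions; only the population size, inertial weight and zero biases come from the text.

```python
import numpy as np

rng = np.random.default_rng(5)

# GPSO settings reported in the paper; the weight range is unspecified in
# this excerpt, so [-1, 1] below is an assumption for illustration.
n_particles = 25                   # initial population size
inertia_q = 1.8                    # inertial weight q
n_in, n_hidden, n_out = 4, 6, 3    # Iris: 4 features, 3 classes (hidden size assumed)

dim = n_in * n_hidden + n_hidden * n_out       # weights only; biases fixed at 0
particles = rng.uniform(-1.0, 1.0, (n_particles, dim))
biases_hidden = np.zeros(n_hidden)             # biases were set to 0
biases_out = np.zeros(n_out)
```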
Conclusion
In this paper a GPSO with structural learning and fuzzy reasoning for training a NN was proposed. The PSO is introduced as a training algorithm for the NN in order to exploit its advantages, such as faster convergence. Gaussian random variables are used rather than uniformly distributed variables to remedy the PSO's disadvantage in the local search. Furthermore, structural learning with fuzzy reasoning was introduced to improve the computational time by modifying the network
References (32)
- et al., Improving the convergence of the back-propagation algorithm, Neural Netw. (1992)
- Increased rates of convergence through learning rate adaptation, Neural Netw. (1988)
- et al., Application of evolutionary neural network method in predicting pollutant levels in downtown area of Hong Kong, Neurocomputing (2003)
- A hybrid particle swarm optimization-back propagation algorithm for feed forward neural network training, Appl. Math. Comput. (2007)
- M.F. Moller, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, PB-339 Reprint, Computer science...
- C.H. Chen, H. Lai, An empirical study of the gradient descent and the conjugate gradient backpropagation neural...
- M.T. Hagan, M.B. Menhaj, Training feed-forward networks with the Marquardt algorithm, IEEE International Conference on...
- et al., On the problem of local minima in back-propagation algorithm, Neural Netw. (1992)
- et al., A robust evolutionary algorithm for training neural networks, Neural Comput. Appl. (2001)
- et al., A new evolutionary system for evolving artificial neural networks, IEEE Trans. Neural Netw. (1997)
- A method for self-determination of adaptive learning rates in back propagation, Neural Netw.
Haydee Melo received the B.Sc. degree in Electro-mechanics engineering from National Polytechnic Institute (IPN), Mexico and the M.Sc. degree from the Graduate School of Information, Production and Systems, Waseda University, Japan. She is currently a Doctor student in Graduate School of Information, Production and Systems, Waseda University, Japan as a scholarship recipient student from the Ministry of Education, Culture, Sports, Science and Technology, Japan. Her research interests include soft computing, management engineering, reliability and metaheuristic algorithms.
Junzo Watada received his B.S. and M.S. degrees in Electrical Engineering from Osaka City University, Japan, and a doctorate of Engineering on "Fuzzy Multivariant Analysis and its Applications" from Osaka Prefecture University, Japan. His professional interests include soft computing, tracking systems, knowledge engineering, and management engineering. He is a recipient of the Henri Coanda Gold Medal from Invention in Romania, in 2002. He is a Life Fellow of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT) and the Biomedical Fuzzy System Association (BMFSA). He is the Principal Editor, Co-Editor, or Associate Editor of various international journals, including the International Journal of Biomedical Soft Computing and Human Sciences, ICIC Express Letters, the International Journal of Systems and Control Engineering, and Fuzzy Optimization and Decision Making.