Gaussian-PSO with fuzzy reasoning based on structural learning for training a Neural Network
Introduction
The Artificial Neural Network (ANN) is widely used in applications such as data processing, classification, regression analysis, time series prediction and pattern recognition. The ANN is a statistical learning method inspired by the learning capability of the brain's neuronal system. Comprising a three-layer structure (input, hidden and output layers), the ANN is normally used to approximate non-linear functions. Back Propagation (BP) is the most widely used learning algorithm for the Neural Network (NN) [1]. BP is a supervised learning method that uses gradient descent to minimize the error between the actual output and the target output. BP learning generally performs well, but it depends on the design of the network: the number of inputs (based on the data features), hidden units (determined empirically) and output units (given by the function output) is fixed. After the weights are randomly initialized, learning adjusts the weights to minimize the output error. Other error-minimization techniques have been developed for BP, such as gradient descent [2], resilient backpropagation, BFGS quasi-Newton, one-step secant, Levenberg–Marquardt [3] and Bayesian regularization. These techniques suffer from slow convergence and are easily trapped in local minima, as a result of poor network structure specification and the need to tune their parameters [4]. Even when the fixed structure is a small network, reaching the optimal weight values during training takes time; in a larger network the computational time grows exponentially with the network size, since the number of redundant connections is larger [5], [6]. Because BP performance depends on the parameter settings (momentum and learning rate) as well as on the network structure, its ability to process information is limited.
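The gradient-descent weight update that BP performs can be illustrated with a minimal sketch. The toy data, the 2-3-1 network size and the learning rate below are assumptions for illustration, not the paper's settings; the output unit is kept linear for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 2-3-1 network: BP adjusts the weights along the negative error gradient.
X = rng.uniform(-1, 1, (20, 2))       # 20 samples, 2 features
t = X[:, :1] * X[:, 1:2]              # target: a simple non-linear function
W1 = rng.uniform(-0.5, 0.5, (2, 3))   # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (3, 1))   # hidden -> output weights
eta = 0.5                             # learning rate

mse0 = float(np.mean((sigmoid(X @ W1) @ W2 - t) ** 2))  # error before training

for epoch in range(2000):
    h = sigmoid(X @ W1)               # forward pass: hidden activations
    y = h @ W2                        # linear output unit
    err = y - t                       # output error
    # Backward pass: gradients via the chain rule
    gW2 = h.T @ err / len(X)
    gW1 = X.T @ ((err @ W2.T) * h * (1 - h)) / len(X)
    W2 -= eta * gW2                   # gradient-descent step
    W1 -= eta * gW1

mse = float(np.mean((sigmoid(X @ W1) @ W2 - t) ** 2))   # error after training
```

The loop makes the slow-convergence problem discussed above concrete: every weight is updated every epoch, so cost grows with the number of connections.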
Many researchers have focused on improving the energy function used in the learning process [7], [8] and on BP parameter settings [9], [10]. However, the convergence of the BP algorithm slows as the network structure becomes increasingly complex, and the problem of becoming trapped in local optima is not mitigated. Additionally, prior information about the network structure is normally absent, and its setting depends not only on the size and characteristics of the training data but also on the empirical knowledge of the designer [2], [3].
Recently, building on the advantages of evolutionary algorithms (EA), new optimization methods have been proposed as alternatives for machine-learning problems. The Genetic Algorithm (GA) is one of the most widely used EAs for optimization problems, and it has also been proposed as a training technique for the NN [11], [12]. The GA is a meta-heuristic algorithm whose principal evolutionary operators, selection, crossover and mutation, are inspired by natural evolution. The network structure design can be formulated as an optimization problem in which the GA searches for the optimal solution [13], [14], although the necessary encoding and decoding of the solution influences the search. For small network structures the GA achieves better results than BP, but for more complex structures its accuracy deteriorates, and the larger the training data, the slower its convergence becomes. To ameliorate these disadvantages, new approaches were proposed. In ANNA ELEANOR [15], a new encoding method called granularity was introduced to decide the optimal length, together with a GA-simplex that modified three individuals of the population to better explore the fitness landscape. The GA-simplex performed well but converged prematurely, even when applied multiple times. In GNARL [16], the mutation operator is used for both structural and weight learning: the value of each hidden node and weight is selected randomly from a predetermined range, half of the population is designated as parents, and the new population is generated with two mutation operators, parametric mutation (Gaussian noise) and structural mutation (deletion or addition of units or weights). The algorithm uses a fitness landscape function, but there is no improvement in the evolutionary process.
In EPNet [6], several methods (modified back-propagation, simulated annealing and adaptive learning rules) are adopted so that the offspring resemble their parents. The algorithm shows the importance of using population information, but several non-linear methods must be combined to improve results. Since the PSO [17] is a population-based algorithm that imitates swarm behavior for cooperative learning, it has the advantage of producing better results in the global search with faster convergence. Therefore, the PSO has been proposed as a training algorithm for the NN [18]. A PSO-trained NN for forecasting pollution levels obtained good results, faster convergence and better performance, and later a hybrid algorithm that used a PSO to enhance GA performance was proposed with good results [19]. The related works indicate that the GA or the PSO as a training algorithm yields good results depending on the size of the network [20], [21]; comparing the two, the PSO was demonstrated to perform better on smaller network structures. On the other hand, the PSO can become trapped in local minima. To avoid this, the use of Gaussian random variables instead of uniformly distributed variables is proposed to improve convergence and incorporate learning into the algorithm [22], since the size of the particle influences the computational time and the convergence to the optimal solution. Structural learning methods [23] improve the computational time by modifying the network structure and the number of units in the hidden layer. Therefore, in this paper a GPSO for training the NN, combined with structural learning and fuzzy reasoning to identify the optimal network structure, is proposed. The use of Gaussian random variables in the PSO improves the search performance and the information-sharing capability among the particles, providing more stability during the search for the optimal solution.
A fuzzy reasoning based on structural learning is then proposed to reduce the computational time and find a more compact network. In contrast with other methods that use only the PSO to find the optimal weight values, the GPSO avoids parameter-tuning problems, while learning the search space with a Gaussian probability distribution enables escape from local minima; selecting the proper PSO parameters by trial and error is otherwise time-consuming. The proposed encoding strategy for the weights allows a graphical representation of the network. Moreover, the fuzzy reasoning based on structural learning supports a more compact network by finding the weak weights and the optimal number of units. The "goodness" of each weight is determined with a goodness factor, and a membership function prevents deletion in the early stages of the algorithm, so that the most important weights and units in the NN structure are identified. The proposed algorithm is applied to train a three-layer neural network and is tested on the Iris data set classification problem. This paper is organized as follows: Section 1 gives an introduction to the PSO. Section 2 explains the use of Gaussian random variables and the algorithm used for sampling them. Sections 3 (Feed forward neural network) and 4 (Solution encoding and PSO parameters) explain the basic structure of the neural network as well as the weight encoding strategy and the fitness factor for the GPSO. The structural learning algorithm and the fuzzy reasoning [24] are presented in Section 5. Section 6 provides a numerical example to test the performance of the proposed algorithm. Finally, Section 7 concludes this paper.
Section snippets
Gaussian Particle Swarm Optimization
The PSO is a meta-heuristic algorithm that was proposed by Eberhart and Kennedy in 1995 [25], [26]. The PSO imitates the behavior of birds in a flock by mimicking the collective learning of the group. The birds, represented by particles, share information among themselves to reach the optimal solution. The particles are initialized in a uniform random search space N. The velocities and positions of all the particles are updated iteratively, based on the best position and the neighbors'
Feed forward neural network
The NN is a mathematical model that is widely used for various applications such as forecasting, classification and pattern recognition. The NN uses a structure that emulates the connections of the neurons in the brain. The training algorithms in the NN play a pivotal role in the learning process and they are applied to improve the learning in many applications. The NN structure is mainly composed of three different layers: input layer, hidden layer and output layer as shown in Fig. 5. These
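The three-layer structure can be sketched as a forward pass. The 4-6-3 sizing below mirrors the Iris problem used later (4 features, 3 classes) but the hidden-layer size and sigmoid activations are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass through the input -> hidden -> output layers."""
    h = sigmoid(x @ W1 + b1)      # hidden layer with sigmoid activation
    return sigmoid(h @ W2 + b2)   # output layer

rng = np.random.default_rng(2)
W1 = rng.uniform(-1, 1, (4, 6)); b1 = np.zeros(6)   # 4 inputs -> 6 hidden units
W2 = rng.uniform(-1, 1, (6, 3)); b2 = np.zeros(3)   # 6 hidden -> 3 outputs
y = forward(rng.uniform(0, 1, (5, 4)), W1, b1, W2, b2)  # batch of 5 samples
```

In the GPSO setting, training means searching for the entries of `W1` and `W2` rather than computing gradients through this pass.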
Encoding strategy
The most commonly used encoding strategy for the PSO is the vector arrangement. This strategy cannot represent the distributed structure of the NN. Therefore, a matrix representation is used for easier understanding and decoding of the solution. For example, a 2–3–2 network structure is shown in Fig. 6. Normally it would be decoded into two different matrices: the weights from the input layer to the hidden layer and the weights from the hidden layer to the output layer. The first matrix in the
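How a 2–3–2 network's two weight matrices map to and from a flat particle can be sketched as below. The helper names (`encode`, `decode`) are hypothetical; the paper's own matrix representation keeps the two matrices separate, and this sketch only shows the correspondence with the vector arrangement.

```python
import numpy as np

# A 2-3-2 network stores its weights as two matrices:
# W1 (2x3): input -> hidden, W2 (3x2): hidden -> output.
rng = np.random.default_rng(3)
W1 = rng.uniform(-1, 1, (2, 3))
W2 = rng.uniform(-1, 1, (3, 2))

def encode(W1, W2):
    """Flatten both weight matrices into a single particle vector."""
    return np.concatenate([W1.ravel(), W2.ravel()])

def decode(p, shapes=((2, 3), (3, 2))):
    """Rebuild the weight matrices from a particle vector."""
    mats, i = [], 0
    for r, c in shapes:
        mats.append(p[i:i + r * c].reshape(r, c))
        i += r * c
    return mats

p = encode(W1, W2)        # particle of length 2*3 + 3*2 = 12
W1r, W2r = decode(p)      # round-trip recovers the original matrices
```

The decode step is the per-evaluation cost mentioned in the structural learning section: every fitness evaluation must rebuild the matrices before the forward pass.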
Structural learning
Structural learning is a method to reduce the computational time and ameliorate the difficulty of specifying the network structure. Its objective is the deletion of hidden units with no activity or with almost no contribution to the Mean Square Error (MSE). Even though the GPSO is used for training the network, the same problem arises when the network is too large, as the computational time increases due to the time taken to decode the solution. Therefore a fuzzy reasoning with structural
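The pruning idea, scoring each hidden unit by a goodness factor and letting a membership function block deletion in early iterations, can be sketched as follows. The goodness measure (mean absolute connection weight), the linear membership function and the threshold are all assumptions for illustration; the paper's exact fuzzy reasoning is only excerpted here.

```python
import numpy as np

def goodness(W_in, W_out):
    """A simple 'goodness' score per hidden unit: the mean absolute weight
    over the unit's incoming and outgoing connections (assumed measure)."""
    return (np.abs(W_in).mean(axis=0) + np.abs(W_out).mean(axis=1)) / 2.0

def deletion_membership(iteration, max_iter):
    """Membership in 'safe to prune' grows with the iteration count, so
    weak units are never deleted in the early stages of the search."""
    return min(1.0, iteration / (0.5 * max_iter))

def prune(W_in, W_out, iteration, max_iter, threshold=0.05):
    g = goodness(W_in, W_out)
    mu = deletion_membership(iteration, max_iter)
    keep = g >= threshold * mu        # weak units removed only once mu is high
    return W_in[:, keep], W_out[keep, :]

rng = np.random.default_rng(4)
W1 = rng.uniform(-1, 1, (4, 6))
W1[:, 2] *= 0.01                      # make hidden unit 2 deliberately weak
W2 = rng.uniform(-1, 1, (6, 3))
W2[2, :] *= 0.01

W1e, W2e = prune(W1, W2, iteration=0, max_iter=1000)    # early: nothing pruned
W1p, W2p = prune(W1, W2, iteration=900, max_iter=1000)  # late: weak unit removed
```

Removing a unit deletes a whole row and column of weights at once, which is where the computational savings over weight-by-weight pruning come from.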
Numerical examples
To test the proposed training algorithm, the Iris data classification problem is used. The performance is compared with the BP algorithm, which is widely used for training the NN. The initial parameter values are as follows: the weights are generated in the range , the biases are set to 0, the inertial weight q for the GPSO is set to 1.8, and the initial population to 25 particles.
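The stated settings translate into the following initialization sketch. The weight range is not given in this excerpt, so the [-1, 1] interval and the hidden-layer size below are assumptions; only the population size, inertial weight and zero biases come from the text.

```python
import numpy as np

rng = np.random.default_rng(5)

# GPSO settings reported in the paper; the weight range is unspecified in
# this excerpt, so [-1, 1] below is an assumption for illustration.
n_particles = 25                   # initial population size
inertia_q = 1.8                    # inertial weight q
n_in, n_hidden, n_out = 4, 6, 3    # Iris: 4 features, 3 classes (hidden size assumed)

dim = n_in * n_hidden + n_hidden * n_out       # weights only; biases fixed at 0
particles = rng.uniform(-1.0, 1.0, (n_particles, dim))
biases_hidden = np.zeros(n_hidden)             # biases were set to 0
biases_out = np.zeros(n_out)
```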
Conclusion
In this paper a GPSO with structural learning and fuzzy reasoning for training a NN was proposed. The PSO is introduced as a training algorithm for the NN in order to exploit its advantages, such as faster convergence. Gaussian random variables are used rather than uniformly distributed variables to remedy the PSO's disadvantage in the local search. Furthermore, structural learning with fuzzy reasoning was introduced to improve the computational time by modifying the network
References (32)
- et al., Improving the convergence of the back-propagation algorithm, Neural Netw. (1992)
- Increased rates of convergence through learning rate adaptation, Neural Netw. (1988)
- et al., Application of evolutionary neural network method in predicting pollutant levels in downtown area of Hong Kong, Neurocomputing (2003)
- A hybrid particle swarm optimization-back propagation algorithm for feed forward neural network training, Appl. Math. Comput. (2007)
- M.F. Moller, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, PB-339 Reprint, Computer science...
- C.H. Chen, H. Lai, An empirical study of the gradient descent and the conjugate gradient backpropagation neural...
- M.T. Hagan, M.B. Menhaj, Training feed-forward networks with the Marquardt algorithm, IEEE International Conference on...
- et al., On the problem of local minima in back-propagation algorithm, Neural Netw. (1992)
- et al., A robust evolutionary algorithm for training neural networks, Neural Comput. Appl. (2001)
- et al., A new evolutionary system for evolving artificial neural networks, IEEE Trans. Neural Netw. (1997)
- A method for self-determination of adaptive learning rates in back propagation, Neural Netw.
Haydee Melo received the B.Sc. degree in Electro-mechanics engineering from National Polytechnic Institute (IPN), Mexico and the M.Sc. degree from the Graduate School of Information, Production and Systems, Waseda University, Japan. She is currently a Doctor student in Graduate School of Information, Production and Systems, Waseda University, Japan as a scholarship recipient student from the Ministry of Education, Culture, Sports, Science and Technology, Japan. Her research interests include soft computing, management engineering, reliability and metaheuristic algorithms.
Junzo Watada received his B.S. and M.S. degrees in Electrical Engineering from Osaka City University, Japan, and a doctorate of Engineering on "Fuzzy Multivariant Analysis and its Applications" from Osaka Prefecture University, Japan. His professional interests include soft computing, tracking systems, knowledge engineering, and management engineering. He is a recipient of the Henri Coanda Gold Medal from Invention in Romania, in 2002. He is a Life Fellow of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT) and the Biomedical Fuzzy System Association (BMFSA). He is the Principal Editor, Co-Editor, or Associate Editor of various international journals, including the International Journal of Biomedical Soft Computing and Human Sciences, ICIC Express Letters, the International Journal of Systems and Control Engineering, and Fuzzy Optimization and Decision Making.