An evolutionary constructive and pruning algorithm for artificial neural networks and its prediction applications
Introduction
Many numerical algorithms have been proposed to accurately predict future trends in time series, including the autocorrelation method [1], the covariance method [2], and grey theory [3]. Recently, to further improve the accuracy of time series prediction, investigators have focused on intelligent algorithms based on artificial neural networks (ANNs) due to their learning abilities and powerful prediction capabilities [4], [5].
ANNs were first developed to imitate biological neural systems and are organized into several interconnected simple processing units called neurons or nodes. ANNs are data-driven approaches that learn from examples, even when the input–output relationships are unknown [6]. Thus, ANNs can accurately solve problems without prior knowledge when sufficient observed data are supplied. This property is useful for evaluating numerous forecasting problems because acquiring data is easier than making good theoretical guesses about certain systems.
An important component of every ANN is architecture selection, which involves determining an appropriate architecture to accurately fit the underlying function described by the training data [7]. An architecture that is too large may precisely fit the training data but may provide poor generalization due to overfitting of the training data. Conversely, an architecture that is too small saves computational costs but may not possess sufficient processing ability to accurately approximate the underlying function. Therefore, architecture selection should consider both network complexity and goodness of fit.
For prediction purposes, it has been shown that a feedforward ANN with a single hidden layer is sufficient to achieve any desired accuracy [8]. In most applications, ANNs are fully connected, i.e., all inputs are fully connected to all hidden neurons. Numerous studies have shown that partially connected ANNs have better storage capability per connection than fully connected ANNs [9], [10]. Furthermore, partially connected ANNs can yield improved generalization capabilities with reduced cost in terms of hardware and processing time [11]. However, how to determine the optimal numbers of hidden neurons and connections remains an open question.
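To make the notion of partial connectivity concrete, the following sketch represents input-to-hidden connectivity with a binary mask, so that only the retained connections contribute to each hidden neuron. The function and variable names, the tanh activation, and the toy weights are illustrative assumptions, not taken from the paper.

```python
import math

def forward(x, w_ih, mask, w_ho, b_h, b_o):
    """Forward pass of a three-layer ANN whose input-to-hidden
    connectivity is restricted by a binary mask (1 = connection kept)."""
    hidden = []
    for j in range(len(w_ih)):
        s = b_h[j] + sum(w_ih[j][i] * x[i] for i in range(len(x)) if mask[j][i])
        hidden.append(math.tanh(s))
    return b_o + sum(w_ho[j] * hidden[j] for j in range(len(hidden)))

# Two inputs, two hidden neurons; neuron 0 sees only input 0 and
# neuron 1 sees only input 1 -- a partially connected network.
w_ih = [[0.5, 0.0], [0.0, -0.3]]
mask = [[1, 0], [0, 1]]
y = forward([1.0, 2.0], w_ih, mask, [1.0, 1.0], [0.0, 0.0], 0.0)
```

Zeroed mask entries cost neither storage for meaningful weights nor computation, which is the practical appeal of partial connectivity noted above.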
Among several algorithms for designing three-layered ANNs, the most frequently used are constructive, pruning, and constructive-pruning algorithms. A constructive algorithm [12] starts with a minimal ANN architecture: a three-layered ANN with one hidden neuron. The algorithm adds hidden neurons to the minimal ANN, one by one, during the training phase. The advantage of the constructive algorithm is that the initial phase simply sets both the number of hidden layers and the number of hidden neurons to one. However, deciding when to add hidden neurons or connections and when to stop the addition process is difficult.
A pruning algorithm [13] starts with an oversized architecture and then deletes unnecessary hidden neurons or connections, either during training or upon convergence to a local minimum. Each iteration of the pruning algorithm determines which unit, i.e., which hidden neuron or connection, to prune via its relevance or significance. Several pruning criteria have been proposed, for example, sensitivity analysis [14] and magnitude-based pruning [15]. Sensitivity analysis is based on Taylor expansion and reflects the ways in which the derivatives of a performance function can be applied to quantify a system's response to unit perturbations. Magnitude-based pruning assumes that small weights are irrelevant. However, no criterion can be used to determine the initially oversized architecture for a given problem [12].
In the constructive algorithm, the architecture of the ANN may become oversized if the addition procedure is not appropriately stopped. A number of algorithms have attempted to combine constructive and pruning algorithms to solve the aforementioned problem [16], [17]. These constructive-pruning algorithms first estimate the number of hidden neurons and/or connections via a constructive method. A pruning method is then used to delete the inappropriate hidden neurons and/or connections to find a near-optimal architecture for a given problem. However, determining when to stop the pruning procedure is difficult [18].
Several researchers have developed methods for designing ANNs using evolutionary algorithms (EAs). EAs emerged as a biologically plausible approach for adapting various ANN parameters, such as weight values and architectures [19]. Recently, several studies have employed various EAs to prune neural networks. Mantzaris et al. [20] pruned a probabilistic neural network with a genetic algorithm to minimize the number of diagnostic factors, thereby minimizing the number of input nodes and hidden layers. Curry and Morgan [21] proposed a modified feedforward neural network that is pruned and optimized by means of differential evolution for seasonal data. Huang and Du [22] used particle swarm optimization to prune radial basis probabilistic neural networks. Masutti and Castro [23] combined characteristics of self-organizing networks and artificial immune systems to solve the traveling salesman problem, pruning neurons that are not related to a city. Furthermore, numerous works have applied EAs and pruning methods separately or simultaneously. Kaylani et al. [24] incorporated a pruning operator into a genetic algorithm as a mutation operator to design ARTMAP architectures for classification problems. Goh et al. [25] developed a hybrid multiobjective evolutionary approach for adapting ANN structures, together with a geometrical approach for identifying hidden neurons to prune, for classification problems. Hervás-Martínez et al. [26] applied an evolutionary algorithm to design the structure and weights of a product-unit neural network and then used a backward stepwise procedure to prune variables sequentially until no further pruning could improve the fit. However, most encoding schemes must predefine the chromosome length, which is problem-dependent. This user-defined length can affect the flexibility of problem representation and EA efficiency [27], [28].
Herein, we propose a new approach for designing ANNs, the evolutionary constructive and pruning algorithm (ECPA). This algorithm directs the evolution of the ANN topology using constructive and pruning methods in an evolutionary manner. In ECPA, a variable-length chromosome representation is adopted to describe ANNs with different architectures. Thus, the chromosome length need not be predefined, which makes memory usage more efficient. Furthermore, ECPA introduces the constructive concept into the crossover and mutation operations, so that the initial structure of the ANN can simply be set as a minimal network containing one hidden neuron with a single connection to one input. The crossover and mutation operations then enlarge the architecture by adding hidden neurons and connections. ECPA then prunes the resulting ANNs via a newly developed scheme consisting of cluster-based pruning (CBP) and age-based survival selection (ABSS).
The rest of this paper is organized as follows. Section 2 describes the proposed ECPA in detail. Section 3 demonstrates the proposed algorithm's ability to evolve partially connected ANNs for a variety of problems of interest. Finally, in Section 4, we present our conclusions.
ECPA
Based on the characteristics of ANNs and EAs, we propose ECPA, which develops ANNs in an evolutionary constructive and pruning manner. As discussed in [29], theoretical work has shown that a single hidden layer is sufficient for forecasting purposes. Therefore, in this work, we designed a three-layer feedforward ANN with an input layer, a hidden layer, and an output layer. The major steps of ECPA are summarized in Fig. 1 and explained below.
Initialization phase
- (Step 1) Generate an initial population
Experimental results
In this section, we demonstrate the performance of the proposed algorithm using three time series prediction problems: Mackey-Glass, sunspots, and vehicle count. The first time series is generated from the Mackey-Glass differential equation, the second series consists of recorded sunspot data, and the third series is obtained from the hourly vehicle count for the Monash Freeway outside Melbourne in Victoria, Australia, beginning in August 1995.
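The Mackey-Glass series is commonly generated by discretizing the delay differential equation dx/dt = a·x(t−τ)/(1 + x(t−τ)^10) − b·x(t). The sketch below uses the typical benchmark settings a = 0.2, b = 0.1, τ = 17 with unit-step Euler integration and a constant initial history; the paper's exact generation settings are not given in this excerpt.

```python
def mackey_glass(n_samples, a=0.2, b=0.1, tau=17, dt=1.0, x0=1.2):
    """Generate the Mackey-Glass time series by Euler integration of
    dx/dt = a*x(t-tau)/(1 + x(t-tau)**10) - b*x(t)."""
    history = [x0] * (tau + 1)       # constant history before t = 0
    series = []
    x = x0
    for _ in range(n_samples):
        x_tau = history[-(tau + 1)]  # delayed value x(t - tau)
        x = x + dt * (a * x_tau / (1.0 + x_tau ** 10) - b * x)
        history.append(x)
        series.append(x)
    return series

s = mackey_glass(500)
```

With these parameters the series is chaotic, which is why it is a standard benchmark for one-step-ahead prediction.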
Conclusions
A novel structural learning algorithm, called ECPA, is proposed for the design of ANNs based on an evolutionary constructive and pruning algorithm. ECPA evolves the ANNs starting with a minimal structure: one hidden neuron connected to an input node. The crossover and mutation operations make the ANN structures more complex, whereas CBP and ABSS make the ANN structures more compact. The results of the numerical simulations show that the use of CBP and ABSS operations indeed generates compact ANN architectures.
Acknowledgments
This work was supported in part by the National Science Council, Taiwan, R.O.C., under Contract No. NSC 99-2221-E-009-107 and in part by a grant provided by the Industrial Technology Research Institute under Contract No. A353C40000B1-4.
References (46)
- et al., Multilayer feedforward networks are universal approximators, Neural Networks, 1989.
- et al., Creating artificial neural networks that generalize, Neural Networks, 1991.
- et al., Back-propagation algorithm which varies the number of hidden units, Neural Networks, 1991.
- et al., Genetic algorithm pruning of probabilistic neural networks in medical disease estimation, Neural Networks, 2011.
- et al., Neuro-immune approach to solve routing problems, Neurocomputing, 2009.
- et al., AG-ART: an adaptive approach to evolving ART architectures, Neurocomputing, 2009.
- et al., Multilogistic regression by means of evolutionary product-unit neural networks, Neural Networks, 2008.
- et al., Forecasting with artificial neural networks: the state of the art, Int. J. Forecast., 1998.
- et al., Perturbation method for deleting redundant inputs of perceptron networks, Neurocomputing, 1997.
- et al., Time series prediction using evolving radial basis function networks with new encoding scheme, Neurocomputing, 2008.
- The effect of different basis functions on a radial basis function network for time series prediction: a comparative study, Neurocomputing.
- Time series analysis using normalized PG-RBF network with regression weights, Neurocomputing.
- Time-series prediction using a local linear wavelet neural network, Neurocomputing.
- Radial basis function based adaptive fuzzy systems and their applications to system identification and prediction, Fuzzy Sets Syst.
- Adaptive neural network model for time-series forecasting, Eur. J. Oper. Res.
- Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing.
- Adaptive metrics in the nearest neighbours method, Phys. D.
- Linear Prediction of Speech.
- Linear prediction: a tutorial review, Proc. IEEE.
- Introduction to grey system theory, J. Grey Syst.
- How effective are neural networks at forecasting and prediction? A review and evaluation, J. Forecast.
- Intelligent forecasting system using grey model combined with neural network, Int. J. Fuzzy Syst.
- Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence.
Shih-Hung Yang received his B.S. degree in Mechanical Engineering and M.S. degree in Electrical and Control Engineering from National Chiao Tung University, Taiwan, in 2002 and 2004, respectively. He is currently working toward the Ph.D. degree in Electrical and Control Engineering at National Chiao Tung University, Taiwan. His research interests include machine vision, neural networks, and evolutionary computation. He was a recipient of the Outstanding Teaching Assistant Award from the ECE Department at National Chiao Tung University in 2008 and 2009, and Student Scholarships from the IEEE Industrial Electronics Society and IEEE Computational Intelligence Society in 2010 and 2011, respectively.
Yon-Ping Chen received his B.S. degree in Electrical Engineering from National Taiwan University, Taiwan, in 1981, and M.S. and Ph.D. degrees in Electrical Engineering from the University of Texas at Arlington, USA, in 1986 and 1989, respectively. He is a Distinguished Professor in the Department of Electrical Engineering, National Chiao Tung University, Taiwan. His research interests include control, image signal processing, and intelligent system design.