Class-specific soft voting based multiple extreme learning machines ensemble
Introduction
The extreme learning machine (ELM) has become popular for solving classification problems due to its light computational requirements. It is an extension of single-hidden-layer feedforward networks (SLFNs): using a least-squares method, it obtains the output weights of an SLFN analytically [1]. Moreover, unlike conventional neural-type SLFNs, ELM aims at achieving both the smallest norm of output weights and the least training error. Essentially, ELM was originally designed around random computational nodes that are independent of the training data; the process of tuning the hidden layer parameters is thereby avoided, which significantly shortens the learning time. A great many ELM-based algorithms have been proposed in recent years [1], [2], [3].
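The two-step recipe described above (random hidden nodes, then an analytic least-squares solve for the output weights) can be sketched as follows; this is a generic illustration with our own function names and a sigmoid activation, not the paper's implementation:

```python
import numpy as np

def elm_train(X, T, L=50, seed=0):
    """Train a basic ELM: random hidden layer, analytic output weights.

    X: (N, d) inputs; T: (N, m) one-hot targets; L: number of hidden nodes.
    The hidden parameters (a, b) are drawn at random and never tuned;
    only the output weights beta are obtained, via the Moore-Penrose
    pseudo-inverse, which gives the minimum-norm least-squares solution.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((X.shape[1], L))   # random input weights
    b = rng.standard_normal(L)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))     # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ T               # analytic least-squares solve
    return a, b, beta

def elm_predict(X, a, b, beta):
    """Raw class scores; argmax over columns gives the predicted label."""
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta
```

Because no iterative tuning of (a, b) takes place, the entire training cost is one matrix pseudo-inverse, which is the source of ELM's speed advantage over backpropagation-trained SLFNs.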
However, since the hidden-node parameters are randomly generated, ELM easily misclassifies patterns that lie close to the decision boundary [3], [4]. To improve classification performance, a number of real-world applications based on ensemble learning have been reported in previous research [2], [5], [6], [7]. In contrast to designing a single classifier, as in the traditional pattern recognition field, ensemble learning comprises a group of machine learning algorithms that construct multiple classifiers and combine them into a hybrid predictive model. Generally speaking, the overall classification performance of an ensemble can be better than that of a single classifier: ensemble learning trades increased complexity for more accurate prediction. In multiple classifier systems (MCS), ensemble learning usually employs homogeneous base learners. Over the past few decades, many ensemble techniques [8], [9], [10] have been proposed to enhance the reliability of multiple models. Ensemble methods have also been applied successfully in a wide range of fields [11], [12], [13], owing to their remarkable capability to increase the classification performance of a learning model.
Numerous works on ensembles of ELMs have been proposed in recent years. In [14], Liang et al. proposed the online sequential extreme learning machine (OS-ELM), which shows better generalization behavior than other sequential algorithms. Lan et al. [5] then extended OS-ELM to an ensemble version with improved stability. In [6], Liu and Wang pointed out that ELM may be prone to overfitting because it approximates the training data in the learning phase; to alleviate this problem, they presented an ensemble-based ELM (EN-ELM) that embeds cross-validation into the training process. Wang and Li [7] designed a dynamic AdaBoost ensemble that uses multiple ELMs with fuzzy activation functions to handle large data sets. Differently, Zhai et al. [2] developed a sample-entropy-based dynamic ensemble to address the instability and overfitting problems of ELM. Wang et al. [4] regarded the upper integral as a base classifier and constructed an upper integral network through the learning mechanism of ELM. van Heeswijk et al. [15] introduced adaptive ensemble models of ELMs for time series prediction, aiming to perform well on both stationary and nonstationary series; they also proposed a GPU-accelerated and parallelized ELM ensemble for regression on large data sets [16].
As mentioned above, the randomly selected parameters in ELM lead to unstable training accuracy. Cao et al. [3] therefore designed a voting-based extreme learning machine (V-ELM) that employs ELM as the base classifier under a majority-voting framework. Despite its demonstrated reliability and stability, their work does not consider the fact that different base classifiers have different classification performances. In ensemble learning, a combinative linear classifier adopts weighted voting instead of simple voting when the accuracies of the base classifiers are unequal [17]. For this reason, weighted fusion methods can be used to evaluate the confidence degree of each base classifier. Weighted voting methods mainly include weighted majority vote schemes [18] and weighting methods [19], [20] with classifier-specific weights. In earlier research, a minimum square error (MSE) based method [21] was proposed as a class-specific optimal weighting approach for the linear combination of multiple classifiers; it is easy to implement, but its weights are not optimized. Recently, Zhang and Zhou [22] proposed three new weighted combination approaches inspired by the idea of sparse ensembles, employing linear programming (LP) to select classifiers and tune their weights simultaneously; this approach uses an optimization tool to obtain satisfactory results. However, their weights are still defined at the classifier-specific level, at which the weight assignment distribution cannot be delicately tuned or optimized. Therefore, in this work we propose class-specific soft voting for ELM classifiers (CSSV-ELM), obtained by optimizing a class-specific weight model. Further, since the final step of ELM can be regarded as solving a linear system, it may suffer from ill-conditioning.
Thus, the condition number of the inverse of the weight matrix between the hidden and output nodes is incorporated into the constraints of the optimization model. In particular, the model takes both the best-worst weighted voting measure and the condition number into account simultaneously.
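To make the class-specific idea concrete: instead of one weight per classifier, each classifier carries a separate weight per class, and the ensemble score of a class is the weighted sum of the base classifiers' probability outputs for that class. The sketch below is our own illustration of this combination rule, not the paper's optimized CSSV-ELM model:

```python
import numpy as np

def class_specific_soft_vote(probs, W):
    """Combine base-classifier outputs with class-specific weights.

    probs: (J, N, m) array -- class-probability outputs of J base ELMs
           on N samples over m classes.
    W:     (J, m) weight matrix -- W[j, k] is the weight of classifier j
           for class k (each column nonnegative, summing to one over j).
    Returns predicted labels after class-specific weighted soft voting.
    """
    # score[n, k] = sum_j W[j, k] * probs[j, n, k]
    scores = np.einsum('jnk,jk->nk', probs, W)
    return scores.argmax(axis=1)
```

With a classifier-specific scheme every column of W would be identical; allowing the columns to differ is exactly the extra freedom that the class-specific optimization model exploits.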
To improve computational efficiency and test speed, another interesting problem is how to construct a sparse ensemble under the class-specific soft voting scheme. Concretely, a sparse ensemble seeks a sparse weight vector that sparsely represents the outputs of multiple classifiers. With classifier-specific weights, the sparse ensemble concept is equivalent to ensemble pruning, which selects an optimal sub-ensemble (a subset of classifiers) via a weight vector. Under the class-specific soft voting scheme, however, a class-based weight matrix must be determined, and the pruned ensemble methods [9], [22], [23], [24] are not well suited to obtaining a sparse ensemble with a class weight matrix. Compared with these conventional pruning methods, sparse representation based methods have become popular recently because of their flexibility in constructing optimization models for diverse problems. A further advantage is that once the model is built by selecting an appropriate over-complete dictionary, well-established solution algorithms are readily available.
Furthermore, sparse representation has shown a strong relationship to classification and face recognition [25], [26], [27], [28]. Its main technique, sparse coding, and its variants have been successfully used in face recognition. Therefore, sparse coding (SC) techniques are applied in this paper to represent the class weight coefficients of multiple ELMs sparsely. The SC problem originates from sparse representations of signals; the goal is to find a linear decomposition of a signal using a few atoms of an over-complete dictionary [28]. However, the objective function for optimizing the class-specific weights does not naturally fit the sparse coding setting, since the "dictionary matrix" is not over-complete. In the face recognition field, a common way to deal with this is to map high-dimensional data to a low-dimensional space using feature extraction techniques. Thus, in this work we apply an iterative optimization algorithm that adapts the feature extraction projection matrix and the weight coefficients for each class k simultaneously, in order to obtain a more robust and efficient classifier. We name the proposed model sparse based class-specific soft voting ELM (SpaCSSV-ELM). On one hand, for a fixed weight matrix, updating the projection matrix exploits a more appropriate transformation of the original conditional probability outputs of the base classifiers; on the other hand, the weights can be refined while the projection matrix is fixed. In this way, the jointly learned projection matrix and weights improve the robustness of the sparse representation in the proposed model.
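The core sparse-coding subproblem referred to above, finding a sparse weight vector w so that a dictionary of classifier outputs D reconstructs a target t, is typically an l1-regularized least-squares problem. As a rough, self-contained sketch (ISTA is a standard solver for this objective; the paper's SpaCSSV-ELM alternates this kind of step with a projection-matrix update, which we omit here):

```python
import numpy as np

def ista_sparse_weights(D, t, lam=0.1, iters=1000):
    """Solve min_w 0.5*||D w - t||^2 + lam*||w||_1 by ISTA.

    D: (n, J) "dictionary" whose J columns are base-classifier outputs;
    t: (n,) target vector. Returns a sparse weight vector w of length J.
    """
    w = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        g = D.T @ (D @ w - t)                   # gradient of the smooth term
        z = w - step * g                        # gradient descent step
        # soft-thresholding: the proximal operator of lam*||.||_1
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w
```

The l1 penalty drives most entries of w exactly to zero, so classifiers with zero weight can be dropped at test time; this is the mechanism by which a sparse ensemble improves test speed.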
In this work, two new optimization methods based on class-specific weight are proposed for multiple ELMs with three contributions:
- The first contribution is a convex optimization model (CSSV-ELM) designed on the basis of the class-specific soft voting scheme.
- The second contribution lies in constructing the constraints of the optimization model (CSSV-ELM). Besides the nonnegativity and sum-to-one constraints for each class, a weight constraint for each classifier is formed by combining best-worst weighted voting with the condition number of ELM, which guarantees the importance and stability of each component ELM.
- The third contribution focuses on learning a sparse weight vector for the class-specific soft voting method with ELM as the base classifier.
Overall, under the framework of class-specific weight based soft voting, two models, addressing ELM-specific characteristics and sparse ensemble aspects respectively, are designed to improve performance in terms of accuracy and sparsity. The rest of this paper is organized as follows. Section 2 briefly reviews ELM and some background on ensemble learning. Section 3 describes the proposed CSSV-ELM and SpaCSSV-ELM algorithms. Experimental results are reported in Section 4. Finally, Section 5 concludes the paper.
Section snippets
Extreme learning machine
For a classification problem, we typically have a d-dimensional training data set whose patterns each belong to one of m classes. In this paper, the data set is denoted as D = {(x_i, t_i) | i = 1, …, N}, where x_i ∈ R^d and t_i ∈ R^m. In the neural network field, the task of supervised learning is transformed into minimizing a regression cost function ‖Hβ − T‖², where T = [t_1, …, t_N]^T ∈ R^{N×m} is the target output matrix and Hβ is the output of a network with L hidden nodes, with H ∈ R^{N×L} the hidden layer output matrix, H_{ij} = g(a_j · x_i + b_j), and β ∈ R^{L×m} the output weight matrix between the hidden and output nodes.
Proposed ensemble method for ELM
In this section, the first part explains the concept of the condition number and how this indicator can be computed from the ELM structure. The second part introduces a classifier-level weighted voting method, best-worst weighted voting; by combining these two parts, a new constraint for optimization model 1 is obtained. Section 3.3 then describes model 1 (CSSV-ELM) in detail; specifically, the class-specific soft voting method is reformulated as the optimization objective of the model.
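The two ingredients named here can be sketched numerically: the condition number of each component ELM's hidden-layer matrix measures its numerical stability, and best-worst normalization ranks each classifier's validation accuracy between the worst (0) and the best (1). The helper below is our own illustration of how such per-classifier scores might be computed before entering the model's constraints; the exact form of the constraint is defined in the paper:

```python
import numpy as np

def stability_and_importance(H_list, acc):
    """Per-classifier scores of the kind used to bound ensemble weights.

    H_list: hidden-layer output matrices of the component ELMs.
    acc:    validation accuracies of the component ELMs.
    Returns (stability, importance): reciprocal condition numbers of each
    H (large = well-conditioned, numerically stable) and best-worst
    normalized accuracies in [0, 1].
    """
    stability = np.array([1.0 / np.linalg.cond(H) for H in H_list])
    acc = np.asarray(acc, dtype=float)
    importance = (acc - acc.min()) / (acc.max() - acc.min())
    return stability, importance
```

An ill-conditioned H means the least-squares solve for the output weights amplifies noise, so down-weighting such a component ELM protects the ensemble from its unstable predictions.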
Experiments
To gain insight into the performance of the two proposed models, which are built on the class-specific soft voting framework, twenty UCI data sets [40] are employed to compare CSSV-ELM and SpaCSSV-ELM against a single ELM (S-ELM), voting ELM (V-ELM) and several weighted voting methods (MSE-ELM, LP1-ELM, LP2-ELM and LP3-ELM), all of which employ ELM as the base classifier.
The MSE-based method [41] mainly emphasizes calculating an optimal weight parameter, which can
Conclusion
In this work, we discussed weight optimization based on class-specific soft voting for combining multiple ELMs. The first model not only deals with the ill-conditioned problem of ELM but also integrates traditional weighted voting schemes, whereas the second model considers the sparsity of the weight coefficients while retaining classification performance. Experimental results show that the proposed model 1 is statistically superior to all the compared
Acknowledgements
The work described in this paper was partially supported by the Fundamental Research Funds for the Central Universities (WUT:2014-IV-054), National Natural Science Foundation of China under the Grant No. 61175123, City University Applied Research Grant 9667094 and Chinese National Science Foundation (Grant No. 61272289).
References (43)
- et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
- et al., Upper integral network with extreme learning mechanism, Neurocomputing (2011)
- et al., Ensemble of online sequential extreme learning machine, Neurocomputing (2009)
- et al., GPU-accelerated and parallelized ELM ensembles for large-scale regression, Neurocomputing (2011)
- et al., Sparse ensembles using weighted combination methods based on linear programming, Pattern Recognit. (2011)
- et al., Simultaneous discriminative projection and dictionary learning for sparse representation based classification, Pattern Recognit. (2013)
- et al., Joint dynamic sparse representation for multi-view face recognition, Pattern Recognit. (2012)
- et al., Dynamic ensemble extreme learning machine based on sample entropy, Soft Comput. (2012)
- et al., Voting based extreme learning machine, Inf. Sci. (2011)
- et al., Ensemble based extreme learning machine, IEEE Signal Process. Lett. (2010)
- Ensemble tracking, IEEE Trans. Pattern Anal. Mach. Intell.
- An analysis of ensemble pruning techniques based on ordered aggregation, IEEE Trans. Pattern Anal. Mach. Intell.
- Accuracy/diversity and ensemble MLP classifier design, IEEE Trans. Neural Netw.
- Cost-sensitive boosting, IEEE Trans. Pattern Anal. Mach. Intell.
- SemiBoost: boosting for semi-supervised learning, IEEE Trans. Pattern Anal. Mach. Intell.
- AdaBoost-based algorithm for network intrusion detection, IEEE Trans. Syst. Man Cybern. Part B
- A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw.
- Ensemble Methods: Foundations and Algorithms
- Combining Pattern Classifiers: Methods and Algorithms
Jingjing Cao received her B.Sc. degree in Information and Computing Science, Dalian Maritime University, China in 2006 and the M.Sc. degree in Applied Mathematics from the same university in 2008. She got her Ph.D. degree in the Department of Computer Science in City University of Hong Kong, Hong Kong. She joined the Wuhan University of Technology, China, in 2013, where she is currently a lecturer in the School of Logistics Engineering. Her research interests are in ensemble learning, extreme learning machine, evolutionary algorithms, and their applications.
Sam Kwong received his B.Sc. and M.A.Sc. degrees in electrical engineering from the State University of New York at Buffalo, USA, and the University of Waterloo, Canada, in 1983 and 1985 respectively. In 1996, he obtained his Ph.D. from the University of Hagen, Germany. From 1985 to 1987, he was a diagnostic engineer with Control Data Canada, where he designed diagnostic software to detect manufacturing faults in the VLSI chips of the Cyber 430 machine. He later joined Bell Northern Research Canada as a Member of Scientific Staff. In 1990, he joined the City University of Hong Kong as a lecturer in the Department of Electronic Engineering. He is currently an Associate Professor in the Department of Computer Science.
Ran Wang (S’09) received the Bachelors degree from the College of Information Science and Technology, Beijing Forestry University, Beijing, China, in 2009. She is currently pursuing the Ph.D. degree at the Department of Computer Science, City University of Hong Kong, Hong Kong. Her current research interests include support vector machines, extreme learning machines, decision tree induction, active learning, multiclass classification, and the related applications of machine learning.
Xiaodong Li received the Bachelor degree from the Department of Computer Science and Technology, Nanjing University, Nanjing, China, in 2006. He is currently pursuing the Ph.D. degree at the Department of Computer Science, City University of Hong Kong, Hong Kong. His current research interests include support vector machines, boosting, market micro structure and algorithmic trading.
Ke Li was born in Hunan, China, in 1985. He received the B.Sc. and M.Sc. degrees in computer science and technology from Xiangtan University, China, in 2007 and 2010, respectively. He is currently pursuing the Ph.D. degree at City University of Hong Kong. His current research interests include evolutionary multi-objective optimization, surrogate-assisted evolutionary algorithms and statistical machine learning techniques.
Xiangfei Kong received his Bachelor’s degree at Shandong University of Science and Technology in 2009. He is a Ph.D. student of Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong. His research interests include pattern recognition, image denoising, and computer vision.