Neurocomputing, Volume 261, 25 October 2017, Pages 57-69

Enhancing ELM by Markov Boundary based feature selection

https://doi.org/10.1016/j.neucom.2016.09.119

Abstract

ELM, as an efficient classification technique, has been used in many popular application domains. However, ELM has weak generalization performance when the data set is small with respect to its feature space. In this paper, an enhanced ELM algorithm based on representative features is proposed to address this problem. First, the method automatically generates discrete intervals for every continuous feature. Then, it removes irrelevant features by a method that considers feature interaction, and reduces weakly relevant features by a mutual information based method. Further, redundant features are reduced. Instead of constructing a large Bayesian network using all features, we select only the features highly relevant to the object node by an improved Markov Boundary identification algorithm. Finally, we obtain the enhanced ELM classifier by training ELM with the extracted representative features and a genetic algorithm based weight assignment mechanism. Experiments conducted on real and synthetic small-sample data sets demonstrate that the enhanced ELM classifier based on representative features outperforms the other methods in our comparison study in terms of both efficiency and effectiveness.

Introduction

In recent years, the classification problem has attracted renewed research attention from computer scientists, driven by the explosive emergence of new classification applications, especially in the era of big data. Learning a model from varied data sources and classifying the data quickly is a challenge for researchers, with applications such as protein sequence classification in bioinformatics, online social network prediction, XML document classification, mobile object data classification, cloud resource classification, online real-time stream data prediction, and classification of user-generated text documents from the Internet [1], [2], [3], [4], [5]. How to classify the given data efficiently and effectively is thus an important problem.

Extreme learning machine (ELM) has become popular since it generally requires far less training time than conventional learning machines [6], [7], [8], [9], [10], [11]. ELM was originally developed for Single-hidden Layer Feedforward Neural Networks (SLFNs) in [12], [13], [14], where the hidden node parameters are chosen randomly. A survey of ELM and its variants is given in [1].
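For reference, the core ELM training step admits a very small implementation: hidden-layer parameters are drawn at random and only the output weights are solved for, in closed form, via the Moore-Penrose pseudoinverse. The sketch below is a minimal NumPy rendering of that idea, with variable names of our own choosing:

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Basic ELM: random hidden-layer parameters, analytic output weights.
    X: (n_samples, n_features) inputs; T: (n_samples, n_classes) one-hot targets."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)               # predicted class indices
```

Because no iterative tuning of the hidden layer is involved, training cost is dominated by one pseudoinverse, which is the source of ELM's speed advantage.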

ELM achieves good classification and prediction performance in many domains. However, its generalization performance is weak when the original data set is small relative to its feature space. For example, in bioinformatics, a typical microarray data set often has a severely limited number of samples but several orders of magnitude more features (genes) [15], [16]. According to traditional learning theory, given n features, the number of samples m required for reliable classifier learning should be on the scale of O(2^n) [17]. However, even the minimum requirement (m = 10n) given by the statistical "rule of thumb" is patently impractical for a real microarray dataset [17]. This poses a great challenge to ELM. In this case, selecting a small number of representative features showing distinct profiles across different classes of samples becomes highly necessary.
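A back-of-the-envelope calculation (our own illustration, using the roughly 7129 genes and 72 samples of the well-known leukemia microarray data) makes the gap vivid:

```python
n = 7129                       # features (genes) in the leukemia microarray (illustrative)
m_available = 72               # samples actually available
m_rule_of_thumb = 10 * n       # the m = 10n "rule of thumb": 71,290 samples needed
print(m_rule_of_thumb, m_available)
print(len(str(2 ** n)))        # digit count of 2^n: over 2000 digits, hopelessly large
```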

Feature subset selection is a promising way to deal with data that has few samples but high dimensionality. Two basic approaches to feature selection have been proposed and studied for machine learning applications: the wrapper and the filter techniques. Wrapper methods use the predictive accuracy of a predetermined learning algorithm to determine the goodness of the selected subsets, and the resulting accuracy is usually high; however, the generality of the selected features is limited and the computational complexity is high [18]. Filter methods are independent of the learning algorithm and thus generalize well; their computational complexity is low, but the accuracy of the downstream learning algorithm is not guaranteed [19], [20], [21]. Due to their computational efficiency, filter methods are usually a good choice when the number of features is very large. However, filter methods consider every attribute independently, ignoring the dependence relationships among attributes.
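To make the contrast concrete, here is a small sketch (our illustration using scikit-learn on synthetic data, not the paper's procedure) of a mutual-information filter next to a greedy forward-selection wrapper:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=50, n_informative=5, random_state=0)

# Filter: rank features by mutual information with the class, independent of any classifier.
X_filtered = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper: greedy forward selection scored by a fixed learner's cross-validated accuracy.
clf = LogisticRegression(max_iter=1000)
selected, remaining = [], list(range(X.shape[1]))
for _ in range(5):
    scores = [(cross_val_score(clf, X[:, selected + [j]], y, cv=3).mean(), j)
              for j in remaining]
    best_score, best_j = max(scores)   # feature whose addition helps accuracy most
    selected.append(best_j)
    remaining.remove(best_j)
```

The wrapper loop retrains the classifier for every candidate feature at every step, which is exactly why wrappers are accurate but expensive, while the filter score is computed once per feature.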

Interestingly, the Markov Boundary captures the dependency relationships between attributes [22]. Given a target variable, its Markov Boundary is a minimal set of variables conditioned on which all other variables are conditionally independent of the target variable. Tsamardinos and Aliferis [23] provided theoretical results linking the concepts of feature relevance in feature selection and the Markov Boundary in Bayesian networks. They proved that if a probability distribution can be faithfully represented by a Bayesian network, then the Markov Boundary of the class attribute in that network is the unique and minimal feature subset for optimal feature selection. Further, some studies [24], [25] demonstrated that Markov Boundary feature selection outperforms most state-of-the-art feature selection algorithms. With these theoretical and practical results, filter methods using the Markov Boundary have attracted much attention in recent years [26], [27], [28]. In this paper, we develop a new Markov Boundary discovery algorithm and then compose the final Markov Boundary classifier in an efficient and effective way.
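For intuition, the following is a minimal sketch of the growing phase of a generic IAMB-style Markov Boundary search over discrete variables, scored by empirical conditional mutual information. It is a textbook baseline, not the improved identification algorithm proposed in this paper, and a full IAMB also runs a shrinking phase to remove false positives:

```python
import numpy as np

def cond_mutual_info(x, t, Z):
    """Empirical conditional mutual information I(x; t | Z) for discrete arrays."""
    if Z.shape[1] == 0:                       # empty conditioning set: plain MI
        keys = np.zeros(len(x), dtype=int)
    else:                                     # encode each conditioning configuration as one key
        _, keys = np.unique(Z, axis=0, return_inverse=True)
    cmi = 0.0
    for k in np.unique(keys):
        mask = keys == k
        xz, tz, pz = x[mask], t[mask], mask.mean()
        for xv in np.unique(xz):
            for tv in np.unique(tz):
                pxt = np.mean((xz == xv) & (tz == tv))      # joint, conditional on Z = z
                px, pt = np.mean(xz == xv), np.mean(tz == tv)
                if pxt > 0:
                    cmi += pz * pxt * np.log(pxt / (px * pt))
    return cmi

def grow_markov_blanket(X, t, threshold=0.02):
    """Greedily add the variable most dependent on t given the current blanket."""
    mb, candidates = [], list(range(X.shape[1]))
    while candidates:
        scores = [(cond_mutual_info(X[:, j], t, X[:, mb]), j) for j in candidates]
        best, j = max(scores)
        if best < threshold:                  # remaining variables look conditionally independent
            break
        mb.append(j)
        candidates.remove(j)
    return mb
```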

The main contributions of this paper are as follows: (1) We develop a method that automatically determines the number of intervals for every continuous-valued feature, with a theoretical guarantee, whereas commonly used methods require an explicit parameter k indicating the maximum number of discretized intervals; (2) We propose a method to identify representative feature subsets for each category. Instead of constructing a large Bayesian network using all features, we select the features of high relevance by finding only the Markov Boundary of the object node, in an efficient way; (3) We devise an enhanced ELM for classification, namely MB-ELM. Unlike existing methods, MB-ELM utilizes the selected feature subsets and a genetic algorithm based ensemble method to improve effectiveness.
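This excerpt does not spell out the AIC-based discretization, so the following is only a plausible sketch of the general flavor of contribution (1): choosing the number of equal-width intervals for one continuous feature by minimizing AIC under a histogram density model. The exact criterion in the paper may differ.

```python
import numpy as np

def aic_bins(x, max_bins=20):
    """Pick the number of equal-width histogram bins minimizing AIC = -2*logL + 2k.
    A generic criterion, not necessarily the paper's exact formulation."""
    n = len(x)
    best_k, best_aic = 1, np.inf
    for k in range(1, max_bins + 1):
        counts, edges = np.histogram(x, bins=k)
        widths = np.diff(edges)
        nz = counts > 0                       # empty bins carry no likelihood mass
        loglik = np.sum(counts[nz] * np.log(counts[nz] / (n * widths[nz])))
        aic = -2.0 * loglik + 2.0 * k
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k

# Usage: discretize a continuous feature into the AIC-selected number of intervals.
x = np.random.default_rng(0).normal(size=200)
labels = np.digitize(x, np.histogram_bin_edges(x, bins=aic_bins(x)))
```

The point of such a criterion is that the interval count is chosen per feature by a likelihood/complexity trade-off, so no global maximum-bin parameter k has to be supplied by the user.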

The remainder of this paper is organized as follows: Section 2 gives a brief overview of ELM. Section 3 presents the enhanced ELM classification framework, the four major elements of which, i.e. AIC-based discretization, initial filtering, representative feature selection and MB-ELM classifier construction, are detailed in Sections 3.1–3.4 respectively. In Section 4, we report extensive experimental results to validate the proposed MB-ELM. Finally, Section 5 concludes this paper.

Section snippets

Related work

This research is related to the work on extreme learning machine and feature selection.

The enhanced ELM classification framework

In this section, we first give a brief introduction to the enhanced ELM classification framework, and then detail every element of this framework in Sections 3.1–3.4, respectively. Given a dataset of n samples, m features/variables and k categories, i.e. S = {s1, s2, ..., sn}, F = {f1, f2, ..., fm}, C = {c1, c2, ..., ck}. Each sample si is denoted by a vector si = (xi1, xi2, ..., xim), where xij is the value of sample si on feature fj. For each sample si ∈ S, there is a class label ci associated with si.
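In array form, this notation maps directly onto matrices (a trivial sketch with hypothetical variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 100, 2000, 3                   # n samples, m features, k categories
X = rng.random((n, m))                   # X[i, j] = x_ij, value of sample s_i on feature f_j
y = rng.integers(0, k, size=n)           # y[i] = index of the class label c_i of sample s_i
T = np.eye(k)[y]                         # one-hot targets, the form consumed by ELM training
```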

Experiments

In this section, we study the performance of the enhanced ELM based on Markov Boundary (MB-ELM for short) by evaluating its efficiency and effectiveness. The algorithms are coded in MATLAB. All experiments are conducted on a 2.0-GHz HP PC with 1 GB of memory running Windows XP.

We use two kinds of datasets, real and synthetic. For the real data, we use four real datasets, i.e., leukemia1, leukemia2, Colon and hereditary breast cancer (HBC) [15]. Table 1 summarizes the characteristics of these datasets: the …

Conclusion

Although it has been widely used in many popular application domains, ELM has weak generalization performance on high-dimensional, small-sample data. In this paper, an enhanced ELM classification framework is proposed to address this problem. The framework consists of four major modules: AIC-based discretization, initial filtering, representative feature selection and MB-ELM classification. First, all continuous features are automatically discretized into the proper number of intervals with …

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61272182, 61100028, 61572117, 61173029, 61370155), the Program for New Century Excellent Talents (NCET-11-0085), the Key Program of the National Natural Science Foundation of China (61332014, U1401256), the Ph.D. Programs Foundation of the Ministry of Education of China (young teacher) (no. 20110042120034) and the Fundamental Research Funds for the Central Universities (nos. N150404008, N150402002).


References (52)

  • S. Salcedo-Sanz et al., One-year-ahead energy demand estimation from macroeconomic variables using computational intelligence algorithms, Energy Convers. Manage. (2015)
  • Y. Zhao et al., Improving ELM-based microarray data classification by diversified sequence features selection, Neural Comput. Appl. (2016)
  • H. Akaike, A new look at the statistical model identification, IEEE Trans. Autom. Control (1974)
  • D. Koller et al., Toward optimal feature selection, Proceedings of the Thirteenth International Conference on Machine Learning (ICML '96) (1996)
  • G. Huang et al., Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. (2011)
  • J. Wu et al., Positive and negative fuzzy rule system, extreme learning machine and image classification, Int. J. Mach. Learn. Cybern. (2011)
  • B.P. Chacko et al., Handwritten character recognition using wavelet energy and extreme learning machine, Int. J. Mach. Learn. Cybern. (2012)
  • W. Shitong et al., A fast learning method for feedforward neural networks, Neurocomputing (2015)
  • G.-B. Huang et al., Extreme learning machine: RBF network case, Proceedings of the 8th Control, Automation, Robotics and Vision Conference (ICARCV 2004) (2004)
  • S. Ding et al., Extreme learning machine: algorithm, theory and applications, Artif. Intell. Rev. (2015)
  • Y. Zhao et al., Learning phenotype structure using sequence model, IEEE Trans. Knowl. Data Eng. (2014)
  • F. Yang et al., Robust feature selection for microarray data based on multicriterion fusion, IEEE/ACM Trans. Comput. Biol. Bioinform. (2011)
  • S. Wang et al., Computational methods of feature selection, Pattern Anal. Appl. (2010)
  • C. Darya et al., Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI, Neurocomputing (2014)
  • J. Teixeira De Souza (2004)
  • A. Blum et al., Selection of relevant features and examples in machine learning, Artif. Intell. (1997)

    Ying Yin was born in 1980. He received his B.E., M.E. and Ph.D. in computer science from Northeastern University, China, in 2002, 2005 and 2008, respectively. Currently he is an associate professor in the School of Information Science and Engineering, Northeastern University, China. He is a member of IEEE and ACM, and a member of CCF. His major research interests include data mining and machine learning.

    Yuhai Zhao was born in 1975. He received his B.E., M.E. and Ph.D. in computer science from Northeastern University, China, in 1999, 2004 and 2007, respectively. Currently he is an associate professor in the School of Information Science and Engineering, Northeastern University, China. He is a member of IEEE and ACM, and a member of CCF. His major research interests include data mining and bioinformatics.

    Bin Zhang was born in 1964. He received his B.E. in computer science from Xi'an Jiaotong University, China, in 1984, his M.E. in computer application technology from Northeastern University, China, in 1989, and his Ph.D. in computer software and theory from Northeastern University, China, in 1997. Currently he is a professor in the School of Information Science and Engineering, Northeastern University, China. His major research interests are data mining, cloud computing and big data analysis.

    Chenguang Li was born in 1991. He received his B.E. in computer science from the Shenyang Institute of Aeronautical Engineering, China, in 2014. Currently he is an M.E. candidate in computer science at Northeastern University, China. His major research interests include data mining and machine learning.

    Song Guo was born in 1993. He received his B.E. in computer science from Northeastern University, China, in 2014. Currently he is an M.E. candidate in computer science at Northeastern University, China. His major research interests include data mining and machine learning.
