Knowledge-Based Systems

Volume 181, 1 October 2019, 104741
Logistic local hyperplane-Relief: A feature weighting method for classification

https://doi.org/10.1016/j.knosys.2019.04.011

Highlights

  • This paper proposes LLH-Relief based on LH-Relief and LI-Relief.

  • LLH-Relief uses local learning to find neighbor representations for given samples.

  • LLH-Relief solves a problem with the ℓ1-norm to obtain sparse feature weights.

  • Experimental results show the good feature selection ability of LLH-Relief.

Abstract

Relief-based algorithms have been widely used for feature selection because of their low computational cost and high accuracy. However, the available Relief-based algorithms have limitations. To further improve the performance of Relief-based methods, we propose a novel feature selection algorithm based on the logistic iterative-Relief (LI-Relief) and local hyperplane-Relief (LH-Relief) methods, called logistic local hyperplane-based Relief (LLH-Relief). LLH-Relief uses local learning to find neighbor representations for given samples and learns feature weights by solving an optimization problem that combines a logistic regression loss with an ℓ1-norm regularization term. To demonstrate the validity and effectiveness of LLH-Relief for feature selection in supervised learning, we perform extensive experiments on toy and real-world datasets. Experimental results indicate that LLH-Relief is very promising.

Introduction

In many supervised learning tasks, the given data are represented by a very large number of features, but only a few of them are relevant and contribute to constructing decision models. Even state-of-the-art classification algorithms cannot overcome the presence of a large number of weakly relevant and redundant features. This is usually attributed to “the curse of dimensionality” [1]. Thus, dimension reduction has been applied to transform a high-dimensional original space into a low-dimensional feature space for compact and accurate data representation. Feature selection is one of the fundamental techniques for dimension reduction. Not only can feature selection make subsequent learning more efficient, but it can also improve the comprehensibility of results. Because of these merits, feature selection has drawn a large amount of attention for use with high-dimensional data [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15].

Generally, most feature selection algorithms rely on a criterion function to search for informative features. The search strategy is therefore crucial, as it determines how informative features are selected. However, finding the optimal feature subset requires solving a combinatorial optimization problem, which is NP-hard and time-consuming. To alleviate this issue, some feature weighting methods [12], [16], [17], [18], [19], [20] have been proposed. Feature weighting algorithms allow feature weights to take real values instead of binary ones. Compared to other feature selection methods, feature weighting algorithms can avoid combinatorial searching [21], [22].

Among the existing feature weighting algorithms, the Relief algorithm is well known for its simplicity and effectiveness [16]. Originally, Relief iteratively updates feature weights according to their discriminative ability between nearest neighbors [16]. Later, Sun showed that Relief can be viewed as a convex optimization problem [18]. Because Relief is designed only for binary classification tasks, Relief-F was presented for multi-class classification tasks [17]. Different from Relief, Relief-F uses multiple nearest neighbors instead of just one nearest neighbor when computing a distance margin. Neither Relief nor Relief-F accounts for the fact that the nearest neighbors defined in the original space may be quite different from those in the weighted space [23]. Thus, Sun et al. proposed an iterative Relief (I-Relief) method [18]. The margin defined in I-Relief is obtained by probability-weighted averaging of sample margins. Consequently, feature weight estimation in I-Relief may be less accurate if the data contain too many abnormal or irrelevant features. To generate sparse feature weights, Sun et al. presented a logistic I-Relief (LI-Relief) method by introducing an ℓ1-norm regularization term into I-Relief [24]. LI-Relief suffers from the same issue as I-Relief since it adopts the same neighbor representation. To address this drawback of I-Relief, Cai et al. proposed a local hyperplane Relief (LH-Relief) algorithm that estimates feature weights from local patterns approximated by a locally linear hyperplane [12]. To obtain a proper neighborhood representation, Huang et al. proposed a dynamic representation and neighbor sparse reconstruction-based Relief (DRNSR-Relief) algorithm [25], which replaces the ℓ2-norm regularization term in LH-Relief with an ℓ1-norm regularization term to obtain a sparse neighbor representation for a given sample. However, LH-Relief and DRNSR-Relief cannot perform feature selection in the strict sense, since the feature weights they generate are not sparse.
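For reference, the weight update that all of these variants build on can be sketched as follows. This is a minimal illustration of the original binary-class Relief idea [16], not of the method proposed in this paper; the Manhattan distance and the function name are choices made only for this sketch.

```python
import numpy as np

def relief_weights(X, y):
    """Minimal sketch of the original binary-class Relief weight update [16].

    For every instance, each feature's weight is increased by its distance to
    the nearest miss (closest sample of the other class) and decreased by its
    distance to the nearest hit (closest sample of the same class).
    """
    n, m = X.shape
    w = np.zeros(m)
    for i in range(n):
        xi, yi = X[i], y[i]
        dist = np.abs(X - xi).sum(axis=1)   # Manhattan distances to all samples
        dist[i] = np.inf                    # exclude the instance itself
        same = (y == yi)
        hit = X[np.where(same, dist, np.inf).argmin()]   # nearest hit
        miss = X[np.where(same, np.inf, dist).argmin()]  # nearest miss
        w += np.abs(xi - miss) - np.abs(xi - hit)
    return w / n
```

Later variants replace the single nearest hit and miss with multiple neighbors (Relief-F), probability-weighted averages (I-Relief, LI-Relief), or local hyperplane approximations (LH-Relief, DRNSR-Relief).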

To further improve the performance of Relief-based methods, we propose a logistic local hyperplane-based Relief (LLH-Relief) algorithm for feature selection. LLH-Relief, a combination of LI-Relief and LH-Relief, represents neighbors through local learning in the same manner as LH-Relief, and learns sparse feature weights by minimizing a logistic regression objective with ℓ1-norm regularization in the same manner as LI-Relief, which has a logarithmic sample complexity with respect to the number of features. The contributions of this paper are as follows:

  • A novel Relief algorithm for feature weighting is presented by combining LI-Relief and LH-Relief.

  • LLH-Relief adopts the fine-grained neighbor representation of LH-Relief, which resolves the coarse neighbor representation of LI-Relief.

  • LLH-Relief uses the ℓ1-regularized logistic regression objective of LI-Relief, which resolves the lack of sparsity in the feature weights produced by LH-Relief.

The remainder of this paper is organized as follows. LLH-Relief is proposed in Section 2. Section 3 briefly describes the connections between LLH-Relief and related work. Section 4 presents extensive experimental results and an analysis of the proposed model. Conclusions are provided in Section 5.

Section snippets

Logistic local hyperplane-Relief

In this section, we present the logistic local hyperplane-Relief algorithm. In short, LLH-Relief consists of two main steps, local neighbor representation and feature weight updating, as shown in Fig. 1. LLH-Relief uses local hyperplane learning to represent neighbors and updates the feature weights by solving an optimization problem that combines a logistic regression loss with an ℓ1-norm regularization term.
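As a concrete illustration of the weight-updating step, and assuming the formulation follows that of LI-Relief (the full derivation is given in the remainder of this section, which is only excerpted here), the learned weight vector is the minimizer of an ℓ1-regularized logistic loss over margin vectors:

$$\min_{\mathbf{w}\,\ge\,\mathbf{0}}\;\sum_{i=1}^{n}\log\!\left(1+\exp\left(-\mathbf{w}^{\top}\mathbf{z}_i\right)\right)\;+\;\lambda\,\lVert\mathbf{w}\rVert_{1},$$

where $n$ is the number of training samples, $\mathbf{z}_i$ is the margin vector of the $i$-th sample computed from its local hyperplane neighbor representation, $\mathbf{w}$ is the nonnegative feature weight vector, and $\lambda>0$ trades off margin maximization against sparsity.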

Assume that there is a set of training samples $D=\{(\mathbf{x}_i, y_i)\}_{i=1}^{n} \subset \mathcal{X}\times\mathcal{Y}$, where $\mathbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{m}$ and $y_i \in \mathcal{Y}=\{1,2,\ldots,C\}$ is the class
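To make the local neighbor representation step concrete, the following is a minimal sketch of how a sample's nearest hit and miss can be replaced by points on the local hyperplanes spanned by its nearest same-class and different-class neighbors. The plain least-squares projection and the function names are illustrative assumptions; the paper's formulation additionally regularizes the combination coefficients.

```python
import numpy as np

def local_hyperplane_point(x, neighbors):
    """Approximate x by a point on the hyperplane spanned by its neighbors.

    x: (m,) sample; neighbors: (k, m) nearest neighbors of x.
    A plain least-squares solve is used here as an illustration.
    """
    A = neighbors.T                               # (m, k) basis of the local hyperplane
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)  # combination coefficients
    return A @ coef                               # local-hyperplane representation of x

def margin_vector(x, hit_neighbors, miss_neighbors):
    """Relief-style componentwise margin with local-hyperplane neighbors:
    |x - miss approximation| - |x - hit approximation|."""
    hit = local_hyperplane_point(x, hit_neighbors)
    miss = local_hyperplane_point(x, miss_neighbors)
    return np.abs(x - miss) - np.abs(x - hit)
```

Feature weights are then obtained by minimizing the ℓ1-regularized logistic objective shown above over these margin vectors.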

Connection to other Relief-based algorithms

In this section, we discuss the relationship of LLH-Relief to other Relief-based algorithms: Relief, Relief-F, I-Relief, LI-Relief, LH-Relief, and DRNSR-Relief. For ease of comparison, we list the key points of these Relief-based algorithms in terms of “Margin vector” and “Objective function” in Table 1. The detailed description is given in the following subsections.

Experiments

To validate the effectiveness of LLH-Relief, we conduct extensive experiments on the spiral [12], [24], [25], UCI [29], and microarray [30] datasets. We first analyze LLH-Relief on the spiral dataset and then compare LLH-Relief with other Relief-based methods on real-world datasets.

All numerical experiments were performed on a personal computer with a 3.4 GHz Intel Core processor and 4 GB of memory, running Windows 7 and Matlab R2013a.

Conclusions

In this paper, we developed LLH-Relief based on LI-Relief and LH-Relief. The optimization problem of LLH-Relief contains a logistic regression term and an ℓ1-norm regularization term. Minimizing the former maximizes the expected margin, and minimizing the latter makes the feature weight vector sparse. In addition, the margin vector is obtained using a local neighbor representation. Extensive experiments were conducted on toy, eight UCI, and eight microarray datasets. On the toy dataset,

Acknowledgments

This study was funded by the National Natural Science Foundation of China (grant numbers 61373093 and 61572339), by the Soochow Scholar Project, by the Six Talent Peak Project of Jiangsu Province of China, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

References (37)

  • Guyon, I., et al., Gene selection for cancer classification using support vector machines, Mach. Learn. (2002)

  • Zhang, L., et al., Multiple SVM-RFE for multi-class gene selection on DNA microarray data

  • Huang, X., et al., Feature clustering based support vector machine recursive feature elimination for gene selection, Appl. Intell. (2018)

  • Díaz-Uriarte, R., et al., Gene selection and classification of microarray data using random forest, BMC Bioinform. (2006)

  • Duan, K.-B., et al., Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Trans. Nanobioscience (2005)

  • Tang, Y., et al., Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis, IEEE/ACM Trans. Comput. Biol. Bioinform. (2007)

  • Liu, X., et al., An entropy-based gene selection method for cancer classification using microarray data, BMC Bioinform. (2005)

  • Golub, T.R., et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science (1999)

    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.04.011.
