Knowledge-Based Systems

Volume 181, 1 October 2019, 104741
Logistic local hyperplane-Relief: A feature weighting method for classification

https://doi.org/10.1016/j.knosys.2019.04.011

Highlights

  • This paper proposes LLH-Relief based on LH-Relief and LI-Relief.

  • LLH-Relief uses local learning to find neighbor representations for given samples.

  • LLH-Relief solves a problem with the ℓ1-norm to obtain sparse feature weights.

  • Experimental results show the good feature selection ability of LLH-Relief.

Abstract

Relief-based algorithms have been widely used for feature selection because of their low computational cost and high accuracy. However, the available Relief-based algorithms have limitations. To further improve the performance of Relief-based methods, we propose a novel feature selection algorithm based on the logistic iterative-Relief (LI-Relief) and local hyperplane-Relief (LH-Relief) methods, called logistic local hyperplane-based Relief (LLH-Relief). LLH-Relief uses local learning to find neighbor representations for given samples and learns feature weights by solving an optimization problem that combines a logistic regression loss with an ℓ1-norm regularization term. To demonstrate the validity and effectiveness of LLH-Relief for feature selection in supervised learning, we perform extensive experiments on toy and real-world datasets. Experimental results indicate that LLH-Relief is very promising.

Introduction

In many supervised learning tasks, the given data are represented by a very large number of features, but only a few of them are relevant and contribute to constructing decision models. Even state-of-the-art classification algorithms cannot overcome the presence of a large number of weakly relevant and redundant features. This is usually attributed to “the curse of dimensionality” [1]. Thus, dimension reduction has been applied to transform a high-dimensional original space into a low-dimensional feature space for compact and accurate data representation. Feature selection is one of the fundamental techniques for dimension reduction. Not only can feature selection make subsequent learning more efficient, but it can also improve the comprehensibility of results. Because of these merits, feature selection has drawn a large amount of attention for use with high-dimensional data [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15].

Generally, most feature selection algorithms rely on a criterion function to search for informative features. The search strategy is therefore crucial, as it determines how informative features are selected. However, finding the optimal feature subset requires solving a combinatorial optimization problem, which is NP-hard and time-consuming. To alleviate this issue, some feature weighting methods [12], [16], [17], [18], [19], [20] have been proposed. Feature weighting algorithms allow feature weights to take real values instead of binary ones. Compared to other feature selection methods, feature weighting algorithms can avoid combinatorial searching [21], [22].

Among the existing feature weighting algorithms, the Relief algorithm is well known for its simplicity and effectiveness [16]. Originally, Relief iteratively updates feature weights according to their discriminative ability between nearest neighbors [16]. Later, Sun showed that Relief can be viewed as a convex optimization problem [18]. Because Relief is designed only for binary classification tasks, Relief-F was presented for multi-class classification tasks [17]. Different from Relief, Relief-F uses multiple nearest neighbors instead of just one nearest neighbor when computing a distance margin. Neither Relief nor Relief-F accounts for the fact that the nearest neighbors defined in the original space may be quite different from those in the weighted space [23]. Thus, Sun et al. proposed an iterative Relief (I-Relief) method [18]. The margin defined in I-Relief is obtained by probability-weighted averaging of sample margins. Consequently, feature weight estimation in I-Relief may be less accurate if the data contain too many abnormal or irrelevant features. To generate sparse feature weights, Sun et al. presented a logistic I-Relief (LI-Relief) method by introducing an ℓ1-norm regularization term into I-Relief [24]. LI-Relief suffers from the same issue as I-Relief since it adopts the same neighbor representation. To address this drawback of I-Relief, Cai et al. proposed a local hyperplane Relief (LH-Relief) algorithm that estimates feature weights from local patterns approximated by a locally linear hyperplane [12]. To obtain a proper neighborhood representation, Huang et al. proposed a dynamic representation and neighbor sparse reconstruction-based Relief (DRNSR-Relief) algorithm [25], which replaces the ℓ2-norm regularization term in LH-Relief with an ℓ1-norm regularization term to obtain a sparse neighbor representation for a given sample. However, LH-Relief and DRNSR-Relief cannot perform feature selection in the strict sense, since the feature weights they generate are not sparse.
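For reference, the weight update that all of these variants build on can be sketched as follows. This is a minimal illustration of the original binary-class Relief idea [16], not of the method proposed in this paper; the Manhattan distance and the function name are choices made only for this sketch.

```python
import numpy as np

def relief_weights(X, y):
    """Minimal sketch of the original binary-class Relief weight update [16].

    For every instance, each feature's weight is increased by its distance to
    the nearest miss (closest sample of the other class) and decreased by its
    distance to the nearest hit (closest sample of the same class).
    """
    n, m = X.shape
    w = np.zeros(m)
    for i in range(n):
        xi, yi = X[i], y[i]
        dist = np.abs(X - xi).sum(axis=1)   # Manhattan distances to all samples
        dist[i] = np.inf                    # exclude the instance itself
        same = (y == yi)
        hit = X[np.where(same, dist, np.inf).argmin()]   # nearest hit
        miss = X[np.where(same, np.inf, dist).argmin()]  # nearest miss
        w += np.abs(xi - miss) - np.abs(xi - hit)
    return w / n
```

Later variants replace the single nearest hit and miss with multiple neighbors (Relief-F), probability-weighted averages (I-Relief, LI-Relief), or local hyperplane approximations (LH-Relief, DRNSR-Relief).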

To further improve the performance of Relief-based methods, we propose a logistic local hyperplane-based Relief (LLH-Relief) algorithm for feature selection. LLH-Relief, a combination of LI-Relief and LH-Relief, represents neighbors through local learning in the same manner as LH-Relief, and learns sparse feature weights by minimizing a logistic regression objective with ℓ1-norm regularization in the same manner as LI-Relief, which has a logarithmic sample complexity with respect to the number of features. The contributions of this paper are as follows:

  • A novel Relief algorithm for feature weighting is presented by combining LI-Relief and LH-Relief.

  • LLH-Relief adopts the fine-grained neighbor representation of LH-Relief, which resolves the coarse neighbor representation of LI-Relief.

  • LLH-Relief uses the ℓ1-regularized logistic regression objective of LI-Relief, which resolves the lack of sparsity in the feature weights produced by LH-Relief.

The remainder of this paper is organized as follows. LLH-Relief is proposed in Section 2. Section 3 briefly describes the connections between LLH-Relief and related work. Section 4 presents extensive experimental results and an analysis of the proposed model. Conclusions are provided in Section 5.

Section snippets

Logistic local hyperplane-Relief

In this section, we present the logistic local hyperplane-Relief algorithm. In short, LLH-Relief consists of two main steps, local neighbor representation and feature weight updating, as shown in Fig. 1. LLH-Relief uses local hyperplane learning to represent neighbors and updates the feature weights by solving an optimization problem that combines a logistic regression loss with an ℓ1-norm regularization term.
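As a concrete illustration of the weight-updating step, and assuming the formulation follows that of LI-Relief (the full derivation is given in the remainder of this section, which is only excerpted here), the learned weight vector is the minimizer of an ℓ1-regularized logistic loss over margin vectors:

$$\min_{\mathbf{w}\,\ge\,\mathbf{0}}\;\sum_{i=1}^{n}\log\!\left(1+\exp\left(-\mathbf{w}^{\top}\mathbf{z}_i\right)\right)\;+\;\lambda\,\lVert\mathbf{w}\rVert_{1},$$

where $n$ is the number of training samples, $\mathbf{z}_i$ is the margin vector of the $i$-th sample computed from its local hyperplane neighbor representation, $\mathbf{w}$ is the nonnegative feature weight vector, and $\lambda>0$ trades off margin maximization against sparsity.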

Assume that there is a set of training samples $D=\{(\mathbf{x}_i, y_i)\}_{i=1}^{n} \subset \mathcal{X}\times\mathcal{Y}$, where $\mathbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{m}$ and $y_i \in \mathcal{Y}=\{1,2,\ldots,C\}$ is the class
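To make the local neighbor representation step concrete, the following is a minimal sketch of how a sample's nearest hit and miss can be replaced by points on the local hyperplanes spanned by its nearest same-class and different-class neighbors. The plain least-squares projection and the function names are illustrative assumptions; the paper's formulation additionally regularizes the combination coefficients.

```python
import numpy as np

def local_hyperplane_point(x, neighbors):
    """Approximate x by a point on the hyperplane spanned by its neighbors.

    x: (m,) sample; neighbors: (k, m) nearest neighbors of x.
    A plain least-squares solve is used here as an illustration.
    """
    A = neighbors.T                               # (m, k) basis of the local hyperplane
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)  # combination coefficients
    return A @ coef                               # local-hyperplane representation of x

def margin_vector(x, hit_neighbors, miss_neighbors):
    """Relief-style componentwise margin with local-hyperplane neighbors:
    |x - miss approximation| - |x - hit approximation|."""
    hit = local_hyperplane_point(x, hit_neighbors)
    miss = local_hyperplane_point(x, miss_neighbors)
    return np.abs(x - miss) - np.abs(x - hit)
```

Feature weights are then obtained by minimizing the ℓ1-regularized logistic objective shown above over these margin vectors.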

Connection to other Relief-based algorithms

In this section, we discuss the relationship of LLH-Relief to other Relief-based algorithms: Relief, Relief-F, I-Relief, LI-Relief, LH-Relief, and DRNSR-Relief. For ease of comparison, we list the key points of these Relief-based algorithms in terms of “Margin vector” and “Objective function” in Table 1. The detailed description is given in the following subsections.

Experiments

To validate the effectiveness of LLH-Relief, we conduct extensive experiments on the spiral [12], [24], [25], UCI [29], and microarray [30] datasets. We first analyze LLH-Relief on the spiral dataset and then compare LLH-Relief with other Relief-based methods on real-world datasets.

All numerical experiments were performed on a personal computer with a 3.4 GHz Intel Core processor and 4 GB of memory, running Windows 7 and Matlab R2013a.

Conclusions

In this paper, we developed LLH-Relief based on LI-Relief and LH-Relief. The optimization problem of LLH-Relief contains a logistic regression term and an ℓ1-norm regularization term. Minimizing the former maximizes the expected margin, and minimizing the latter makes the feature weight vector sparse. In addition, the margin vector is obtained using a local neighbor representation. Extensive experiments were conducted on toy, eight UCI, and eight microarray datasets. On the toy dataset,

Acknowledgments

This study was funded by the National Natural Science Foundation of China (grant numbers 61373093 and 61572339), by the Soochow Scholar Project, by the Six Talent Peak Project of Jiangsu Province of China, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

References (37)

  • Guyon, I., et al., Gene selection for cancer classification using support vector machines, Mach. Learn. (2002)

  • Zhang, L., et al., Multiple SVM-RFE for multi-class gene selection on DNA microarray data

  • Huang, X., et al., Feature clustering based support vector machine recursive feature elimination for gene selection, Appl. Intell. (2018)

  • Díaz-Uriarte, R., et al., Gene selection and classification of microarray data using random forest, BMC Bioinform. (2006)

  • Duan, K.-B., et al., Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Trans. Nanobioscience (2005)

  • Tang, Y., et al., Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis, IEEE/ACM Trans. Comput. Biol. Bioinform. (2007)

  • Liu, X., et al., An entropy-based gene selection method for cancer classification using microarray data, BMC Bioinform. (2005)

  • Golub, T.R., et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science (1999)

    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.04.011.
