Adaptive graph learning for semi-supervised feature selection with redundancy minimization
Introduction
In recent years, owing to the rapid development of computing, high-dimensional data has become ubiquitous in real-world applications such as image annotation [34], target tracking [44], and network image classification [36]. High-dimensional data describes the objective world in greater detail, but it also inevitably brings about the so-called "curse of dimensionality". To mitigate this effect, feature selection (FS) plays an important role in dimensionality reduction: it improves the performance of learning tasks by selecting relevant features and removing redundant ones [12]. FS methods can be divided into three categories: supervised, semi-supervised, and unsupervised.
Because obtaining data labels is expensive, semi-supervised FS, which exploits limited labeled data together with a large amount of unlabeled data, has attracted the attention of researchers. Over the past decades, a large number of semi-supervised FS algorithms have been proposed; they can be roughly classified into filter, wrapper, and embedded methods [27]. Filter methods assess features through evaluation strategies that are independent of any learning method [38]. Conversely, wrapper methods select the feature subset containing the most discriminative information by maximizing the predictive performance of a classifier [2]; they usually rely on metaheuristic algorithms to search for the optimal subset, which often lets them outperform filter methods [10]. However, wrapper methods require extensive computational resources and are tied to the learning method used in training [8]. Compared with filter and wrapper methods, embedded methods embed FS into a learning framework and select features by considering comprehensive properties of the data [42]. Embedded methods outperform the others in efficiency and performance, which makes them more popular [22].
Most existing embedded methods are based on graph Laplacians and sparse models: features are selected by a sparse model while the geometric structure of the data is preserved via a Laplacian graph [11]. A sparse model identifies and retains important features by penalizing the projection matrix with sparse regularization [41], and its efficiency makes it widely used in feature selection. For example, Chen et al. proposed an efficient semi-supervised feature selection method based on a sparse model (ESFS), in which the latent information in unlabeled data is exploited by a least-squares loss function rather than a Laplacian graph, making the method suitable for large-scale data [3]. Sparse-model-based methods can achieve good performance, but Chen et al. argued that they not only lack a theoretical explanation but also cannot find a global and sparse solution [5]. Therefore, Chen et al. extended the sparse model with a series of scale factors (RLSR), which rescale the regression coefficients. Moreover, they proposed another semi-supervised FS model (SRLSR) based on RLSR that obtains sparser solutions via an implicit norm [6]. These sparse-model-based methods are efficient, but they have flaws that may impair FS performance. First, the sparse model considers only the label information and ignores important latent information such as the manifold structure, which may weaken the model [32]. Second, because similar features receive similar weights, the feature subset selected by a sparse model may contain redundant features, which may degrade performance [23].
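For concreteness, the sparse-regression backbone that these methods build on can be sketched as follows (a generic formulation given for reference, not the exact objective of ESFS, RLSR, or SRLSR):

$$\min_{W,\,b}\ \big\|X^{\top}W + \mathbf{1}b^{\top} - Y\big\|_F^2 + \lambda\,\|W\|_{2,1},$$

where $X \in \mathbb{R}^{d \times n}$ stacks the $n$ samples, $Y \in \mathbb{R}^{n \times c}$ encodes the available label information, and the $\ell_{2,1}$-norm $\|W\|_{2,1} = \sum_{i=1}^{d}\|w^i\|_2$ drives entire rows of the projection matrix $W \in \mathbb{R}^{d \times c}$ to zero, so that features can be ranked by the row norms of $W$.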
To improve model performance, many works have taken the local geometric structure into account and introduced manifold learning into the sparse learning framework. For example, Tang et al. proposed a robust feature selection method that reduces the sensitivity to outliers by adopting an ℓ2,1-norm-based graph Laplacian regularization term to preserve the local geometric structure of the data [33]. Generally, models combining manifold learning with a sparse model involve two separate steps: constructing a similarity matrix and then selecting features [46]. However, Yang et al. pointed out that a similarity matrix constructed from k-nearest neighbors may fall into local optima because of the influence of noise and outliers [40]. To alleviate the negative impact of a pre-defined similarity matrix, learning a graph with adaptive neighbors is more desirable. Chen et al. improved the quality of the similarity matrix using the projection distance [7]. Similarly, Zeng et al. constructed the similarity matrix dynamically by introducing an adaptive loss term [43]. Dynamic construction improves the quality of the similarity matrix, but its efficiency is low. To improve efficiency, Nie et al. proposed a more compact framework that integrates graph construction and the learning task into a unified whole [25]. Adaptive graph learning greatly improves the performance and efficiency of FS models. However, the above methods based on adaptive graph learning and sparse models still ignore the effect of redundant features, which may lower FS performance [26]. Moreover, Liu et al. also indicated that most methods ignore highly redundant features, so the discriminative power of the selected features may be decreased [17].
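As a reference point, the adaptive-neighbor idea credited to Nie et al. [25] can be sketched in its generic form as learning the similarities jointly instead of fixing them in advance:

$$\min_{S}\ \sum_{i,j=1}^{n}\big(\|x_i - x_j\|_2^2\, s_{ij} + \gamma\, s_{ij}^2\big)\quad \text{s.t.}\ \ s_i^{\top}\mathbf{1}=1,\ s_{ij}\ge 0,$$

so that nearer samples receive larger similarities $s_{ij}$, while the term $\gamma s_{ij}^2$ prevents the trivial solution in which each sample assigns all of its similarity to its single nearest neighbor. In a joint framework, the distances can be measured in the learned projected space rather than in the original one.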
Motivated by these two shortcomings of conventional graph-based sparse FS, this paper proposes a semi-supervised FS framework called AGLRM that addresses the drawbacks of both graph learning and sparse learning. To alleviate the impact of the similarity matrix on model performance, adaptive graph learning is adopted so that the similarity matrix is learned rather than pre-defined. Moreover, a redundancy minimization regularization is introduced into the sparse model to relieve the effect of redundant features. Hence, AGLRM integrates redundancy minimization regularization and adaptive graph learning into a unified sparse learning framework that jointly accounts for minimum redundancy and local structure preservation. In other words, AGLRM aims to select the most discriminative and least redundant feature subset. The framework of the proposed AGLRM is shown in Fig. 1.
According to Fig. 1, the graph of the proposed framework is, first, constructed in the low-dimensional space instead of the original space. Second, a penalty mechanism is introduced to reduce the redundancy between features. Finally, the framework feeds the learned information back to the projection matrix, which facilitates learning the optimal projection matrix.
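To make this description concrete, a plausible composite objective consistent with Fig. 1 (a sketch only; the exact formulation, weights, and constraints are those given in Section 3) is

$$\min_{W,\,S}\ \|X^{\top}W - Y\|_F^2 + \lambda\|W\|_{2,1} + \alpha\sum_{i,j=1}^{n}\big(\|W^{\top}x_i - W^{\top}x_j\|_2^2\, s_{ij} + \gamma\, s_{ij}^2\big) + \beta\,\Omega_{\mathrm{red}}(W),$$

where the graph term measures distances between the projected samples $W^{\top}x_i$, so the projection learned from the labels and the redundancy penalty feeds back into the similarity matrix $S$, and $\Omega_{\mathrm{red}}(W)$ denotes the redundancy minimization regularization.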
The main contributions of our paper are highlighted as follows.
i. A novel semi-supervised FS framework is proposed that comprehensively considers the impact of redundant features and noise. Compared with conventional graph-based sparse FS, the similarity matrix constructed by this method is of higher quality because the learned information is exploited. Besides, the feature subset selected by the proposed method has lower redundancy.
ii. The redundancy minimization regularization based on the Pearson correlation coefficient is improved, and two specific methods are proposed based on the improved redundancy minimization regularization and on cosine similarity, respectively (an illustrative sketch follows this list).
iii. The objective functions of the proposed models are optimized by a unified iterative algorithm, whose convergence is verified both theoretically and empirically.
iv. Comprehensive experiments on public datasets verify the effectiveness of the proposed method and demonstrate its advantages over several state-of-the-art methods.
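To illustrate the idea behind contribution ii, the following Python sketch builds a Pearson-correlation-based redundancy matrix and evaluates a simple quadratic penalty on per-feature importance scores. The score vector p, the penalty p^T R p, and all function names here are illustrative assumptions for this sketch, not the paper's exact regularizer (which is defined in Section 3):

```python
import numpy as np

def redundancy_matrix(X):
    """Absolute Pearson correlation between every pair of features.

    X: (n_samples, n_features) data matrix. Returns R with
    R[i, j] = |corr(feature_i, feature_j)| and a zeroed diagonal,
    so a feature is not penalized for correlating with itself.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    return R

def redundancy_penalty(W, R):
    """Illustrative penalty p^T R p, where p holds the row norms of the
    projection matrix W (the usual per-feature importance scores).
    The penalty grows when strongly weighted features are mutually
    correlated, so minimizing it discourages redundant selections."""
    p = np.linalg.norm(W, axis=1)
    return p @ R @ p

# Toy check: feature 1 is a near-duplicate of feature 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(100)
R = redundancy_matrix(X)

W_redundant = np.zeros((5, 2)); W_redundant[0, 0] = W_redundant[1, 1] = 1.0
W_diverse = np.zeros((5, 2)); W_diverse[0, 0] = W_diverse[3, 1] = 1.0
print(redundancy_penalty(W_redundant, R))  # large: two correlated features selected
print(redundancy_penalty(W_diverse, R))    # near zero: uncorrelated features selected
```

A cosine-similarity variant of this sketch is obtained by replacing the correlation matrix with the matrix of absolute cosine similarities between feature columns.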
The rest of this paper is organized as follows. A brief review of the related work is given in Section 2. The proposed method and its optimization method are introduced in Section 3. Then, Section 4 describes the experimental settings and analyzes the experimental results. Finally, a summary of this work is given in Section 5.
Related work
First, the notations and definitions used in this paper are summarized in this section. Then, related work on FS based on sparse learning, adaptive graph learning, and redundancy minimization regularization is briefly reviewed.
The proposed method
In this section, a novel semi-supervised FS framework that takes feature redundancy and noise into account is proposed. Through adaptive graph learning and redundancy minimization, the negative effects of redundant features and noise are eliminated as far as possible.
Experiments and evaluation
In this section, extensive experiments are first designed and conducted on twelve commonly used datasets. Then, classification accuracy and F1 score on four classifiers (KNN, RBF-SVM, CART decision tree, and a BP neural network with one hidden layer) are used to evaluate the algorithms. Finally, in comparison with currently popular algorithms, the high efficiency of the AGLRM algorithm is verified. Note that the experiments are conducted in MATLAB R2021b and the code is run on a computer …
Conclusion
Inspired by the effectiveness of adaptive graph learning, a novel semi-supervised FS algorithm is proposed that takes both the optimal local structure and feature redundancy into account. In this study, adaptive graph learning and redundancy minimization regularization are integrated into a sparse framework. In comparison with other related approaches, the proposed AGLRM updates the similarity matrix adaptively through adaptive graph learning and finally outputs the optimal local-structure graph.
CRediT authorship contribution statement
Jingliu Lai: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing - original draft, Writing - review & editing. Hongmei Chen: Writing - review & editing, Supervision, Visualization, Investigation, Formal analysis, Validation. Tianrui Li: Writing - review & editing, Supervision, Resources. Xiaoling Yang: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (Nos. 61976182, 62076171, 61876157, 61976245), Sichuan Key R&D project (2020YFG0035), and Key program for International S&T Cooperation of Sichuan Province (2019YFH0097).
References (47)
- et al., Multi-view feature selection via nonnegative structured graph learning, Neurocomputing (2020)
- et al., A survey on optimization metaheuristics, Information Sciences (2013)
- et al., A comprehensive survey on recent metaheuristics for feature selection, Neurocomputing (2022)
- et al., Dual graph regularized compact feature representation for unsupervised feature selection, Neurocomputing (2019)
- et al., Penalized partial least square discriminant analysis with ℓ1-norm for multi-label data, Pattern Recognition (2015)
- et al., Unsupervised feature selection via diversity-induced self-representation, Neurocomputing (2017)
- et al., Structured optimal graph based sparse feature extraction for semi-supervised learning, Signal Processing (2020)
- et al., Discriminative sparse embedding based on adaptive graph for dimension reduction, Engineering Applications of Artificial Intelligence (2020)
- et al., Semi-supervised multi-label feature selection with adaptive structure learning and manifold learning, Knowledge-Based Systems (2021)
- et al., Sparse feature selection: relevance, redundancy and locality structure preserving guided by pairwise constraints, Applied Soft Computing (2020)
- A survey on semi-supervised feature selection methods, Pattern Recognition
- A robust graph-based semi-supervised sparse feature selection method, Information Sciences
- Semi-supervised sparse feature selection via graph Laplacian based scatter matrix for regression problems, Information Sciences
- Semi-supervised feature selection analysis with structured multi-view sparse regularization, Neurocomputing
- Sparse feature selection based on graph Laplacian for web image annotation, Image and Vision Computing
- Local preserving logistic I-Relief for semi-supervised feature selection, Neurocomputing
- Robust graph regularized unsupervised feature selection, Expert Systems with Applications
- Automatic image annotation via label transfer in the semantic space, Pattern Recognition
- Discriminative graph convolution networks for hyperspectral image classification, Displays
- Local adaptive learning for semi-supervised feature selection with group sparsity, Knowledge-Based Systems
- Deep mutual learning for visual object tracking, Pattern Recognition
- Nonnegative self-representation with a fixed rank constraint for subspace clustering, Information Sciences
- Efficient semi-supervised feature selection for VHR remote sensing images