Information Sciences

Volume 609, September 2022, Pages 465-488

Adaptive graph learning for semi-supervised feature selection with redundancy minimization

https://doi.org/10.1016/j.ins.2022.07.102

Abstract

Graph-based sparse feature selection plays an important role in semi-supervised feature selection. However, traditional graph-based semi-supervised sparse feature selection separates graph construction from feature selection, which may reduce the performance of the model because of noise and outliers. Moreover, sparse feature selection chooses features according to the learned projection matrix; since similar features tend to receive similar weights, the sparse model often selects redundant features, which weakens the performance of the algorithm. To alleviate these problems, a novel semi-supervised sparse feature selection framework is proposed in this study, in which the quality of the similarity matrix is improved by adaptive graph learning and the negative influence of redundant features is relieved via a redundancy minimization regularization. Based on this framework, two specific methods are given and a unified iterative algorithm is proposed to optimize the objective function. The performance of the proposed methods is evaluated by comparing them with seven advanced semi-supervised methods in terms of classification accuracy and F1 score. Extensive experiments on public datasets demonstrate that the proposed methods are superior to several advanced methods.

Introduction

In recent years, owing to the rapid development of computers, high-dimensional data has gradually become pervasive in real-world applications such as image annotation [34], target tracking [44] and network image classification [36]. High-dimensional data provides a more detailed description of the objective world, but it also inevitably brings about the problem known as the "curse of dimensionality". To weaken this effect, feature selection (FS) plays an important role in dimensionality reduction: it improves the performance of learning tasks by selecting relevant features and removing redundant ones [12]. FS methods can be divided into three categories: supervised, semi-supervised and unsupervised.

As obtaining data labels is expensive, semi-supervised FS, which exploits limited labeled data together with a great deal of unlabeled data, has attracted the attention of researchers. In the past decades, a vast number of semi-supervised FS algorithms have been proposed. These algorithms can be roughly classified as filter, wrapper and embedded methods [27]. Filter methods assess features through evaluation strategies that are independent of any learning method [38]. Conversely, wrapper methods select the feature subset containing the most discriminative information by maximizing the predictive performance of a classifier [2]. In most cases, wrapper methods employ metaheuristic algorithms to obtain the optimal feature subset, which usually lets them outperform filter methods [10]. However, wrapper methods require extensive computational resources and are limited by the learning method used in training [8]. Compared with filter and wrapper methods, embedded methods embed FS into a learning framework and select features by considering comprehensive properties of the data [42]. Embedded methods outperform the other two categories in efficiency and performance, which makes them more popular [22].

Most existing embedded methods are based on the graph Laplacian and sparse models: they select features with a sparse model while preserving the geometric structure of the data via a Laplacian graph [11]. The sparse model identifies and retains important features by penalizing the projection matrix with a sparse regularization term [41]. Because of its efficiency, the sparse model is widely used in feature selection. For example, Chen et al. proposed an efficient semi-supervised feature selection method based on the sparse model (ESFS), in which the latent information in unlabeled data is exploited by a least-squares loss function rather than a Laplacian graph, making the method suitable for large-scale data [3]. Methods based on the sparse model can achieve good performance, but Chen et al. argued that sparse-model-based FS not only lacks a theoretical explanation but also cannot find a global and sparse solution [5]. They therefore extended the sparse model with a series of scale factors (RLSR) that rescale the regression coefficients. Moreover, Chen et al. proposed another semi-supervised FS model (SRLSR) based on RLSR, which obtains sparser solutions via an implicit l2,p-norm [6]. These sparse-model-based methods are efficient, but they have flaws that may impair the performance of FS. First, the sparse model only considers label information, while important latent information such as the manifold structure is ignored, which may weaken the model [32]. Second, since similar features receive similar weights, the feature subset selected by the sparse model may contain redundant features, which may degrade performance [23].
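As a point of reference (a generic form assembled from the cited literature, not the exact objective of any single method above), graph-based semi-supervised sparse FS typically combines a label-fitting loss on the labeled samples, a Laplacian term that preserves locality, and a row-sparse penalty on the projection matrix W:

\[
\min_{W}\ \|X_l^{\top}W - Y_l\|_F^2 \;+\; \alpha\,\mathrm{tr}\!\left(W^{\top} X L X^{\top} W\right) \;+\; \beta\,\|W\|_{2,1}
\]

Here X stacks all samples (labeled and unlabeled), X_l and Y_l denote the labeled samples and their labels, L is the graph Laplacian of the similarity matrix, and the l2,1-norm \(\|W\|_{2,1} = \sum_i \|w^i\|_2\) drives whole rows of W toward zero so that features with small row norms can be discarded.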

To improve model performance, many works have taken the local geometric structure into account and introduced manifold learning into the sparse learning framework. For example, Tang et al. proposed a robust feature selection method that reduces sensitivity to outliers by adopting an l1-norm-based graph Laplacian regularization term to preserve the local geometric structure of the data [33]. Generally, models combining manifold learning and the sparse model contain two separate steps, namely constructing the similarity matrix and selecting features [46]. However, Yang et al. indicated that a similarity matrix constructed from k-nearest neighbors may fall into local optimality because of the influence of noise and outliers [40]. To alleviate this negative impact, learning a graph with adaptive neighbors is more desirable. Chen et al. improved the quality of the similarity matrix by using projection distances [7]. Similarly, Zeng et al. constructed the similarity matrix dynamically by introducing an adaptive loss term [43]. Dynamic construction improves the quality of the similarity matrix, but its efficiency is low. To improve efficiency, Nie et al. proposed a more compact framework that integrates graph construction and the learning task into a unified framework [25]. Adaptive graph learning greatly improves the performance and efficiency of FS models. However, the above methods based on adaptive graph learning and the sparse model ignore the effect of redundant features, which may lower the performance of feature selection [26]. Moreover, Liu et al. also indicated that most methods ignore highly redundant features, so the discriminative power of the selected features might be decreased [17].
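For concreteness, an adaptive-neighbors formulation in the spirit of [25] (written here in one common form, which may differ in detail from the cited paper) learns the similarities s_ij jointly instead of fixing them by k-nearest neighbors:

\[
\min_{S}\ \sum_{i,j=1}^{n}\left(\|x_i - x_j\|_2^2\, s_{ij} + \gamma\, s_{ij}^2\right)
\quad \text{s.t.}\ \ s_i^{\top}\mathbf{1} = 1,\ \ s_{ij} \ge 0,
\]

where the quadratic penalty \(\gamma s_{ij}^2\) prevents the trivial solution that assigns all similarity to the single nearest neighbor. When the distances are measured in the projected space, i.e., between \(W^{\top}x_i\) and \(W^{\top}x_j\), graph construction and feature selection can inform each other within a single optimization.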

Motivated by these two shortcomings of conventional graph-based sparse FS, a semi-supervised FS framework called AGLRM is proposed, which addresses the drawbacks of both graph learning and sparse learning. To alleviate the impact of the similarity matrix on model performance, adaptive graph learning is adopted to obtain an adaptive similarity matrix instead of a pre-defined one. Moreover, a redundancy minimization regularization is introduced into the sparse model to relieve the effect of redundant features. Hence, AGLRM integrates redundancy minimization regularization and adaptive graph learning into a unified sparse learning framework that jointly pursues minimum redundancy and local structure preservation. In other words, the proposed framework aims to select a feature subset that is the most discriminative and the least redundant. The framework of AGLRM is shown in Fig. 1.

As Fig. 1 shows, the graph in the proposed framework is constructed in the low-dimensional space instead of the original space. Secondly, a penalty mechanism is introduced into the framework to reduce the redundancy between features. Finally, the framework feeds the learned information back to the projection matrix, which facilitates learning the optimal projection matrix.
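To make the redundancy penalty concrete, the following sketch (an illustrative Python reading of a Pearson-correlation-based term, not the paper's exact implementation; the function name and the trace form tr(W^T C W) are our assumptions) shows how correlated features that both receive large projection weights inflate the penalty:

```python
import numpy as np

def redundancy_penalty(X, W):
    """Illustrative redundancy-minimization term (assumed form).

    X : (n_samples, n_features) data matrix
    W : (n_features, n_components) projection matrix
    Returns tr(W^T C W), where C contains absolute Pearson
    correlations between features; the diagonal is zeroed so a
    feature is not penalized for correlating with itself.
    """
    C = np.abs(np.corrcoef(X, rowvar=False))  # (d, d) feature-feature correlations
    np.fill_diagonal(C, 0.0)                  # drop self-correlation
    return np.trace(W.T @ C @ W)

# Toy check: a nearly duplicated feature raises the penalty
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(100)  # redundant copy of feature 0
W = np.ones((5, 3))                                  # both copies weighted equally
print(redundancy_penalty(X, W))
```

Minimizing such a term alongside the sparse loss pushes the optimizer to keep only one representative from a group of mutually correlated features, which is the behavior the penalty mechanism above is meant to induce.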

The main contributions of our paper are highlighted as follows.

  • i. This paper proposes a novel semi-supervised FS framework that comprehensively considers the impact of redundant features and noise. Compared with conventional graph-based sparse FS, the similarity matrix constructed by this method is of higher quality because the learned information is used. Besides, the feature subset selected by the proposed method has lower redundancy.

  • ii. The redundancy minimization regularization based on the Pearson correlation coefficient is improved, and two specific methods are proposed based on the improved redundancy minimization regularization and cosine similarity.

  • iii. The objective functions of the proposed model are optimized by a unified iterative algorithm, whose convergence is verified both theoretically and empirically.

  • iv. Comprehensive experiments on public datasets verify the effectiveness of the proposed method and demonstrate its advantages over several state-of-the-art methods.

The rest of this paper is organized as follows. A brief review of the related work is given in Section 2. The proposed method and its optimization method are introduced in Section 3. Then, Section 4 describes the experimental settings and analyzes the experimental results. Finally, a summary of this work is given in Section 5.

Section snippets

Related work

Firstly, the notations and definitions used in this paper are summarized in this section. Then, the related work on FS based on sparse learning, adaptive graph learning and redundancy minimization regularization will be reviewed briefly.

The proposed method

In this section, a novel semi-supervised FS framework that takes feature redundancy and noise into account is proposed. Through adaptive graph learning and redundancy minimization, the negative effects of redundant features and noise are eliminated as much as possible.
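Given the ingredients sketched in the introduction, one plausible shape for such a unified objective (a hedged sketch assembled from the pieces above, not the paper's verbatim formulation; the trade-off parameters α, β, γ, λ are placeholders) is

\[
\min_{W,\,S}\ \|X_l^{\top}W - Y_l\|_F^2
+ \alpha \sum_{i,j}\left(\|W^{\top}x_i - W^{\top}x_j\|_2^2\, s_{ij} + \gamma\, s_{ij}^2\right)
+ \beta\,\mathrm{tr}\!\left(W^{\top} C W\right)
+ \lambda\,\|W\|_{2,1},
\]

with the simplex constraints on each row of S as before and C a feature-redundancy matrix. Alternating between updating S with W fixed and updating W with S fixed yields the kind of unified iterative algorithm described in contribution iii.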

Experiments and evaluation

In this section, extensive experiments are first designed and conducted on twelve commonly used datasets. Then, classification accuracy and F1 score on the KNN, rbf-SVM, CART decision tree and one-hidden-layer BP neural network classifiers are used to evaluate the algorithms. Finally, in comparison with currently popular algorithms, the high efficiency of the AGLRM algorithm is verified. Note that the experiments are conducted in MATLAB R2021b and the codes are run on a computer

Conclusion

Inspired by the effectiveness of adaptive graph learning, a novel semi-supervised FS algorithm is proposed that takes both the optimal local structure and feature redundancy into account. In this study, adaptive graph learning and redundancy minimization regularization are integrated into a sparse framework. Compared with related approaches, the proposed AGLRM updates the similarity matrix adaptively via adaptive graph learning and finally outputs the optimal local structure graph.

CRediT authorship contribution statement

Jingliu Lai: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing - original draft, Writing - review & editing. Hongmei Chen: Writing - review & editing, Supervision, Visualization, Investigation, Formal analysis, Validation. Tianrui Li: Writing - review & editing, Supervision, Resources. Xiaoling Yang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Nos. 61976182, 62076171, 61876157, 61976245), Sichuan Key R&D project (2020YFG0035), and Key program for International S&T Cooperation of Sichuan Province (2019YFH0097).

References (47)

  • Razieh Sheikhpour et al., A survey on semi-supervised feature selection methods, Pattern Recognition (2017).
  • Razieh Sheikhpour et al., A robust graph-based semi-supervised sparse feature selection method, Information Sciences (2020).
  • Razieh Sheikhpour et al., Semi-supervised sparse feature selection via graph Laplacian based scatter matrix for regression problems, Information Sciences (2018).
  • Caijuan Shi et al., Semi-supervised feature selection analysis with structured multi-view sparse regularization, Neurocomputing (2019).
  • Caijuan Shi et al., Sparse feature selection based on graph Laplacian for web image annotation, Image and Vision Computing (2014).
  • Baige Tang et al., Local preserving logistic I-Relief for semi-supervised feature selection, Neurocomputing (2020).
  • Chang Tang et al., Robust graph regularized unsupervised feature selection, Expert Systems with Applications (2018).
  • Tiberio Uricchio et al., Automatic image annotation via label transfer in the semantic space, Pattern Recognition (2017).
  • Zhe Wang et al., Discriminative graph convolution networks for hyperspectral image classification, Displays (2021).
  • Zhiqiang Zeng et al., Local adaptive learning for semi-supervised feature selection with group sparsity, Knowledge-Based Systems (2019).
  • Haojie Zhao et al., Deep mutual learning for visual object tracking, Pattern Recognition (2021).
  • Guo Zhong et al., Nonnegative self-representation with a fixed rank constraint for subspace clustering, Information Sciences (2020).
  • Xi Chen et al., Efficient semi-supervised feature selection for VHR remote sensing images.
