
Knowledge-Based Systems

Volume 181, 1 October 2019, 104787

Local adaptive learning for semi-supervised feature selection with group sparsity

https://doi.org/10.1016/j.knosys.2019.05.030

Abstract

Feature selection is an important tool for many machine learning and data mining tasks. By removing irrelevant features and reducing the complexity of data processing, feature selection can significantly improve the performance of subsequent classification or clustering tasks. The rapid development of social networking has generated large amounts of high-dimensional data. Because collecting sufficient labels is costly, graph-based semi-supervised feature selection algorithms have attracted the most research interest; however, these approaches neglect the local sparsity of data. Accordingly, motivated by the merits of adaptive learning and sparse learning, in this paper we propose a novel feature selection method with a local adaptive loss function and a global sparsity constraint. Our method can more flexibly model data with different distributions. Moreover, because both the local and global sparsity of data are considered, our method is better able to select the most discriminating features. Experimental results on various real-world applications demonstrate the effectiveness of the proposed feature selection method compared to several state-of-the-art methods.

Introduction

In recent years, issues surrounding high-dimensional data have been frequently confronted in many practical applications due to the rapid development of computer science and network technology [1], [2]. It is well known that high-dimensional data often contain noise and redundant features, and thus pose a great challenge to existing machine learning methods [3]. Previous works have demonstrated that Feature Selection (FS) is a helpful tool for locating the most representative feature subset, which in turn boosts the performance of subsequent learning tasks [4], [5], [6], [7].

Existing FS approaches can be roughly categorized into three groups according to label availability: namely, supervised, unsupervised and semi-supervised approaches [8]. Supervised FS yields the most discriminative power provided that enough labels are available; however, in real-world applications, collecting sufficient label information requires substantial human labor, especially for large-scale datasets. Conversely, unsupervised FS can function independently of labels and is thus more widely applicable than supervised FS. Generally speaking, unsupervised FS assumes that data points can be represented by some high-level structure, such as a scatter or adjacency structure, and selects features accordingly. Nevertheless, despite its applicability, this approach lacks discriminative power because it cannot exploit label information. Semi-supervised FS can therefore be considered a compromise between supervised and unsupervised FS: even when given few labels, it can transfer the a priori knowledge of these labeled samples to the unlabeled ones by revealing the geometrical or discriminative structure within the data.

Among the semi-supervised FS approaches recently published, the Graph-based Learning (GL) approaches have attracted the most research interest [9], [10]. Assuming that a given data point will have close relations with the others nearby, GL approaches first build a similarity matrix to encode the adjacent structures among data, and then explore some high-level semantic patterns through graph learning analysis methods, such as ratio cut and normalized cut. For example, Nie et al. [11] proposed a graph-based manifold embedding algorithm, which is flexible and can determine the optimal subspace despite limited label availability. Ma et al. [12] integrated group sparsity learning and graph learning into one single framework to boost feature selection performance. Considering data locality, Yang et al. [13] encoded the manifold structure by means of a local and global regression model. Similarly, Yan et al. [14] imposed a group sparse model to select discriminative features by considering the global and local structures. However, one main drawback of traditional GL is that its fitness function is simply built on the l2-norm, making it sensitive to outliers and prone to generating a large number of redundant connections.
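The similarity matrix that GL approaches build is typically a k-nearest-neighbor graph weighted by a Gaussian (heat) kernel. As a minimal sketch of that standard construction (the `k` and `sigma` values are illustrative choices, not taken from the paper):

```python
import numpy as np

def knn_similarity_graph(X, k=5, sigma=1.0):
    """Build a k-nearest-neighbor similarity matrix with a Gaussian kernel.

    X : (n_samples, n_features) data matrix.
    Returns a symmetric (n, n) matrix S where S[i, j] encodes the
    affinity between points i and j (zero for non-neighbors).
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)          # guard against tiny negative values

    S = np.zeros((n, n))
    for i in range(n):
        # indices of the k nearest neighbors (position 0 is the point itself)
        nbrs = np.argsort(d2[i])[1:k + 1]
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma**2))
    return 0.5 * (S + S.T)               # symmetrize the directed k-NN graph
```

Graph-analysis methods such as ratio cut or normalized cut then operate on the Laplacian of `S`; the l2-norm sensitivity discussed above arises because every retained edge weight enters the fitness function quadratically.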

To solve this problem, some researchers have tried to enforce consistency between the similarity matrix and the predicted label matrix under sparse constraints, such as the l1-norm and low-rank constraints. For instance, to reduce the influence of outliers in graph learning, Nie et al. [15] iteratively updated the weight values in the similarity matrix, generating a clearer graph than that produced by traditional manifold learning methods. Wang et al. [16] generated an initial graph by traditional spectral learning in an unsupervised scenario, followed by iterative optimization steps; this method assigns higher weight values to high-density regions and lower weight values to low-density regions. These authors also applied sparse graph learning to multi-label classification with limited labeled data [17]. Although these sparse learning approaches are able to recover a clear manifold structure, they tend to over-penalize some high-density regions (i.e., assign far higher weight values to these regions). To better model data with different distributions, Ding et al. [18] designed a robust loss function that smoothly and adaptively interpolates between the l1-norm and the l2-norm; for brevity, we refer to this as Adaptive Learning (AL). Compared to a traditional l1-norm or l2-norm loss function, AL is better able to control the balance between model fitness and outlier effects. Based on AL, other researchers have encoded the geometric structure or cluster structure dynamically and have consequently achieved substantial success in various applications [19], [20].
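A widely used adaptive loss of this kind (a sketch of the general idea, not necessarily the exact function of [18]) is ℓ_σ(r) = (1+σ)r²/(r+σ) on the residual magnitude r: it behaves like the squared l2 loss for residuals well below σ and like the l1 loss for residuals well above it, so σ controls the trade-off between fitting and robustness.

```python
import numpy as np

def adaptive_loss(residual, sigma=1.0):
    """Adaptive loss  (1 + sigma) * r^2 / (r + sigma)  on r = |residual|.

    Smoothly interpolates between the l2- and l1-norm losses:
      - r << sigma  ->  approximately quadratic (l2-like, smooth fitting)
      - r >> sigma  ->  approximately linear    (l1-like, robust to outliers)
    sigma -> 0 recovers l1 behavior; sigma -> infinity recovers l2 behavior.
    """
    r = np.abs(residual)
    return (1.0 + sigma) * r**2 / (r + sigma)
```

Because the function stays quadratic near zero, small residuals are penalized smoothly, while large (outlier) residuals grow only linearly and cannot dominate the objective.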

Accordingly, to address the above challenges, and inspired by recent developments in adaptive learning and graph learning, in this paper we propose a local Adaptive Learning Feature Selection (ALFS) method. The proposed ALFS combines the strengths of adaptive graph learning and feature selection, utilizing the local manifold structure to map the labeled and unlabeled data while simultaneously considering the different distributions among them. Moreover, the proposed method imposes global sparse learning to select the most discriminative features and to explore the correlations among them. Fig. 1 provides an overview of the proposed feature selection method as applied to image classification. All training and testing data are first represented by low-level features, followed by local structure modeling (local graph construction). Subsequently, label determination and group sparse learning are performed simultaneously to generate an initial classification model. Finally, the obtained classification results are used to refine the local connections in the local structure, and vice versa.

The main contributions of this paper are as follows:

  • ALFS combines the recent advantages of adaptive learning and local regressive learning into a joint framework.

  • The advantage of graph learning is guaranteed by an iterative local manifold structure updating scheme, which dynamically updates the connections between each data point and its neighbors, resulting in robust classification performance.

  • We propose an efficient iterative algorithm to solve our objective function.
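The paper's full objective is not reproduced in this excerpt, but the group-sparsity component of such frameworks typically reduces to an l2,1-norm-regularized regression, solved by iteratively reweighted least squares (the standard device for the non-smooth l2,1 term). A generic sketch under that assumption (function and parameter names are illustrative, not the paper's):

```python
import numpy as np

def l21_regression(X, Y, gamma=1.0, n_iter=50, eps=1e-8):
    """Approximately solve  min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}
    by iteratively reweighted least squares.

    The l2,1 penalty drives whole rows of W toward zero, so the row
    norms of W rank the d features; the top-ranked rows correspond to
    the selected (most discriminative) features.
    """
    d = X.shape[1]
    D = np.eye(d)                        # reweighting matrix, starts as identity
    for _ in range(n_iter):
        # closed-form update: W = (X^T X + gamma * D)^{-1} X^T Y
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ Y)
        row_norms = np.sqrt(np.sum(W**2, axis=1)) + eps
        D = np.diag(0.5 / row_norms)     # re-weight each row by 1 / (2 ||w_i||)
    return W
```

In an alternating scheme of the kind the bullet points describe, an update like this for the projection matrix would be interleaved with the adaptive update of the local graph connections.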


Related work

Our proposed method has close relations with semi-supervised feature selection and local adaptive manifold learning. In this section, we briefly introduce some recently developed approaches.

The proposed framework

In this section, we propose a novel semi-supervised feature learning framework that can adaptively explore the local manifold structure through group sparsity and label information.

Datasets and feature extraction

In this section, to evaluate our proposed local adaptive learning feature selection method, we conduct a thorough analysis of its effectiveness on several publicly available datasets, namely MIML [34], Scene [35], HumanEva [36], UTD_MAD [37], Caltech [38], and HAR [39]. These datasets encompass various kinds of applications, such as natural scene annotation (MIML and SCENE), human action recognition (HumanEva, HAR, and UTD_MAD) and object image

Conclusion

In this paper, we extend local manifold feature selection with an adaptive loss function to propose a novel semi-supervised feature selection method with group sparsity. To accurately model data points that lie in different local regions, we impose an adaptive parameter to balance their local smoothness and sparsity. Hence, our proposed method is suitable for datasets with different distributions. In addition, to select the most representative features, the projection matrix is constrained by

Acknowledgments

This paper was supported in part by the National Natural Science Foundation of China (Grant No. 61871464), the Natural Science Foundation of Fujian Province, China (Grant Nos. 2016J01324, 2017J01511), the "Climbing" Program of Xiamen University of Technology, China (Grant No. XPDKQ18012), and the Scientific Research Fund of the Fujian Provincial Education Department (Grant No. JAT170417).

References

  • Zhang, Y., et al., A multitask multiview clustering algorithm in heterogeneous situations based on LLE and LE, Knowl.-Based Syst. (2019)

  • Zhang, Z., et al., Joint hypergraph learning and sparse regression for feature selection, Pattern Recognit. (2017)

  • Boutell, M.R., et al., Learning multi-label scene classification, Pattern Recognit. (2004)

  • Sheikhpour, R., et al., Semi-supervised sparse feature selection via graph Laplacian based scatter matrix for regression problems, Inform. Sci. (2018)

  • Shi, C., et al., Sparse feature selection based on graph Laplacian for web image annotation, Image Vis. Comput. (2014)

  • Zhao, M., et al., Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction, Inform. Sci. (2015)

  • Bai, X., et al., Object classification via feature fusion based marginalized kernels, IEEE Geosci. Remote Sens. Lett. (2015)

  • Yan, C., et al., Band weighting via maximizing interclass distance for hyperspectral image classification, IEEE Geosci. Remote Sens. Lett. (2016)

  • Zhang, H., et al., Object detection via structural feature selection and shape model, IEEE Trans. Image Process. (2013)

  • Nie, F., et al., Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction, IEEE Trans. Image Process. (2010)

  • Ma, Z., et al., Discriminating joint feature analysis for multimedia data understanding, IEEE Trans. Multimed. (2012)

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.05.030.
