
Knowledge-Based Systems

Volume 181, 1 October 2019, 104787

Local adaptive learning for semi-supervised feature selection with group sparsity

https://doi.org/10.1016/j.knosys.2019.05.030

Abstract

Feature selection is an important tool for many machine learning and data mining tasks. By removing irrelevant features and reducing the complexity of data processing, feature selection can significantly improve the performance of subsequent classification or clustering tasks. The rapid development of social networking has generated large amounts of high-dimensional data. Because collecting sufficient labels is costly, graph-based semi-supervised feature selection algorithms have attracted the most research interest; however, these approaches neglect the local sparsity of data. Accordingly, motivated by the merits of adaptive learning and sparse learning, in this paper we propose a novel feature selection method with a local adaptive loss function and a global sparsity constraint. Our method can more flexibly model data with different distributions. Moreover, because both the local and global sparsity of data are considered, our method is better able to select the most discriminating features. Experimental results on various real-world applications demonstrate the effectiveness of the proposed feature selection method compared to several state-of-the-art methods.

Introduction

In recent years, issues surrounding high-dimensional data have been frequently confronted in many practical applications due to the rapid development of computer science and network technology [1], [2]. It is well known that high-dimensional data often contain noise and redundant features, and thus pose a great challenge to existing machine learning methods [3]. Previous works have demonstrated that Feature Selection (FS) is a helpful tool for locating the most representative feature subset, which in turn boosts the performance of subsequent learning tasks [4], [5], [6], [7].

Existing FS approaches can be roughly categorized into three groups according to label availability: namely, supervised, unsupervised and semi-supervised approaches [8]. Supervised FS yields the most discriminative power provided that enough labels are available; however, in real-world applications, collecting sufficient label information requires substantial human labor, especially for large-scale datasets. Conversely, unsupervised FS can function independently of labels and is thus more widely applicable than supervised FS. Generally speaking, unsupervised FS assumes that data points can be represented by some high-level structure, such as a scatter or adjacency structure, and selects features accordingly. Nevertheless, despite its applicability, this approach lacks discriminative power because it cannot exploit label information. Semi-supervised FS can therefore be considered a compromise between supervised and unsupervised FS: even when given few labels, it can transfer the a priori knowledge of these labeled samples to the unlabeled ones by revealing the geometrical or discriminative structure within the data.

Among the semi-supervised FS approaches recently published, the Graph-based Learning (GL) approaches have attracted the most research interest [9], [10]. Assuming that a given data point will have close relations with the others nearby, GL approaches first build a similarity matrix to encode the adjacent structures among data, and then explore some high-level semantic patterns through graph learning analysis methods, such as ratio cut and normalized cut. For example, Nie et al. [11] proposed a graph-based manifold embedding algorithm, which is flexible and can determine the optimal subspace despite limited label availability. Ma et al. [12] integrated group sparsity learning and graph learning into one single framework to boost feature selection performance. Considering data locality, Yang et al. [13] encoded the manifold structure by means of a local and global regression model. Similarly, Yan et al. [14] imposed a group sparse model to select discriminative features by considering the global and local structures. However, one main drawback of traditional GL is that its fitness function is simply built on the l2-norm, making it sensitive to outliers and prone to generating a large number of redundant connections.
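The similarity matrix that GL approaches build is typically a k-nearest-neighbor graph weighted by a Gaussian (heat) kernel. As a minimal sketch of that standard construction (the `k` and `sigma` values are illustrative choices, not taken from the paper):

```python
import numpy as np

def knn_similarity_graph(X, k=5, sigma=1.0):
    """Build a k-nearest-neighbor similarity matrix with a Gaussian kernel.

    X : (n_samples, n_features) data matrix.
    Returns a symmetric (n, n) matrix S where S[i, j] encodes the
    affinity between points i and j (zero for non-neighbors).
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)          # guard against tiny negative values

    S = np.zeros((n, n))
    for i in range(n):
        # indices of the k nearest neighbors (position 0 is the point itself)
        nbrs = np.argsort(d2[i])[1:k + 1]
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma**2))
    return 0.5 * (S + S.T)               # symmetrize the directed k-NN graph
```

Graph-analysis methods such as ratio cut or normalized cut then operate on the Laplacian of `S`; the l2-norm sensitivity discussed above arises because every retained edge weight enters the fitness function quadratically.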

To solve this problem, some researchers have tried to enforce consistency between the similarity matrix and the predicted label matrix under sparse constraints, such as the l1-norm and low-rank constraints. For instance, to reduce the influence of outliers in graph learning, Nie et al. [15] iteratively updated the weight values in the similarity matrix, generating a clearer graph than that produced by traditional manifold learning methods. Wang et al. [16] generated an initial graph by traditional spectral learning in an unsupervised scenario, followed by iterative optimization steps; this method assigns higher weight values to high-density regions and lower weight values to low-density regions. These authors also applied sparse graph learning to multi-label classification with limited labeled data [17]. Although these sparse learning approaches are able to recover a clear manifold structure, they tend to over-penalize some high-density regions (i.e., assign far higher weight values to these regions). To better model data with different distributions, Ding et al. [18] designed a robust loss function that smoothly and adaptively interpolates between the l1-norm and the l2-norm; for brevity, we refer to this as Adaptive Learning (AL). Compared to a traditional l1-norm or l2-norm loss function, AL is better able to control the balance between model fitness and outlier effects. Based on AL, other researchers have encoded the geometric structure or cluster structure dynamically and have consequently achieved substantial success in various applications [19], [20].
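A widely used adaptive loss of this kind (a sketch of the general idea, not necessarily the exact function of [18]) is ℓ_σ(r) = (1+σ)r²/(r+σ) on the residual magnitude r: it behaves like the squared l2 loss for residuals well below σ and like the l1 loss for residuals well above it, so σ controls the trade-off between fitting and robustness.

```python
import numpy as np

def adaptive_loss(residual, sigma=1.0):
    """Adaptive loss  (1 + sigma) * r^2 / (r + sigma)  on r = |residual|.

    Smoothly interpolates between the l2- and l1-norm losses:
      - r << sigma  ->  approximately quadratic (l2-like, smooth fitting)
      - r >> sigma  ->  approximately linear    (l1-like, robust to outliers)
    sigma -> 0 recovers l1 behavior; sigma -> infinity recovers l2 behavior.
    """
    r = np.abs(residual)
    return (1.0 + sigma) * r**2 / (r + sigma)
```

Because the function stays quadratic near zero, small residuals are penalized smoothly, while large (outlier) residuals grow only linearly and cannot dominate the objective.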

Accordingly, to address the above challenges, and inspired by recent developments in adaptive learning and graph learning, in this paper we propose a local Adaptive Learning Feature Selection (ALFS) method. The proposed ALFS combines the strengths of adaptive graph learning and feature selection, utilizing the local manifold structure to map the labeled and unlabeled data while simultaneously considering the different distributions among them. Moreover, the proposed method imposes global sparse learning to select the most discriminative features and to explore the correlations among them. Fig. 1 provides an overview of the proposed feature selection method as applied to image classification. All training and testing data are first represented by low-level features, followed by local structure modeling (local graph construction). Subsequently, label determination and group sparse learning are performed simultaneously to generate an initial classification model. Finally, the obtained classification results are used to refine the local connections in the local structure, and vice versa.

The main contributions of this paper are as follows:

  • ALFS combines the recent advantages of adaptive learning and local regressive learning into a joint framework.

  • The advantage of graph learning is guaranteed by an iterative local manifold structure updating scheme, which dynamically updates the connections between each data point and its neighbors, resulting in robust classification performance.

  • We propose an efficient iterative algorithm to solve our objective function.
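The paper's full objective is not reproduced in this excerpt, but the group-sparsity component of such frameworks typically reduces to an l2,1-norm-regularized regression, solved by iteratively reweighted least squares (the standard device for the non-smooth l2,1 term). A generic sketch under that assumption (function and parameter names are illustrative, not the paper's):

```python
import numpy as np

def l21_regression(X, Y, gamma=1.0, n_iter=50, eps=1e-8):
    """Approximately solve  min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}
    by iteratively reweighted least squares.

    The l2,1 penalty drives whole rows of W toward zero, so the row
    norms of W rank the d features; the top-ranked rows correspond to
    the selected (most discriminative) features.
    """
    d = X.shape[1]
    D = np.eye(d)                        # reweighting matrix, starts as identity
    for _ in range(n_iter):
        # closed-form update: W = (X^T X + gamma * D)^{-1} X^T Y
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ Y)
        row_norms = np.sqrt(np.sum(W**2, axis=1)) + eps
        D = np.diag(0.5 / row_norms)     # re-weight each row by 1 / (2 ||w_i||)
    return W
```

In an alternating scheme of the kind the bullet points describe, an update like this for the projection matrix would be interleaved with the adaptive update of the local graph connections.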


Related work

Our proposed method has close relations with semi-supervised feature selection and local adaptive manifold learning. In this section, we briefly introduce some recently developed approaches.

The proposed framework

In this section, we propose a novel semi-supervised feature learning framework that can adaptively explore the local manifold structure through group sparsity and label information.

Datasets and feature extraction

In this section, to evaluate our proposed local adaptive learning feature selection method, we conduct a thorough analysis of its effectiveness on several publicly available datasets, namely MIML [34], Scene [35], HumanEva [36], UTD_MAD [37], Caltech [38], and HAR [39]. These datasets encompass various kinds of applications, such as natural scene annotation (MIML and SCENE), human action recognition (HumanEva, HAR, and UTD_MAD) and object image

Conclusion

In this paper, we extend local manifold feature selection with an adaptive loss function to propose a novel semi-supervised feature selection method with group sparsity. To accurately model data points that lie in different local regions, we impose an adaptive parameter to balance their local smoothness and sparsity. Hence, our proposed method is suitable for datasets with different distributions. In addition, to select the most representative features, the projection matrix is constrained by

Acknowledgments

This paper was supported in part by the National Natural Science Foundation of China (Grant No. 61871464), the Natural Science Foundation of Fujian Province, China (Grant Nos. 2016J01324, 2017J01511), the "Climbing" Program of Xiamen University of Technology, China (Grant No. XPDKQ18012), and the Scientific Research Fund of the Fujian Provincial Education Department (Grant No. JAT170417).

References

  • Zhang, Y., et al., A multitask multiview clustering algorithm in heterogeneous situations based on LLE and LE, Knowl.-Based Syst. (2019)

  • Zhang, Z., et al., Joint hypergraph learning and sparse regression for feature selection, Pattern Recognit. (2017)

  • Boutell, M.R., et al., Learning multi-label scene classification, Pattern Recognit. (2004)

  • Sheikhpour, R., et al., Semi-supervised sparse feature selection via graph Laplacian based scatter matrix for regression problems, Inform. Sci. (2018)

  • Shi, C., et al., Sparse feature selection based on graph Laplacian for web image annotation, Image Vis. Comput. (2014)

  • Zhao, M., et al., Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction, Inform. Sci. (2015)

  • Bai, X., et al., Object classification via feature fusion based marginalized kernels, IEEE Geosci. Remote Sens. Lett. (2015)

  • Yan, C., et al., Band weighting via maximizing interclass distance for hyperspectral image classification, IEEE Geosci. Remote Sens. Lett. (2016)

  • Zhang, H., et al., Object detection via structural feature selection and shape model, IEEE Trans. Image Process. (2013)

  • Nie, F., et al., Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction, IEEE Trans. Image Process. (2010)

  • Ma, Z., et al., Discriminating joint feature analysis for multimedia data understanding, IEEE Trans. Multimed. (2012)

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.05.030.
