Neurocomputing

Volume 219, 5 January 2017, Pages 350-363

Unsupervised feature selection via Diversity-induced Self-representation

https://doi.org/10.1016/j.neucom.2016.09.043

Abstract

Feature selection aims to select a subset of relevant features from the original feature set. In practical applications, the unavailability of label information remains a challenging problem. To overcome this problem, unsupervised feature selection algorithms have been developed and have achieved promising performance. However, most existing approaches consider only the representativeness of features and ignore their diversity, which may lead to high redundancy and the loss of valuable features. In this paper, we propose a Diversity-induced Self-representation (DISR) based unsupervised feature selection method to effectively select features that are both representative and diverse. Specifically, based on the inherent self-representation property of features, the most representative features can be selected. Meanwhile, to preserve the diversity of the selected features and reduce the redundancy of the original features as much as possible, we introduce a novel diversity term, which adjusts the weights of selected features by incorporating the similarities between features. We then present an efficient algorithm to solve the optimization problem using the inexact Augmented Lagrange Method (ALM). Finally, both clustering and classification tasks are used to evaluate the proposed method. Empirical results on a synthetic dataset and nine real-world datasets demonstrate the superiority of our method over state-of-the-art algorithms.

Introduction

High-dimensional data are ubiquitous in many areas, such as computer vision, pattern recognition and data mining. Such data not only significantly increase computation and storage costs, but also induce overfitting and incomprehensible models. To overcome these issues, feature selection has been widely adopted as an effective technique for reducing dimensionality by removing irrelevant and redundant features. The aim of feature selection is to obtain a subset of features by removing the noise and redundancy in the original features, so that a more intrinsic representation of the data and better performance are achieved [1].

According to the availability of label information, feature selection approaches can be categorized into supervised methods [2], [3], [4], [5] and unsupervised methods [6], [7], [8], [9], [10], [11]. In contrast to supervised methods, unsupervised feature selection methods aim to select relevant features without label information. Since labeling samples is usually expensive, labels cannot always be obtained beforehand. Thus, unsupervised feature selection holds great potential in real-world applications.

Early unsupervised feature selection algorithms use feature ranking techniques as the principal criterion for feature selection [8], [12], [13], [14], [15]. One main limitation of these methods is that they treat features independently, without considering possible correlations among features. To address this problem, a series of algorithms [10], [16], [17] have been developed. A typical class is spectral clustering based algorithms, which select a feature subset while preserving the underlying cluster structure. Spectral clustering based methods explore the cluster structure of data using matrix factorization for spectral analysis, and then select features via sparsity regularization models. Nevertheless, they rely heavily on the learned graph Laplacian, and noise in the features may make it unreliable. Recently, the self-representation technique has shown significant potential in many tasks, such as subspace clustering [18], [19] and active learning [20], [21]. Motivated by this, some researchers approach feature selection from the perspective of the self-representation property of features [22], i.e., each feature can be well represented by a linear combination of its relevant features. However, they mainly consider selecting representative features while ignoring the diversity among them. Both representativeness and diversity are very important for selecting effective features: (1) the ideal selected features should represent the whole original feature set; that is, highly irrelevant features are discarded while the most relevant features are preserved. (2) The ideal selected features should be diverse enough to capture not only important (representative) but also comprehensive information. By considering the diversity property, we can capture more information about the data, because features usually describe different aspects of it. (3) The diversity property implies that very similar features should not be selected simultaneously, so that redundancy can be greatly reduced. Therefore, there is a great need to integrate both the representativeness and the diversity of features for feature selection.
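To make the self-representation property concrete, the following minimal Python sketch scores each feature by how strongly it participates in reconstructing all features. It uses a simple ridge-regularized least-squares model as a stand-in for the sparse self-representation models in the cited works; the function and variable names are our own illustrations.

    import numpy as np

    def self_representation_scores(X, reg=1e-2):
        """Score feature j by the l2-norm of row j of W, where X ~= X @ W.

        X: (n_samples, n_features) data matrix. A ridge penalty keeps the
        problem well-posed; the sparse penalties used in the cited models
        would additionally zero out entire rows of W.
        """
        d = X.shape[1]
        G = X.T @ X + reg * np.eye(d)      # regularized Gram matrix
        W = np.linalg.solve(G, X.T @ X)    # closed-form ridge solution
        return np.linalg.norm(W, axis=1)   # large row norm = representative

    # Usage: keep the k highest-scoring features.
    X = np.random.randn(100, 20)
    top_k = np.argsort(self_representation_scores(X))[::-1][:5]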

In this paper, considering both the representativeness and the diversity of features, we propose a novel method, called Diversity-induced Self-representation (DISR), for unsupervised feature selection. Specifically, based on the self-representation property, i.e., that each feature can be well approximated by a linear combination of its relevant features, the most representative features can be selected. Meanwhile, by incorporating the similarities between features to adjust their selection weights, we introduce a diversity term to reduce redundancy. An efficient optimization algorithm is then derived using the inexact Augmented Lagrange Method (ALM).
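To preview the shape of such a model, a plausible objective combining the two properties can be sketched as follows; the exact weighting scheme and the definition of the feature-similarity matrix S are illustrative assumptions here, not the formulation derived later in the paper:

    \min_{W} \; \|X - XW\|_{2,1} \;+\; \lambda \|W\|_{2,1} \;+\; \beta \sum_{i \neq j} S_{ij} \, \|w^{i}\|_{2} \, \|w^{j}\|_{2}

where X \in \mathbb{R}^{n \times d} is the data matrix, w^{i} is the i-th row of the representation matrix W (its norm reflects how strongly feature i is selected), and S_{ij} measures the similarity between features i and j. The first two terms encode regularized self-representation; the third discourages assigning large weights to highly similar features simultaneously.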

Finally, we evaluate our method on both clustering and classification tasks. Experimental results on a synthetic dataset and nine real-world datasets show that the proposed DISR outperforms the compared methods.

To summarize, the main contributions of this paper are as follows:

  • A novel Diversity-induced Self-representation (DISR) algorithm for unsupervised feature selection is proposed. The algorithm considers both the representativeness and the diversity of features, and hence can select more valuable features.

  • A diversity term is introduced into the method. This term measures the diversity of all the features, and is thus used to guide feature selection.

  • An iterative algorithm based on the inexact ALM is proposed to efficiently solve the optimization model. Experimental results demonstrate the superiority of our algorithm over state-of-the-art algorithms.

The rest of the paper is organized as follows. A brief review of related work on unsupervised feature selection is given in Section 2. Section 3 introduces the proposed DISR algorithm, and Section 4 describes the optimization of our algorithm in detail. Extensive experiments on synthetic and real-world datasets are presented in Section 5. Finally, Section 6 concludes the paper.


Related work

Recently, many unsupervised feature selection methods have been proposed. These methods can be roughly divided into three categories: filter, wrapper, and embedded methods. Because our work belongs to the embedded category, we first briefly review the filter and wrapper methods, and then review related work on embedded methods in more detail. Filter methods use feature ranking techniques as the principal criterion for feature selection, owing to their simplicity and practical success.
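As a concrete instance of a filter method, the Laplacian score [13] ranks each feature by how well it preserves the local geometry of a neighborhood graph (lower is better). The sketch below follows the standard definition; the kNN size and heat-kernel bandwidth are arbitrary choices for illustration.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    def laplacian_score(X, n_neighbors=5, t=1.0):
        """Return the Laplacian score of each feature (column) of X."""
        # Heat-kernel-weighted kNN affinity graph (construction details assumed)
        A = kneighbors_graph(X, n_neighbors, mode='distance').toarray()
        S = np.where(A > 0, np.exp(-A ** 2 / t), 0.0)
        S = np.maximum(S, S.T)              # symmetrize
        deg = S.sum(axis=1)                 # degree vector
        L = np.diag(deg) - S                # graph Laplacian
        scores = np.empty(X.shape[1])
        for r in range(X.shape[1]):
            f = X[:, r]
            f = f - (f @ deg) / deg.sum()   # degree-weighted centering
            scores[r] = (f @ (L @ f)) / max(f @ (deg * f), 1e-12)
        return scores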

Diversity-induced Self-representation for unsupervised feature selection

In this section, we first introduce unsupervised feature selection based on regularized self-representation, and then present our novel unsupervised feature selection method via Diversity-induced Self-representation.
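As background, the regularized self-representation (RSR) model referred to here is commonly written as

    \min_{W} \; \|X - XW\|_{2,1} \;+\; \lambda \|W\|_{2,1}

where each feature (column of X) is reconstructed as a linear combination of all features, and feature i is then ranked by the row norm \|w^{i}\|_{2}. This notation sketches the standard formulation rather than quoting the paper's own equations.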

Optimization

We have now presented the method based on Diversity-induced Self-representation. Finding the global optimum directly is hard, since two non-smooth terms are involved. Thus, we employ the inexact Augmented Lagrange Method (ALM) [40] to optimize each variable iteratively.
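For intuition about how an inexact ALM handles two non-smooth terms, the sketch below solves a simplified member of this model family, min_W lam*||W||_{2,1} + ||E||_{2,1} subject to X - XW = E, omitting the diversity term. It is a generic linearized-ALM template written under our own assumptions, not the authors' exact solver.

    import numpy as np

    def row_shrink(A, tau):
        """Proximal operator of tau * ||A||_{2,1}: shrink each row's l2-norm."""
        norms = np.linalg.norm(A, axis=1, keepdims=True)
        return np.maximum(1 - tau / np.maximum(norms, 1e-12), 0) * A

    def inexact_alm_sketch(X, lam=1.0, mu=1.0, rho=1.1, iters=200):
        n, d = X.shape
        W, E, Y = np.zeros((d, d)), np.zeros((n, d)), np.zeros((n, d))
        step = np.linalg.norm(X, 2) ** 2    # Lipschitz bound for the W-step
        for _ in range(iters):
            # E-step: closed-form l2,1 proximal update
            E = row_shrink(X - X @ W + Y / mu, 1.0 / mu)
            # W-step: one linearized proximal-gradient step (the inexact part)
            B = X - E + Y / mu
            W = row_shrink(W - X.T @ (X @ W - B) / step, lam / (mu * step))
            # Dual ascent and penalty growth
            Y = Y + mu * (X - X @ W - E)
            mu = rho * mu
        return W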

Experiments

In this section, we first demonstrate the effectiveness of the proposed DISR on a synthetic dataset, and then evaluate the performance of DISR in both clustering and classification tasks on nine real-world datasets.
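A common protocol for such an evaluation, sketched below under the assumption that a per-feature score vector (e.g., the row norms of the learned W) is already available, is to cluster the data restricted to the top-ranked features and measure agreement with ground-truth labels by NMI:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import normalized_mutual_info_score

    def evaluate_selection(X, y, scores, k=50):
        """Run k-means on the top-k scored features; report NMI against y."""
        idx = np.argsort(scores)[::-1][:k]          # indices of top-k features
        km = KMeans(n_clusters=len(np.unique(y)), n_init=10, random_state=0)
        return normalized_mutual_info_score(y, km.fit_predict(X[:, idx]))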

Conclusion

In this paper, we propose a novel unsupervised feature selection method via Diversity-induced Self-representation, called DISR, which selects features that are both representative and diverse. By incorporating the similarities between features to adjust their selection weights, a novel diversity term is designed to eliminate redundancy among the selected features. To solve the resulting optimization problem, an efficient algorithm based on the inexact ALM is presented.


References (51)

  • P. Mitra et al., Unsupervised feature selection using feature similarity, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • D. Liang, Z. Shen, L. Xuan, Z. Peng, Y.D. Shen, Local and global discriminative learning for unsupervised feature...
  • X. He, D. Cai, P. Niyogi, Laplacian score for feature selection, in: Advances in Neural Information Processing Systems,...
  • D. Cai, C. Zhang, X. He, Unsupervised feature selection for multi-cluster data, in: Proceedings of the International...
  • Z. Zhao, H. Liu, Spectral feature selection for supervised and unsupervised learning, in: Proceedings of the...
  • F. Nie, S. Xiang, Y. Jia, C. Zhang, S. Yan, Trace ratio criterion for feature selection, in: Association for the...
  • P. Jing, Y. Su, C. Xu, L. Zhang, Hyperssr: a hypergraph based semi-supervised ranking method for visual search...
  • Z. Li, Y. Yang, J. Liu, X. Zhou, H. Lu, Unsupervised feature selection using nonnegative spectral analysis, in:...
  • Y. Yang, H.T. Shen, Z. Ma, Z. Huang, X. Zhou, ℓ2,1-norm regularized discriminative feature selection for unsupervised...
  • E. Elhamifar et al., Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • C. Lu, J. Feng, Z. Lin, S. Yan, Correlation adaptive subspace segmentation by trace lasso, in: Proceedings of the...
  • F. Nie, H. Wang, H. Huang, C.H. Ding, Early active learning via robust representation and structured sparsity, in:...
  • Y. Hu, D. Zhang, Z. Jin, D. Cai, X. He, Active learning via neighborhood reconstruction, in: Proceedings of the...
  • W. Krzanowski, Selection of variables to preserve multivariate data structure, using principal components, Appl. Stat. (1987)
  • C. Constantinopoulos et al., Bayesian feature and model selection for Gaussian mixture models, IEEE Trans. Pattern Anal. Mach. Intell. (2006)

Yanbei Liu received the B.E. degree from Zhengzhou University of Light Industry, Zhengzhou, China, in 2009 and the M.E. degree from Tianjin Polytechnic University, Tianjin, China, in 2012. He is currently pursuing the Ph.D. degree at the School of Electronic Information Engineering, Tianjin University, Tianjin, China. His current research interests include machine learning, subspace learning, and pattern recognition.

Kaihua Liu received the B.E. degree in 1981, the M.E. degree in 1991, and the Ph.D. degree in 1999, all from Tianjin University, Tianjin, China. He is currently a Professor at the School of Electronic Information Engineering, Tianjin University, Tianjin, China. His current research interests include radio frequency identification theory and application, digital signal processing theory and application, and pattern recognition.

Changqing Zhang received the B.S. and M.E. degrees in computer science from Sichuan University in 2005 and 2008, respectively, and the Ph.D. degree from Tianjin University in 2016. He is currently an Assistant Professor with Tianjin University. His current research interests include machine learning, data mining, and computer vision.

Jing Wang is currently pursuing the Ph.D. degree at the Faculty of Science and Technology, Bournemouth University, UK. Before that, she received the M.E. degree from City University of Hong Kong, China. Her current research interests include machine learning and data mining, such as nonnegative matrix factorization, subspace clustering and semi-supervised learning.

Xiao Wang received the M.E. degree from Henan University, Kaifeng, China, in 2012 and the Ph.D. degree from the School of Computer Science and Technology, Tianjin University, Tianjin, China, in 2016. He is currently a postdoctoral researcher in the Department of Computer Science and Technology, Tsinghua University, Beijing, China. He received the China Scholarship Council Fellowship in 2014 and visited Washington University in St. Louis, USA, as a joint-training student from Nov. 2014 to Nov. 2015. His current research interests include complex network analysis, machine learning, and data mining.
