Pattern Recognition Letters

Volume 102, 15 January 2018, Pages 89-94

Unsupervised feature analysis with sparse adaptive learning

https://doi.org/10.1016/j.patrec.2017.12.022

Highlights

  • Perform adaptive manifold learning and feature selection jointly.

  • Impose the non-squared l2-norm to guarantee the clarity of the manifold structure.

  • Propose an efficient algorithm to solve the non-smooth objective function.

  • Verify the effectiveness of our method on several publicly available datasets.

Abstract

Unsupervised feature learning has played an important role in machine learning due to its ability to save human labor cost. Since labels are absent in such scenarios, a commonly used approach is to select features according to a similarity matrix derived from the original feature space. However, such similarity matrices suffer from the noise and redundant features that are frequently encountered in high-dimensional data. In this paper, we propose a novel unsupervised feature selection algorithm. Compared with previous works, the proposed algorithm has two main merits: (1) the similarity matrix is adaptively adjusted with a comprehensive strategy that fully utilizes the information in both the projected data and the original data; (2) to guarantee the clarity of the adaptively learned manifold structure, a non-squared l2-norm based sparsity constraint is imposed on the objective function. The proposed objective function involves several non-smooth constraints, making it difficult to solve, so we also design an efficient iterative algorithm to optimize it. Experimental results on several kinds of publicly available datasets demonstrate the effectiveness of our algorithm compared with state-of-the-art algorithms.

Introduction

High-dimensional data, frequently encountered in real-world applications nowadays, presents a great challenge to existing technologies. It is hard to deal with such data directly because of its huge time and space cost. Besides, noisy and irrelevant features within high-dimensional data also pose an obstacle to pattern recognition tasks [1]. Dimensionality reduction is a well-known technique for this problem. Nevertheless, the feature space of the low-dimensional data generated by dimensionality reduction algorithms, such as PCA, differs from that of the original data. Thus, it may be difficult for subsequent applications to relate the transformed features to the original ones. In contrast, feature selection chooses the most representative features from the original feature space, which preserves the underlying meaning of the original data. Under this circumstance, feature selection has gained increasing interest in recent years [2].

Unsupervised feature selection, which is designed to handle unlabeled data and thus save human labor cost, has played an important role in machine learning [3]. Data variance is a commonly used unsupervised feature selection criterion: it evaluates each feature by its variance along the corresponding dimension, and the features with the top K variances are selected. The Laplacian score [4] extends data variance by choosing features with the largest variance while also considering the local structure of the original data space (both criteria are sketched below). However, these methods neglect the inter-dependency between features when multiple features need to be selected. In recent years, a popular criterion has been to select representative features by imposing some kind of sparse constraint on the whole feature set [5], [6], [7]. Nevertheless, in unsupervised scenarios no label information is available, making it difficult to select discriminative features. Since local structure information is more important than the global one, graph-based feature selection has received a significant amount of attention [6], [8]. However, most existing graph-based algorithms are implemented in a two-stage way: the similarity matrix must first be constructed on the training data and is subsequently used to guide the feature selection procedure. The main drawback of such algorithms is that the similarity matrix remains fixed during the subsequent feature selection process, which easily leads to suboptimal results if the training data contains many noisy samples. Nie et al. [9] and Wang et al. [10] proposed adaptive structure learning algorithms that perform dimensionality reduction and similarity matrix construction simultaneously. They also imposed a novel constraint to ensure that the similarity matrix separates into more accurate connected components. However, all of these methods neglect the sparsity of the data structure in each connected component, whose importance has been demonstrated in previous works [7], [11].
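To make the two classical criteria above concrete, the following minimal sketch ranks features by variance and by the Laplacian score of He et al. [4]. It is an illustration only, not this paper's method; the graph parameters k and sigma are assumed values chosen for the example.

# Illustrative sketch: variance ranking and the Laplacian score [4].
# Parameters k (neighbors) and sigma (heat-kernel width) are assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_scores(X, k=5, sigma=1.0):
    """Return one score per feature; smaller scores preserve locality better."""
    n, d = X.shape
    # Symmetric kNN graph with heat-kernel weights on the original data.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance").toarray()
    dist = np.maximum(dist, dist.T)          # symmetrize the neighborhood relation
    S = np.where(dist > 0, np.exp(-dist ** 2 / (2 * sigma ** 2)), 0.0)
    D = np.diag(S.sum(axis=1))               # degree matrix
    L = D - S                                # unnormalized graph Laplacian
    ones = np.ones(n)
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        # Center the feature with respect to the degree measure D.
        f_t = f - (f @ D @ ones) / (ones @ D @ ones) * ones
        scores[r] = (f_t @ L @ f_t) / (f_t @ D @ f_t + 1e-12)
    return scores

X = np.random.rand(100, 20)                              # toy data: 100 samples, 20 features
top_by_variance = np.argsort(X.var(axis=0))[::-1][:5]    # top-K variance
top_by_laplacian = np.argsort(laplacian_scores(X))[:5]   # smallest Laplacian scores

Note that variance ranks each feature in isolation, while the Laplacian score rewards features that vary smoothly over the kNN graph; neither accounts for inter-feature redundancy, which motivates the sparse, joint formulations discussed next.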

To overcome the above-mentioned problems, we propose an unsupervised feature selection algorithm based on the following assumptions. (1) The similarity matrix should keep consistency with the original data structure. In other words, the similarity matrix should not deviate far from the results of graph manifold learning on the high-dimensional data. (2) The transformed data structure in each connected component should be clear. To be specific, the construction of the similarity matrix should be robust to redundant features and outliers.

The rest of this paper is organized as follows. We first give a brief review of the previous works in Section 2. In Section 3 and Section 4, we derive and optimize our unsupervised feature selection model. In Section 5, we present our experimental evaluation on the proposed method, followed by the conclusion in Section 6.

Section snippets

Related works

In this section, we briefly review the related works on unsupervised feature selection. This paper is closely related to feature selection and manifold learning.

Over the past decades, various kinds of feature selection methods have been proposed, among which feature selection with regularization has gained increasing attention. For example, Cai et al. [8] proposed a multi-cluster structure-preserving manifold learning framework and used an l1-norm based constraint to achieve a sparse

Adaptive structure learning

The key issue in unsupervised feature selection is to find the most representative features without label information. The most popular technique is to select features that well preserve the underlying manifold structure of the original data. To achieve this goal, inspired by developments in manifold learning and sparse learning [7], [9], [11], we propose to select the most discriminative features by performing local adaptive feature learning and sparse learning simultaneously.

It has
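To make the idea of local adaptive structure learning concrete, here is a minimal sketch of the closed-form adaptive neighbor assignment popularized by Nie et al. [9] (the CAN model). It illustrates the general technique only, not this paper's exact update rule; the neighbor count k is an assumed parameter.

# Sketch of CAN-style adaptive neighbor assignment (after Nie et al. [9]).
# Each row of the similarity matrix solves, in closed form,
#   min_s  sum_j (d_j * s_j + gamma * s_j^2)  s.t.  s >= 0, sum_j s_j = 1,
# with gamma chosen so that exactly k entries are nonzero.
import numpy as np

def adaptive_neighbors(dist_row, k=5):
    """One row of S from squared distances; assumes the self-distance is excluded."""
    idx = np.argsort(dist_row)               # ascending distances
    d_sorted = dist_row[idx]
    # Weights for the k nearest neighbors; the (k+1)-th distance is the cutoff.
    num = d_sorted[k] - d_sorted[:k]
    den = k * d_sorted[k] - d_sorted[:k].sum() + 1e-12
    s = np.zeros_like(dist_row)
    s[idx[:k]] = num / den
    return s

X = np.random.rand(50, 10)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
np.fill_diagonal(sq, np.inf)                            # exclude self-loops
S = np.stack([adaptive_neighbors(row, k=5) for row in sq])

Because the weights are recomputed from the current (projected) distances at each iteration, the similarity matrix adapts to the learned subspace instead of staying fixed as in two-stage methods.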

Optimization

The objective function proposed in Eq. (5) involves the $\ell_{2,1}$-norm and a strict constraint $\operatorname{rank}(L_P) = n - c$, both of which are non-smooth and difficult to handle. In this section, we optimize the problem in the following steps.

Denote by $F = [f_1, f_2, \dots, f_n]^{\top} \in \mathbb{R}^{n \times c}$ the cluster indicator matrix and by $\sigma_i(L_P)$ the $i$-th smallest eigenvalue of $L_P$. First, to tackle the $\operatorname{rank}(L_P) = n - c$ constraint, observe that $\operatorname{rank}(L_P) = n - c$ is equivalent to $\sum_{i=1}^{c} \sigma_i(L_P) = 0$. According to Ky Fan's theorem [22],
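For reference, the variational form given by Ky Fan's theorem is the standard route to a tractable relaxation of this constraint; the trade-off parameter $\lambda$ below is our notation, introduced for illustration:

\[
\sum_{i=1}^{c} \sigma_i(L_P) \;=\; \min_{F \in \mathbb{R}^{n \times c},\; F^{\top} F = I} \operatorname{Tr}\!\left(F^{\top} L_P F\right),
\]

so the hard constraint $\operatorname{rank}(L_P) = n - c$ can be enforced approximately by adding the penalty $\lambda \operatorname{Tr}(F^{\top} L_P F)$ to the objective for a sufficiently large $\lambda > 0$, and then alternating between updating $F$ (the $c$ eigenvectors of $L_P$ with smallest eigenvalues) and the remaining variables.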

Experiments

To verify the validity of our algorithm, we apply it to six publicly available datasets, including three UCI datasets (Cars, Vehicle, and Wine) and three face image datasets (Yale, ORL, and UMIST). A brief description of these datasets is given in Table 1. We also compare our algorithm with several state-of-the-art unsupervised feature selection algorithms, listed as follows:

All-fea: All features are preserved; this serves as the baseline in our experiments.

MAXVAR: Features

Conclusion

In unsupervised learning scenarios, it is still a challenging task to uncover discriminative information from unlabeled data. In this paper, inspired by previous works, we propose a novel unsupervised feature selection algorithm that not only utilizes adaptive manifold structure learning, but also improves learning performance by generating a clear and sparse underlying manifold structure. The proposed objective function involves several non-smooth constraints, making it difficult to solve. We

Acknowledgments

This work is supported by Ministry of Science and Technology, Taiwan, R.O.C. (Grant Nos. MOST-104-2221-E-324-019-MY2, MOST-106-2221-E-324-025, MOST-106-2218-E-324-002), National Natural Science Foundation of Fujian Province, China (Grant Nos. 2016J01324, 2016J01327, 2017J01511), The International Science and Technology Cooperation Program of Xiamen University of Technology (Grant No. E201400400), Scientific Research Fund of Fujian Provincial Education Department (Grant Nos. JA15385, JAT160357).

References (26)

  • X. Wang et al., Discriminative unsupervised dimensionality reduction, in: Proceedings of the 24th International Conference on Artificial Intelligence, 2015.

  • F. Nie et al., Unsupervised and semi-supervised learning via l1-norm graph, in: 2011 International Conference on Computer Vision, 2011.

  • Z. Zhao et al., Efficient spectral feature selection with minimum redundancy, in: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.