Unsupervised feature analysis with sparse adaptive learning
Introduction
High-dimensional data, which is frequently encountered in real-world applications, poses a great challenge to existing technologies. Such data is hard to process directly because of its huge time and space costs. Besides, noisy and irrelevant features within high-dimensional data also hinder pattern recognition tasks [1]. Dimensionality reduction is a well-known technique for addressing this problem. Nevertheless, the feature space of the low-dimensional data generated by dimensionality reduction algorithms, such as PCA, differs from that of the original data. Thus, it may be difficult for subsequent applications to relate the transformed features back to the original ones. In contrast, feature selection chooses the most representative features from the original feature space, which preserves the underlying meaning of the original data. Under this circumstance, feature selection has attracted increasing interest in recent years [2].
Unsupervised feature selection, which is designed to handle unlabeled data and save human labeling cost, plays an important role in machine learning [3]. Data variance is a commonly used unsupervised selection criterion: it evaluates each feature by its variance along a dimension, and the K features with the largest variances are selected. The Laplacian score [4] extends data variance by choosing features with large variance while also considering the local structure of the original data space. However, these methods neglect the inter-dependency between features when multiple features need to be selected. In recent years, a popular criterion has been to select representative features by imposing some kind of sparsity constraint on the whole feature set [5], [6], [7]. Nevertheless, in unsupervised scenarios no label information is available, which makes it difficult to select discriminative features. Since local structure information is often more important than global structure, graph-based feature selection has received a significant amount of attention [6], [8]. However, most existing graph-based algorithms are implemented in two stages: a similarity matrix is first constructed on the training data and is subsequently used to guide the feature selection procedure. The main drawback of such algorithms is that the similarity matrix remains fixed during the subsequent feature selection process, which easily leads to a suboptimal result if the training data contains many noisy samples. Nie et al. [9] and Wang et al. [10] proposed adaptive structure learning algorithms that perform dimensionality reduction and similarity matrix construction simultaneously. They also imposed a novel constraint to ensure that the similarity matrix separates into more accurate connected components.
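As a concrete illustration of the simplest criterion mentioned above, the following is a minimal NumPy sketch of variance-based unsupervised feature selection; the helper name `top_k_variance` and the toy data are our own illustrative choices, not part of the paper.

```python
import numpy as np

def top_k_variance(X, k):
    """Rank features by their variance along each dimension and keep the top k.

    X : (n_samples, n_features) data matrix; k : number of features to keep.
    Returns the column indices of the selected features, highest variance first.
    """
    variances = X.var(axis=0)
    # sort ascending, then reverse to get the k largest variances
    return np.argsort(variances)[::-1][:k]

# toy data: feature 0 is nearly constant, feature 2 varies the most
X = np.array([[1.0, 2.0, 10.0],
              [1.1, 4.0, -5.0],
              [0.9, 6.0, 20.0],
              [1.0, 8.0, -15.0]])
print(top_k_variance(X, 2))  # selects feature 2 first, then feature 1
```

As the introduction notes, this criterion scores each feature in isolation, so it cannot account for inter-dependency between the selected features.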
However, all of these methods neglect the sparsity of the data structure within each connected component, whose importance has been demonstrated in previous works [7], [11].
To overcome the above-mentioned problems, we propose an unsupervised feature selection algorithm based on the following assumptions. (1) The similarity matrix should remain consistent with the original data structure. In other words, the similarity matrix should not deviate far from the results of graph manifold learning on the high-dimensional data. (2) The transformed data structure within each connected component should be clear. Specifically, the construction of the similarity matrix should be robust to redundant features and outliers.
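To make assumption (1) concrete, a common way to encode the original local data structure is a k-nearest-neighbor similarity graph with a Gaussian (heat) kernel. The sketch below is a generic construction of this kind, not the paper's exact graph; the helper name `knn_similarity` and the parameters `k` and `sigma` are illustrative assumptions.

```python
import numpy as np

def knn_similarity(X, k=2, sigma=1.0):
    """Build a symmetric k-nearest-neighbor similarity matrix with a
    Gaussian (heat) kernel, a common starting point for graph-based
    feature selection."""
    n = X.shape[0]
    # pairwise squared Euclidean distances, shape (n, n)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    for i in range(n):
        # indices of the k nearest neighbors of sample i (excluding itself)
        nbrs = np.argsort(d2[i])[1:k + 1]
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return (S + S.T) / 2  # symmetrize

# two tight clusters: similarities stay within each cluster
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
S = knn_similarity(X, k=1)
print(np.round(S, 3))
```

A two-stage method would fix such a matrix `S` once and for all; the point of assumption (1) is instead that the learned similarity matrix should stay close to this manifold structure while being refined during feature selection.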
The rest of this paper is organized as follows. We first give a brief review of the previous works in Section 2. In Section 3 and Section 4, we derive and optimize our unsupervised feature selection model. In Section 5, we present our experimental evaluation on the proposed method, followed by the conclusion in Section 6.
Related works
In this section, we briefly review the related works on unsupervised feature selection. This paper is closely related to feature selection and manifold learning.
Over the past decades, various feature selection methods have been proposed, among which feature selection with regularization has gained increasing attention. For example, Cai and Zhang et al. [8] proposed a multi-cluster structure-preserving manifold learning framework, and used an l1-norm based constraint to achieve a sparse
Adaptive structure learning
The key issue for unsupervised feature selection is to find the most representative features without label information. The most popular technique is to select features that well preserve the underlying manifold structure of the original data. To achieve this goal, inspired by the development of manifold learning and sparse learning [7], [9], [11], we propose to select the most discriminative features by performing local adaptive structure learning and sparse learning simultaneously.
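To illustrate the adaptive structure learning ingredient, the sketch below implements one similarity update in the spirit of the adaptive-neighbors approach of Nie et al. [9]: each sample assigns probabilistic neighbor weights with a closed-form solution that keeps exactly k non-zero entries per row. This is a sketch of that one step only; the full method alternates such updates with the sparse feature-learning step, and the function name is our own.

```python
import numpy as np

def adaptive_neighbors(X, k=2):
    """One adaptive-neighbor similarity update (after Nie et al. [9]).

    For each sample i, the k nearest neighbors j receive weights
    s_ij = (d_{i,k+1} - d_ij) / (k * d_{i,k+1} - sum_h d_ih),
    where d_ij are squared Euclidean distances sorted ascending; each
    row of S sums to 1 and has exactly k non-zero entries."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    for i in range(n):
        di = np.sort(d2[i])          # ascending; di[0] = 0 is the self-distance
        idx = np.argsort(d2[i])
        num = di[k + 1] - di[1:k + 1]
        den = k * di[k + 1] - di[1:k + 1].sum() + 1e-12  # guard against 0
        S[i, idx[1:k + 1]] = num / den
    return S

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
S = adaptive_neighbors(X, k=2)
print(np.round(S[0], 3))  # weights on the two nearest neighbors; row sums to 1
```

Because the weights depend on the current representation of the data, re-running this update after each feature-learning step is what makes the structure "adaptive" rather than fixed in advance.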
It has
Optimization
The objective function we proposed in Eq. (5) involves the $\ell_{2,1}$-norm and a rank constraint, both of which are non-smooth and difficult to solve directly. In this section, we propose to optimize the problem in the following alternating steps.
Denote $F \in \mathbb{R}^{n \times c}$ as the cluster indicator matrix and $\sigma_i(L_P)$ as the $i$-th smallest eigenvalue of the Laplacian $L_P$. Firstly, to tackle the rank constraint, we can easily find that, since $L_P$ is positive semi-definite, $\mathrm{rank}(L_P) = n - c$ is equivalent to $\sum_{i=1}^{c} \sigma_i(L_P) = 0$. According to Ky Fan's theorem [22], $\sum_{i=1}^{c} \sigma_i(L_P) = \min_{F^{\top}F = I} \mathrm{Tr}(F^{\top} L_P F)$,
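The Ky Fan identity used here can be checked numerically: on a toy graph Laplacian, the sum of the c smallest eigenvalues equals the minimum of $\mathrm{Tr}(F^{\top} L F)$ over orthonormal $F$, attained by the corresponding eigenvectors. The sketch below is only such a numerical check, not part of the paper's derivation.

```python
import numpy as np

# Toy graph with two connected components (two disjoint edges):
# its Laplacian has exactly two zero eigenvalues.
S = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(S.sum(1)) - S            # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
c = 2
F = vecs[:, :c]                      # eigenvectors of the c smallest eigenvalues
ky_fan = np.trace(F.T @ L @ F)       # Ky Fan minimum: Tr(F^T L F) with F^T F = I
print(np.isclose(vals[:c].sum(), ky_fan))  # the two quantities coincide
```

Here both quantities are (numerically) zero, reflecting exactly the rank argument above: c connected components give c zero eigenvalues, so the rank constraint can be enforced by minimizing $\mathrm{Tr}(F^{\top} L_P F)$.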
Experiments
To verify the validity of our algorithm, we apply it to 6 publicly available datasets, including three UCI datasets (Cars, Vehicle, and Wine), and three face image datasets (Yale, Orl, and Umist). A brief description of these datasets is listed in Table 1. We also compare our algorithm with several state-of-the-art unsupervised feature selection algorithms. These algorithms are listed as follows:
All-fea: All the features are preserved; this serves as the baseline in our experiments.
MAXVAR: Features
Conclusion
In unsupervised learning scenarios, it is still a challenging task to uncover discriminative information from unlabeled data. In this paper, inspired by previous works, we propose a novel unsupervised feature selection algorithm, which not only utilizes adaptive manifold structure learning, but also improves learning performance by generating a clear and sparse underlying manifold structure. The proposed objective function involves several non-smooth constraints, making it difficult to solve. We
Acknowledgments
This work is supported by the Ministry of Science and Technology, Taiwan, R.O.C. (Grant Nos. MOST-104-2221-E-324-019-MY2, MOST-106-2221-E-324-025, MOST-106-2218-E-324-002), the National Natural Science Foundation of Fujian Province, China (Grant Nos. 2016J01324, 2016J01327, 2017J01511), the International Science and Technology Cooperation Program of Xiamen University of Technology (Grant No. E201400400), and the Scientific Research Fund of the Fujian Provincial Education Department (Grant Nos. JA15385, JAT160357).
References (26)
- et al., Feature selection via binary simultaneous perturbation stochastic approximation, Pattern Recognit. Lett. (2016)
- et al., Feature selection for unsupervised learning through local learning, Pattern Recognit. Lett. (2015)
- et al., Unsupervised spectral feature selection with l1-norm graph, Neurocomputing (2016)
- et al., Semi-supervised orthogonal discriminant analysis via label propagation, Pattern Recognit. (2009)
- et al., Web and personal image annotation by mining label correlation with relaxed visual graph embedding, IEEE Trans. Image Process. (2012)
- et al., Laplacian score for feature selection, Advances in Neural Information Processing Systems 18 (2006)
- et al., L2,1-norm regularized discriminative feature selection for unsupervised learning, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (2011)
- et al., Discriminating joint feature analysis for multimedia data understanding, IEEE Trans. Multimedia (2012)
- et al., Unsupervised feature selection for multi-cluster data, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2010)
- et al., Clustering and projected clustering with adaptive neighbors, Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014)