Pattern Recognition Letters

Volume 102, 15 January 2018, Pages 89-94

Unsupervised feature analysis with sparse adaptive learning

https://doi.org/10.1016/j.patrec.2017.12.022

Highlights

  • Perform adaptive manifold learning and feature selection jointly.

  • Impose the non-squared l2-norm to guarantee the clarity of the manifold structure.

  • Propose an efficient algorithm to solve the non-smooth objective function.

  • Verify the effectiveness of our method on several publicly available datasets.

Abstract

Unsupervised feature learning has played an important role in machine learning due to its ability to save human labor cost. Since labels are absent in such scenarios, a commonly used approach is to select features according to a similarity matrix derived from the original feature space. However, such similarity matrices suffer from the noise and redundant features that are frequently encountered in high-dimensional data. In this paper, we propose a novel unsupervised feature selection algorithm. Compared with previous works, the proposed algorithm has two main merits: (1) the similarity matrix is adaptively adjusted with a comprehensive strategy that fully utilizes the information in both the projected data and the original data; (2) to guarantee the clarity of the adaptively learned manifold structure, a non-squared l2-norm based sparsity constraint is imposed on the objective function. The proposed objective function involves several non-smooth constraints, making it difficult to solve, so we also design an efficient iterative algorithm to optimize it. Experimental results on several kinds of publicly available datasets demonstrate the effectiveness of our algorithm compared with state-of-the-art algorithms.

Introduction

High-dimensional data, frequently encountered in real-world applications nowadays, presents a great challenge to existing technologies. It is hard to deal with such data directly because of its huge time and space cost. Besides, noisy and irrelevant features within high-dimensional data also pose an obstacle to pattern recognition tasks [1]. Dimensionality reduction is a well-known technique for this problem. Nevertheless, the feature space of the low-dimensional data generated by dimensionality reduction algorithms, such as PCA, differs from that of the original data. Thus, it may be difficult for subsequent applications to relate the transformed features to the original ones. In contrast, feature selection chooses the most representative features from the original feature space, which preserves the underlying meaning of the original data. Under this circumstance, feature selection has gained increasing interest in recent years [2].

Unsupervised feature selection, which is designed to handle unlabeled data and thus save human labor cost, has played an important role in machine learning [3]. Data variance is a commonly used unsupervised feature selection criterion: it evaluates each feature by its variance along the corresponding dimension, and the features with the top K variances are selected. The Laplacian score [4] extends data variance by choosing features with the largest variance while also considering the local structure of the original data space (both criteria are sketched below). However, these methods neglect the inter-dependency between features when multiple features need to be selected. In recent years, a popular criterion has been to select representative features by imposing some kind of sparse constraint on the whole feature set [5], [6], [7]. Nevertheless, in unsupervised scenarios no label information is available, making it difficult to select discriminative features. Since local structure information is more important than the global one, graph-based feature selection has received a significant amount of attention [6], [8]. However, most existing graph-based algorithms are implemented in a two-stage way: the similarity matrix must first be constructed on the training data and is subsequently used to guide the feature selection procedure. The main drawback of such algorithms is that the similarity matrix remains fixed during the subsequent feature selection process, which easily leads to suboptimal results if the training data contains many noisy samples. Nie et al. [9] and Wang et al. [10] proposed adaptive structure learning algorithms that perform dimensionality reduction and similarity matrix construction simultaneously. They also imposed a novel constraint to ensure that the similarity matrix separates into more accurate connected components. However, all of these methods neglect the sparsity of the data structure in each connected component, whose importance has been demonstrated in previous works [7], [11].
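To make the two classical criteria above concrete, the following minimal sketch ranks features by variance and by the Laplacian score of He et al. [4]. It is an illustration only, not this paper's method; the graph parameters k and sigma are assumed values chosen for the example.

# Illustrative sketch: variance ranking and the Laplacian score [4].
# Parameters k (neighbors) and sigma (heat-kernel width) are assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_scores(X, k=5, sigma=1.0):
    """Return one score per feature; smaller scores preserve locality better."""
    n, d = X.shape
    # Symmetric kNN graph with heat-kernel weights on the original data.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance").toarray()
    dist = np.maximum(dist, dist.T)          # symmetrize the neighborhood relation
    S = np.where(dist > 0, np.exp(-dist ** 2 / (2 * sigma ** 2)), 0.0)
    D = np.diag(S.sum(axis=1))               # degree matrix
    L = D - S                                # unnormalized graph Laplacian
    ones = np.ones(n)
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        # Center the feature with respect to the degree measure D.
        f_t = f - (f @ D @ ones) / (ones @ D @ ones) * ones
        scores[r] = (f_t @ L @ f_t) / (f_t @ D @ f_t + 1e-12)
    return scores

X = np.random.rand(100, 20)                              # toy data: 100 samples, 20 features
top_by_variance = np.argsort(X.var(axis=0))[::-1][:5]    # top-K variance
top_by_laplacian = np.argsort(laplacian_scores(X))[:5]   # smallest Laplacian scores

Note that variance ranks each feature in isolation, while the Laplacian score rewards features that vary smoothly over the kNN graph; neither accounts for inter-feature redundancy, which motivates the sparse, joint formulations discussed next.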

To overcome the above-mentioned problems, we propose an unsupervised feature selection algorithm based on the following assumptions. (1) The similarity matrix should keep consistency with the original data structure. In other words, the similarity matrix should not deviate far from the results of graph manifold learning on the high-dimensional data. (2) The transformed data structure in each connected component should be clear. To be specific, the construction of the similarity matrix should be robust to redundant features and outliers.

The rest of this paper is organized as follows. We first give a brief review of the previous works in Section 2. In Section 3 and Section 4, we derive and optimize our unsupervised feature selection model. In Section 5, we present our experimental evaluation on the proposed method, followed by the conclusion in Section 6.

Section snippets

Related works

In this section, we briefly review the related works on unsupervised feature selection. This paper is closely related to feature selection and manifold learning.

Over the past decades, various kinds of feature selection methods have been proposed, among which feature selection with regularization has gained increasing attention. For example, Cai et al. [8] proposed a multi-cluster structure-preserving manifold learning framework and used an l1-norm based constraint to achieve a sparse

Adaptive structure learning

The key issue in unsupervised feature selection is to find the most representative features without label information. The most popular technique is to select features that well preserve the underlying manifold structure of the original data. To achieve this goal, inspired by developments in manifold learning and sparse learning [7], [9], [11], we propose to select the most discriminative features by performing local adaptive feature learning and sparse learning simultaneously.

It has
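To make the idea of local adaptive structure learning concrete, here is a minimal sketch of the closed-form adaptive neighbor assignment popularized by Nie et al. [9] (the CAN model). It illustrates the general technique only, not this paper's exact update rule; the neighbor count k is an assumed parameter.

# Sketch of CAN-style adaptive neighbor assignment (after Nie et al. [9]).
# Each row of the similarity matrix solves, in closed form,
#   min_s  sum_j (d_j * s_j + gamma * s_j^2)  s.t.  s >= 0, sum_j s_j = 1,
# with gamma chosen so that exactly k entries are nonzero.
import numpy as np

def adaptive_neighbors(dist_row, k=5):
    """One row of S from squared distances; assumes the self-distance is excluded."""
    idx = np.argsort(dist_row)               # ascending distances
    d_sorted = dist_row[idx]
    # Weights for the k nearest neighbors; the (k+1)-th distance is the cutoff.
    num = d_sorted[k] - d_sorted[:k]
    den = k * d_sorted[k] - d_sorted[:k].sum() + 1e-12
    s = np.zeros_like(dist_row)
    s[idx[:k]] = num / den
    return s

X = np.random.rand(50, 10)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
np.fill_diagonal(sq, np.inf)                            # exclude self-loops
S = np.stack([adaptive_neighbors(row, k=5) for row in sq])

Because the weights are recomputed from the current (projected) distances at each iteration, the similarity matrix adapts to the learned subspace instead of staying fixed as in two-stage methods.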

Optimization

The objective function proposed in Eq. (5) involves the $\ell_{2,1}$-norm and a strict constraint $\operatorname{rank}(L_P) = n - c$, both of which are non-smooth and difficult to handle. In this section, we optimize the problem in the following steps.

Denote by $F = [f_1, f_2, \dots, f_n]^{\top} \in \mathbb{R}^{n \times c}$ the cluster indicator matrix and by $\sigma_i(L_P)$ the $i$-th smallest eigenvalue of $L_P$. First, to tackle the $\operatorname{rank}(L_P) = n - c$ constraint, observe that $\operatorname{rank}(L_P) = n - c$ is equivalent to $\sum_{i=1}^{c} \sigma_i(L_P) = 0$. According to Ky Fan's theorem [22],
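For reference, the variational form given by Ky Fan's theorem is the standard route to a tractable relaxation of this constraint; the trade-off parameter $\lambda$ below is our notation, introduced for illustration:

\[
\sum_{i=1}^{c} \sigma_i(L_P) \;=\; \min_{F \in \mathbb{R}^{n \times c},\; F^{\top} F = I} \operatorname{Tr}\!\left(F^{\top} L_P F\right),
\]

so the hard constraint $\operatorname{rank}(L_P) = n - c$ can be enforced approximately by adding the penalty $\lambda \operatorname{Tr}(F^{\top} L_P F)$ to the objective for a sufficiently large $\lambda > 0$, and then alternating between updating $F$ (the $c$ eigenvectors of $L_P$ with smallest eigenvalues) and the remaining variables.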

Experiments

To verify the validity of our algorithm, we apply it to six publicly available datasets, including three UCI datasets (Cars, Vehicle, and Wine) and three face image datasets (Yale, ORL, and UMIST). A brief description of these datasets is given in Table 1. We also compare our algorithm with several state-of-the-art unsupervised feature selection algorithms, listed as follows:

All-fea: All features are preserved; this serves as the baseline in our experiments.

MAXVAR: Features

Conclusion

In unsupervised learning scenarios, it is still a challenging task to uncover discriminative information from unlabeled data. In this paper, inspired by previous works, we propose a novel unsupervised feature selection algorithm that not only utilizes adaptive manifold structure learning, but also improves learning performance by generating a clear and sparse underlying manifold structure. The proposed objective function involves several non-smooth constraints, making it difficult to solve. We

Acknowledgments

This work is supported by Ministry of Science and Technology, Taiwan, R.O.C. (Grant Nos. MOST-104-2221-E-324-019-MY2, MOST-106-2221-E-324-025, MOST-106-2218-E-324-002), National Natural Science Foundation of Fujian Province, China (Grant Nos. 2016J01324, 2016J01327, 2017J01511), The International Science and Technology Cooperation Program of Xiamen University of Technology (Grant No. E201400400), Scientific Research Fund of Fujian Provincial Education Department (Grant Nos. JA15385, JAT160357).

References (26)

  • X. Wang et al., Discriminative unsupervised dimensionality reduction, in: Proceedings of the 24th International Conference on Artificial Intelligence, 2015.

  • F. Nie et al., Unsupervised and semi-supervised learning via l1-norm graph, in: 2011 International Conference on Computer Vision, 2011.

  • Z. Zhao et al., Efficient spectral feature selection with minimum redundancy, in: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.