Joint adaptive manifold and embedding learning for unsupervised feature selection
Introduction
Among the thousands of features that form a data representation, typically only a few are informative and discriminative, owing to correlations between features and the presence of noise [1], [2]. Such redundant and noisy features can degrade the performance of learning algorithms across different tasks [2], [3]. To overcome these problems, feature selection is necessary: it selects discriminative features and eliminates the correlated, redundant, and noisy ones, yielding a compact and better data representation [4].
In this paper, we propose a new unsupervised feature selection method. Owing to the lack of label information, preserving the manifold structure built on the whole feature space, which is often characterized by a graph, is an important criterion for dimensionality reduction and feature selection [5], [6], [7], [8]. Recently, numerous graph-based methods have been proposed for feature selection. Most of them involve two main steps: 1) unfolding the data in a lower-dimensional space, or learning cluster labels for the data, by preserving the similarities between data points; and 2) regressing each data point to its lower-dimensional embedding or its cluster label indicator. LapScore [9] and MCFS [10] are two classical feature selection algorithms that exploit manifold structure information. LapScore [9] evaluates the importance of each feature independently and selects features one by one, without considering the correlations between features; thus, it may select redundant features. In contrast, MCFS [10] jointly measures the importance of features along each intrinsic dimension and has been shown to achieve better performance. However, both are two-step approaches: structure preservation and feature selection are carried out in separate procedures. By jointly learning the lower-dimensional embeddings and solving a sparse regression problem in a unified optimization problem, JELSR [11] integrates the merits of embedding learning and sparse regression. GLSPFS [12] integrates both global similarity and local geometric structure for feature selection. RJGSC [13] first extracts bases for the training data using a dictionary learning method, then generates new representations for the data by mapping them into the basis space, and finally measures the importance of features based on these representations.
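To make the two-step graph-based pipeline concrete, the sketch below implements a LapScore-style filter in plain Python: it builds a k-nn Gaussian similarity graph and then scores each feature by how smoothly it varies over that graph (a smaller score means better locality preservation). This is an illustrative reconstruction of the idea, not the authors' code; the kernel width `sigma` and neighbourhood size `k` are placeholder parameters.

```python
import math

def laplacian_score(X, sigma=1.0, k=2):
    """Score each feature by how smoothly it varies over a k-nn graph.
    Smaller score = better locality preservation (LapScore-style filter)."""
    n, d = len(X), len(X[0])
    # pairwise squared Euclidean distances
    dist = [[sum((X[i][t] - X[j][t]) ** 2 for t in range(d)) for j in range(n)]
            for i in range(n)]
    # symmetrised k-nn graph with Gaussian (heat-kernel) weights
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]:
            w = math.exp(-dist[i][j] / (2 * sigma ** 2))
            S[i][j] = S[j][i] = max(S[i][j], w)
    deg = [sum(row) for row in S]
    scores = []
    for t in range(d):
        f = [X[i][t] for i in range(n)]
        mu = sum(f[i] * deg[i] for i in range(n)) / max(sum(deg), 1e-12)
        f = [v - mu for v in f]  # centre the feature w.r.t. the degree weights
        num = sum(S[i][j] * (f[i] - f[j]) ** 2
                  for i in range(n) for j in range(n)) / 2.0
        den = sum(deg[i] * f[i] ** 2 for i in range(n))
        scores.append(num / max(den, 1e-12))
    return scores
```

On a toy two-cluster dataset, the cluster-separating feature receives a much smaller score than a noise feature, which is why such filters can rank features but, as noted above, cannot detect redundancy among the selected ones.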
GAFS [14] employs a single-layer autoencoder to reconstruct the input data and preserves the geometric structure among data through the corresponding hidden layer activations.
Although label information is not available in unsupervised feature selection, one can still impose smoothness on the cluster labels of neighbouring points to preserve the graph structure among data [15], [16], [17], [18]. UDFS [15] learns a low-rank subspace by formulating a Fisher-criterion-like measure. NDFS [16] and CGSSL [18] simultaneously learn the cluster indicators and select features in a joint framework. They discard the orthogonality constraint on the transformation matrix in UDFS, which is unreasonable because the weight vectors are not necessarily orthogonal to each other in nature [19]. Instead of using ℓ2-norm based regression, which is not robust to noise and outliers, RUFS [20] applies ℓ2,1-norm minimization to both cluster label learning and feature learning to achieve robust feature selection. NLESLFS [21] imposes a sparsity-inducing norm on the feature learning in NDFS and preserves the local structure to maintain the original structure of the data. To alleviate the effects of irrelevant features, LGDFS [17] integrates a global regression model and a set of locally linear regression models with sparsity-inducing regularization into a unified framework. RSFS [22] utilizes a robust local learning method and a robust spectral regression method to explicitly handle noise on the cluster labels. Meanwhile, UPFS [23] finds a subset of discriminative features shared by all instances and a subset of discriminative features customized for each instance. LDSSL [6] extends UDFS to handle nonlinear datasets by mapping the data into a kernel space.
The aforementioned graph-based methods can enhance the performance of machine learning tasks such as clustering and classification. However, they mainly focus on how to preserve a predefined structure among data, and do not discuss how to effectively learn the structure from data. In contrast, RSR [24] selects representative features by directly exploiting the self-representation ability of features, which characterizes the linear correlations between features. The approaches in [19], [25], [26], [27] characterize the manifold by an adaptively reconstructed graph within the feature selection framework. They need to build a neighbourhood graph with the k-nn method in advance and then learn the similarity matrix that characterizes the structure upon this graph. FSASL [28] and SOGFS [29] can learn the structure from the whole dataset, rather than from a predetermined local graph, but they only try to preserve the linear correlations between data points, failing to explore the nonlinear structure among data [30]. Moreover, they have to recalculate the neighbour candidates for each data point in the projected space with the k-nn method at each iteration; consequently, their optimization procedures are not guaranteed to monotonically decrease the objectives. Besides, SOGFS [29] introduces an unreasonable and unnecessary orthogonality constraint on the transformation matrix [19]. SCUFS [31] aims to explore the nonlinear structure among data by iteratively learning the similarity matrix and the cluster labels, but it cannot automatically learn the neighbours for each data point. Moreover, it learns the similarity matrix in the original data space; when there are redundant or noisy features in the data, the similarity matrix may not accurately characterize the intrinsic structure among data.
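The adaptive-neighbour graphs used by this family of methods are often built with a well-known closed-form probabilistic-neighbour assignment (in the style of the adaptive graphs behind SOGFS): for each point, solve min over s of Σ_j s_j·d_j + γ·s_j² subject to the simplex constraint, with the regularizer γ chosen so that exactly k weights are nonzero. The following is a minimal sketch of that closed form, assuming squared distances are given and the point itself appears in its own distance row; it is an illustration of the construction, not code from any of the cited papers.

```python
def adaptive_neighbors(dist_row, k):
    """Closed-form probabilistic neighbour weights: solve
        min_s  sum_j s_j * d_j + gamma * s_j**2   s.t. s >= 0, sum(s) = 1,
    with gamma chosen so that exactly k weights are nonzero.
    dist_row holds squared distances from one point to all points
    (including itself, at distance 0)."""
    idx = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    d = [dist_row[j] for j in idx]          # d[0] == 0 is the point itself
    num = d[k + 1]                          # distance to the (k+1)-th neighbour
    den = k * d[k + 1] - sum(d[1:k + 1])
    s = [0.0] * len(dist_row)
    for r in range(1, k + 1):               # only the k nearest get weight
        s[idx[r]] = max((num - d[r]) / max(den, 1e-12), 0.0)
    tot = sum(s)
    return [v / tot for v in s] if tot > 0 else s
```

Because the weights depend on the distances, any method that recomputes distances in a projected space at each iteration must rebuild this assignment, which is exactly the step that breaks the monotonicity guarantees discussed above.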
To solve the unsupervised feature selection problem jointly with learning the real geometric structure among data, we propose a novel framework named Joint Adaptive Manifold and Embedding Learning (JAMEL). Since a “good” manifold structure is vital for selecting the most discriminative features, and distinct from the above models, most of which need a predetermined graph, we design the approach so that it can simultaneously and alternately learn the manifold structure among data and select the features that preserve this structure. This manifold structure can accurately uncover the real neighbours of each data point, as well as the correlations between each data point and its neighbours. Acquiring such a manifold structure requires that the model be able to adaptively estimate the neighbourhood size and find the neighbours of each “clean” data point, from which the redundant and noisy features have been eliminated. First, the model cleans the data according to the importance of the features, and then employs a sparse learning technique to automatically learn a different number of neighbours for each data point according to the distribution density around the cleaned data, aiming to characterize the nonlinear manifold structure among data. Second, it learns lower-dimensional embeddings for the data; these embeddings capture the learnt nonlinear manifold structure. Third, it projects the data into the embedding space to evaluate the correlations between embeddings and features, with the aim of measuring the importance of the features: the larger the correlation, the more important the feature. Finally, the above steps are repeated so as to gradually remove the redundant and noisy features and eventually pick out the most discriminative ones. Table 1 shows the comparison between the proposed JAMEL and some classical approaches.
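The clean–learn–score alternation described above can be caricatured in a few lines. The sketch below is NOT the authors' objective or algorithm: it merely alternates (1) re-weighting ("cleaning") features, (2) rebuilding a k-nn similarity graph on the cleaned data, and (3) re-scoring features by how smoothly they vary over that graph. The scoring rule and all parameter names are illustrative assumptions.

```python
import math

def jamel_sketch(X, k=2, n_iter=3):
    """Caricature of the alternation: clean -> graph -> score -> repeat.
    Feature scoring here is an assumed smoothness heuristic, not the
    paper's objective."""
    n, d = len(X), len(X[0])
    w = [1.0] * d                            # feature weights; all kept at first
    for _ in range(n_iter):
        # step 1: "clean" the data by down-weighting unimportant features
        Xc = [[X[i][t] * w[t] for t in range(d)] for i in range(n)]
        # step 2: k-nn Gaussian similarity graph on the cleaned data
        S = [[0.0] * n for _ in range(n)]
        for i in range(n):
            dist = [sum((Xc[i][t] - Xc[j][t]) ** 2 for t in range(d))
                    for j in range(n)]
            for j in sorted(range(n), key=lambda j: dist[j])[1:k + 1]:
                S[i][j] = S[j][i] = math.exp(-dist[j])
        # step 3: score features: high variance but smooth over the graph
        new_w = []
        for t in range(d):
            mean_t = sum(row[t] for row in X) / n
            var = sum((X[i][t] - mean_t) ** 2 for i in range(n))
            pen = sum(S[i][j] * (X[i][t] - X[j][t]) ** 2
                      for i in range(n) for j in range(n))
            new_w.append(var / (pen + 1e-12))
        m = max(new_w)
        w = [v / m for v in new_w]           # best feature gets weight 1
    return w
```

Even this toy version shows the self-reinforcing effect the paper exploits: as noisy features are down-weighted, the graph built on the cleaned data becomes more reliable, which in turn sharpens the feature scores on the next pass.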
We then develop an effective and efficient optimization algorithm to solve the resulting challenging problem, together with a theoretical analysis of its convergence.
In summary, the merits of the proposed JAMEL are:
- (1) JAMEL can adaptively learn the manifold structure among data according to the distribution density around the cleaned data.
- (2) JAMEL embeds the data into a lower-dimensional space to capture the learnt nonlinear manifold among data, with the aim of selecting the most discriminative features and eliminating the redundant and noisy ones.
- (3) JAMEL is highly efficient for high-dimensional data: its computational complexity is linear in the data dimensionality.
We justify our approach through extensive experiments and show that it obtains better performance. In the rest of the paper, we first review related work in Section 2. We then present our approach in Section 3 and the convergence analysis in Section 4. The experimental evaluation is presented in Section 5, and the paper is concluded in Section 6.
Notations
Unless otherwise specified, we use upper-case letters (e.g., A) to represent matrices and lower-case letters (e.g., a) to represent column vectors. For a given matrix A, where a_ij is the (i, j)-entry of A, we denote its i-th row and j-th column as a^i and a_j, respectively. Its ℓ2,1-norm is defined as
‖A‖_{2,1} = Σ_{i=1}^{n} sqrt( Σ_{j=1}^{d} a_ij² ),
where n (or d) is the number of rows (or columns) of A. When A is a square matrix, tr(A) denotes its trace. For a given vector a, we also let ‖a‖_2 denote its ℓ2-norm.
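The matrix norm used here, the ℓ2,1-norm, is simply the sum of the row-wise ℓ2 norms; a minimal pure-Python check:

```python
import math

def l21_norm(A):
    """l2,1-norm of a matrix given as a list of rows:
    the sum over rows of the row-wise l2 norms."""
    return sum(math.sqrt(sum(a * a for a in row)) for row in A)
```

Because each row contributes its full ℓ2 norm, minimizing this norm over a transformation matrix drives entire rows to zero, which is what makes it a row-sparsity (feature-selection) regularizer.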
Joint adaptive manifold and embedding learning for unsupervised feature selection
In this section, we first propose our modelling for unsupervised feature selection and then provide an effective algorithm to solve this problem.
Convergence analysis
In the following, we show that Algorithm 1 minimizes Eq. (12). Theorem 1. By alternately optimizing each block of variables, Algorithm 1 monotonically decreases the objective function in Eq. (12). Proof. Let the objective value of Eq. (12) at the t-th iteration be its value with the variables set to their current estimates. According to Algorithm 1, we have
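The monotone-decrease argument behind such a theorem follows the standard alternating-minimization pattern. Writing J(W, S) for the objective of Eq. (12) with two blocks of variables (the symbols W and S are generic placeholders here, since the snippet truncates the actual variable names), each block update is optimal given the other, so

```latex
J\big(W^{(t+1)}, S^{(t)}\big) \le J\big(W^{(t)}, S^{(t)}\big)
\quad \text{(the $W$-step is optimal given $S^{(t)}$),} \\
J\big(W^{(t+1)}, S^{(t+1)}\big) \le J\big(W^{(t+1)}, S^{(t)}\big)
\quad \text{(the $S$-step is optimal given $W^{(t+1)}$),} \\
\Rightarrow\; J\big(W^{(t+1)}, S^{(t+1)}\big) \le J\big(W^{(t)}, S^{(t)}\big).
```

Chaining the two inequalities shows the objective values form a non-increasing sequence; if the objective is bounded below, that sequence converges.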
Experiments
In this section, we first evaluate the proposed Joint Adaptive Manifold and Embedding Learning (JAMEL) approach on 2 synthetic datasets to test whether JAMEL can find the representative features. We then evaluate JAMEL on 12 real-world datasets by running k-means clustering, spectral clustering, and nearest-neighbour classification on the selected features [9], [10], [11]. We also report the computational cost and the sensitivity analysis of the algorithm.
Conclusion
This paper proposes a novel unsupervised feature selection approach, which alternately learns the nonlinear manifold structure among data and selects the features that capture this structure. Specifically, we embed manifold structure learning and manifold structure preservation into a joint framework, to adaptively learn a lower-dimensional space and thoroughly explore the nonlinear manifold structure in this space. The approach can gradually eliminate the redundant and noisy features and finally pick out the most discriminative ones.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research was supported by the National Natural Science Foundation of China (No. 61762061, 62066027, 62076117), the Natural Science Foundation of Jiangxi Province, China (No. 20161ACB20004), and the Jiangxi Key Laboratory of Smart City (Grant No. 20192BCD40002).
Jian-Sheng Wu received the B.S. and Ph.D. degrees from the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China, in 2009 and 2015, respectively. He joined the School of Information Engineering, Nanchang University, in 2015. His current research interests include machine learning, feature selection and data mining.
References (37)
- et al., Joint hypergraph learning and sparse regression for feature selection, Pattern Recognit. (2017)
- et al., Local discriminative based sparse subspace learning for feature selection, Pattern Recognit. (2019)
- et al., Sparsity preserving projections with applications to face recognition, Pattern Recognit. (2010)
- et al., Graph autoencoder-based unsupervised feature selection with broad and local data structure preservation, Neurocomputing (2018)
- et al., Nonnegative Laplacian embedding guided subspace learning for unsupervised feature selection, Pattern Recognit. (2019)
- et al., Unsupervised feature selection by regularized self-representation, Pattern Recognit. (2015)
- et al., Self-representation based dual-graph regularized feature clustering, Neurocomputing (2016)
- et al., Subspace clustering guided unsupervised feature selection, Pattern Recognit. (2017)
- et al., How many clusters? A robust PSO-based local density model, Neurocomputing (2016)
- et al., Feature selection in clustering problems, NIPS (2003)
- Feature selection for unsupervised learning, J. Mach. Learn. Res.
- Feature selection using hierarchical feature clustering, ACM CIKM
- A new unsupervised spectral feature selection method for mixed data: a filter approach, Pattern Recognit.
- Nonlinear dimensionality reduction by locally linear embedding, Science
- Laplacian score for feature selection, NIPS
- Unsupervised feature selection for multi-cluster data, ACM KDD
- Joint embedding learning and sparse regression: a framework for unsupervised feature selection, IEEE Trans. Cybern.
- Global and local structure preservation for feature selection, IEEE Trans. Neural Netw. Learn. Syst.
Meng-Xiao Song is working toward the bachelor’s degree with the School of Information Engineering, Nanchang University, China. Her research interests include machine learning and data mining.
Weidong Min received the B.E., M.E. and Ph.D. degrees in computer application from Tsinghua University, China in 1989, 1991 and 1995, respectively. He is currently a Professor and the Dean, School of Software, Nanchang University, China. He is an Executive Director of China Society of Image and Graphics. His current research interests include image and video processing, artificial intelligence, big data, distributed system and smart city information technology. Since 2015 he has been a Professor with Nanchang University, China. From 2011 to 2014 he cooperated with School of Computer Science & Software Engineering, Tianjin Polytechnic University, China. From 1998 to 2014 he worked as a Senior Researcher and Senior Project Manager at Corel and other companies in Canada. From 1995 to 1997 he was a Post-Doctoral Researcher at the University of Alberta, Canada. From 1994 to 1995 he was an Assistant Professor at Tsinghua University, China.
Jian-Huang Lai received the M.Sc. degree in applied mathematics and the Ph.D. degree in mathematics from Sun Yat-sen University, China, in 1989 and 1999, respectively. In 1989, he joined Sun Yat-sen University as an Assistant Professor, where he is currently a Professor with the School of Data and Computer Science. He has published over 200 scientific papers in international journals and conferences on image processing and pattern recognition, including the IEEE TPAMI, the IEEE TCYB, the IEEE TKDE, the IEEE TNN, the IEEE TIP, Pattern Recognition, ICCV, CVPR, and ICDM. His current research interests include pattern recognition, data mining, and computer vision.
Wei-Shi Zheng is now a professor with Sun Yat-sen University. He has published more than 90 papers, including more than 60 publications in major journals (the IEEE Transactions on Pattern Analysis and Machine Intelligence, the IEEE Transactions on Image Processing, Pattern Recognition) and top conferences (ICCV, CVPR, IJCAI). His research interests include person/object association and activity understanding in visual surveillance. He has joined the Microsoft Research Asia Young Faculty Visiting Programme. He is a recipient of the Excellent Young Scientists Fund of the NSFC, and a recipient of the Royal Society-Newton Advanced Fellowship, United Kingdom. He is an associate editor of the journal Pattern Recognition. Homepage: http://isee.sysu.edu.cn/%7ezhwshi/.