Group sparse feature selection on local learning based clustering
Introduction
Many applications nowadays, such as computer vision, pattern recognition and data mining, are facing data of ever-increasing dimensionality. Higher dimensionality carries more information but may also introduce more redundancy and noise. As a consequence, the "curse of dimensionality" [1], [2] is a prevalent problem for learning in high dimensional space: algorithms and procedures that are analytically and computationally effective in low dimensional space become impractical there. Various dimensionality reduction techniques have been introduced to ease this problem, the most canonical among them being feature selection and feature extraction algorithms. Unlike feature extraction methods, such as PCA [3] and LPP [4], which transform the original data into a reduced set of derived features, feature selection methods simply choose a relevant feature subset from the original data. Feature selection is therefore not only simpler but also preserves the actual meaning of each feature, which helps in understanding its function. A selected subset containing the most informative features effectively reduces the redundancy and noise in the data and makes subsequent learning tasks more accurate.
Feature selection methods can be roughly classified as supervised or unsupervised, according to whether label information is involved in the selection. Typical supervised feature selection methods [5], [6], [7], [8] select features that are highly correlated with the class labels. Unsupervised methods, in contrast, mainly exploit the distribution of the data samples to find the optimal feature subset. As labels are usually expensive to acquire, unsupervised methods are more broadly applicable than supervised ones. However, without the guidance of label information, feature selection becomes a more challenging problem in the unsupervised setting. Many existing unsupervised methods select features that best preserve certain properties of the dataset, such as maximum variance or minimum redundancy; influential works such as [9], [10], [11] are typical unsupervised feature selection methods.
In recent years, feature selection methods have started to exploit geometric structures in high dimensional data spaces. As recent work has shown that data often lie on low-dimensional manifolds embedded in high-dimensional ambient spaces [12], [13], [14], various feature selection methods that explicitly consider manifold structure have been proposed, including the Laplacian score [15], eigenvalue sensitive feature selection [16], multi-cluster feature selection [17], the local kernel regression score [18] and feature selection for local learning based clustering [11], [19]. These methods either directly optimize for the manifold structure or use manifold regularization in their models to select features that respect the intrinsic manifold structure of the data space. Among them, feature selection methods based on local learning based clustering (LLC) show great improvements in robustness and accuracy on datasets with manifold structure [11], [19]. A limitation of these manifold-based methods, however, is that they neglect group structures in the data space. Recent studies [20], [21], [22] point out that group structures are inherent in many datasets such as documents, images, and DNA data: documents on the same topic naturally form a group, and in DNA data features can be grouped by their metabolic profiles. As data from the same group tend to share the same sparsity pattern in their low-dimensional representation, accounting for these shared group structures is expected to improve the performance of feature selection algorithms [6], [23].
In this paper, a new unsupervised group sparse feature selection method based on LLC, named GSFS-llc, is proposed. The motivation of our work is to exploit both the manifold structure and the group structure in feature selection. Using the cluster labels produced by LLC, GSFS-llc establishes an L2,1-norm regularized regression framework to select important features that respect both the group structure and the manifold structure of the data space. As group structure is inherent in many datasets, GSFS-llc is expected to outperform methods that consider only the manifold structure.
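Although the detailed model appears in Section 3 (truncated in this snippet), the core regression problem can be sketched as follows; the symbols here (data matrix X, LLC cluster-label matrix Y, coefficient matrix W with one row per feature, and regularization weight λ) are our notation for illustration and may differ from the paper's:

$$
\min_{W}\;\|XW - Y\|_F^2 + \lambda\,\|W\|_{2,1},
\qquad
\|W\|_{2,1} \;=\; \sum_{i=1}^{d}\sqrt{\sum_{j=1}^{c} W_{ij}^2}.
$$

Because the L2,1 penalty sums the L2 norms of the rows of W, it drives entire rows toward zero, so features whose rows survive with large norms are the ones selected.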
It is worthwhile to highlight the following aspects of our work:
- 1.
We establish a regression framework using the cluster labels from LLC and group sparse regularization. To the best of our knowledge, this is the first time that local learning based clustering is combined with group sparse regularization for feature selection.
- 2.
By choosing the features that best fit the LLC cluster labels, GSFS-llc naturally respects the manifold structure in data space.
- 3.
By inducing row-sparse regression coefficients through L2,1-norm regularized regression, GSFS-llc explicitly considers the group structure in data space (see the sketch after this list). It is thus expected to achieve better feature selection performance and to inherit some of the desirable properties associated with group structures [22], such as better stability in the presence of noise.
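To make the row-sparsity mechanism concrete, here is a minimal sketch of the feature-scoring step, assuming a coefficient matrix W (one row per feature, one column per cluster) has already been learned; the function name and shapes are illustrative rather than taken from the paper:

```python
import numpy as np

def rank_features(W):
    """Rank features by the L2 norm of their coefficient rows.

    W : (d, c) array with one row per feature and one column per
        LLC cluster label. Rows driven to zero by the L2,1 penalty
        correspond to discarded features.
    Returns feature indices ordered from most to least important.
    """
    row_norms = np.linalg.norm(W, axis=1)  # ||w^i||_2 for each feature i
    return np.argsort(-row_norms)          # descending by importance
```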
The rest of the paper is organized as follows: Section 2 gives a brief review of related work; Section 3 describes the GSFS-llc algorithm; Section 4 reports experiments comparing GSFS-llc with other feature selection methods; Section 5 concludes the paper.
Section snippets
Related works
Guyon et al. categorized feature selection algorithms into filter methods, wrapper methods and embedded methods [24]. These three kinds of methods differ in how the learning algorithm is incorporated when evaluating and selecting features. Filter methods [25], [15], [23], [16], [26] capture intrinsic properties of the data using statistical measures such as variance, Pearson correlation, and mutual information, and select features before running the learning algorithm. Filter methods are
Group sparse feature selection on local learning based clustering
The main idea of GSFS-llc is to select important features from the dataset via a two-step process: (1) obtain the cluster labels using LLC [34]; (2) fit the obtained cluster labels with the L2,1-norm regularized regression model and select the features with large regression coefficients. Since our regression model and the related analysis are fundamentally based on the cluster labels from LLC, we first introduce the LLC algorithm as presented in [34], [11], [19].
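Since the derivation is truncated in this snippet, the following is a minimal sketch of step (2) only, assuming LLC has already produced a one-hot cluster-indicator matrix Y. The iteratively reweighted least-squares update is a standard way to handle the L2,1 penalty (cf. the joint L2,1-norms minimization of [6]); it is our illustrative choice of solver, not necessarily the one used in the paper:

```python
import numpy as np

def l21_regression(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Solve min_W ||XW - Y||_F^2 + lam * ||W||_{2,1} by IRLS.

    X : (n, d) data matrix; Y : (n, c) cluster-indicator matrix.
    Returns a (d, c) coefficient matrix whose rows tend to be sparse.
    """
    n, d = X.shape
    D = np.eye(d)  # reweighting matrix, initialized as identity (ridge step)
    for _ in range(n_iter):
        # With D fixed, the objective is quadratic in W and has the
        # closed-form minimizer W = (X^T X + lam * D)^{-1} X^T Y.
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        # Refresh D from the current row norms; eps guards zero rows.
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    return W
```

Feeding the returned W to the ranking sketch above then yields the ordered feature list; rows flattened by the L2,1 penalty fall to the bottom of the ranking.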
Experiments
In this section, the performance of the GSFS-llc algorithm is evaluated on various datasets, including handwritten digits, human faces, voices and object images. The experimental setup is described in the following subsection.
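The experimental details are truncated in this snippet; as a hedged illustration of the standard evaluation protocol for unsupervised feature selection (cluster on the top-ranked features, then score against ground-truth labels), one might proceed as below. The helper assumes scikit-learn and the hypothetical ranking function from the earlier sketch:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def evaluate_selection(X, y_true, feature_ranking, n_selected, n_clusters):
    """k-means on the top-ranked features, scored by NMI against y_true.

    feature_ranking : feature indices ordered from most to least
                      important, e.g. the output of rank_features(W).
    """
    X_sel = X[:, feature_ranking[:n_selected]]
    y_pred = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_sel)
    return normalized_mutual_info_score(y_true, y_pred)
```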
Conclusion and future work
This paper proposes an unsupervised feature selection method, GSFS-llc, built on a regularized regression framework. By fitting the cluster labels obtained from local learning based clustering (LLC), GSFS-llc naturally selects features that respect the manifold structure of the data space. Moreover, through group sparse regularization, GSFS-llc inherits the stability associated with group structure and performs stably in the presence of noise. In comparisons with the other
Acknowledgments
This work is supported by Zhejiang Provincial Natural Science Foundation of China (Grant no. LZ13F020001), National Science Foundation of China (Grant nos. 61173185, 61173186), National Key Technology R&D Program (Grant no. 2014BAK15B02) and Demonstration of Digital Medical Service and Technology in Destined Region (Grant no. 2012AA022814).
References (43)
- Divergence-based feature selection for separate classes, Neurocomputing, 2013.
- A graph Laplacian based approach to semi-supervised feature selection for regression problems, Neurocomputing, 2013.
- Feature selection for nonlinear models with extreme learning machines, Neurocomputing, 2013.
- Mutual information-based feature selection for multilabel classification, Neurocomputing, 2013.
- Theoretical and empirical study on the potential inadequacy of mutual information for feature selection in classification, Neurocomputing, 2013.
- R. Bellman, Adaptive Control Processes: A Guided Tour, 1961.
- M. Verleysen, Learning high-dimensional data, in: Nato Science Series Sub Series III Computer And Systems Sciences, ...
- Principal Component Analysis, 2005.
- X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, ...
- Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell., 2005.
- Efficient and robust feature selection via joint L2,1-norms minimization, Adv. Neural Inf. Process. Syst.
- Feature selection for unsupervised learning, J. Mach. Learn. Res.
- A global geometric framework for nonlinear dimensionality reduction, Science.
- Nonlinear dimensionality reduction by locally linear embedding, Science.
- Laplacian score for feature selection, Adv. Neural Inf. Process. Syst.
Cited by (13)
- Locally robust EEG feature selection for individual-independent emotion recognition
  2020, Expert Systems with Applications
  Citation excerpt: "Then, the leave-one-subject-out paradigm was employed again to derive the classification metrics. The feature selection was separately carried out with baseline condition (all features were adopted), correlation coefficient (CC) method, dependence guided unsupervised feature selection (DGU-FS) (Guo & Zhu, 2018), infinite latent feature selection (INF-FS) (Roffo, Melzi, & Cristani, 2015), local learning-based clustering (LLC) (Wu, Wang, Bu, & Chen, 2016), classical RFE, and the proposed LRFS. It is noted that the CC, INF-FS, RFE, and LRFS are supervised by target emotional classes while DGU-FS and LLC are unsupervised approaches."
- Multi-task feature selection with sparse regularization to extract common and task-specific features
  2019, Neurocomputing
  Citation excerpt: "Due to the particularity, it can be utilized as a powerful tool in feature selection. Sparse-learning-based feature selection [42–46] makes use of the sparse regularizers which force some feature coefficient closing to zero. Due to its interpretability and excellent performance, sparse-learning-based feature selection has attracted tons of attention from researchers."
- Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques
  2017, Neurocomputing
  Citation excerpt: "Consequently, FSS becomes indispensable to extract meaningful information from these huge databases. FSS algorithms are widely applied in various real-world problems such as text categorization [12], computer vision [13], recommendation systems [14], gene analysis on microarray data [15], big data mining [16], and customer relationship management [17]. The reason for a multiobjective formulation is that there are two objectives for the FSS problem."
- Attribute clustering using rough set theory for feature selection in fault severity classification of rotating machinery
  2017, Expert Systems with Applications
  Citation excerpt: "Other techniques used by the filter approach are Gini Index (Shang et al., 2007), Information Gain (Raileanu & Stoffel, 2004), Laplacian Score and Sparsity Scores (He, Cai, & Niyogi, 2005; Liu & Zhang, 2016), among others. Unsupervised strategies are used to support the filter approach as recently proposed by Liu, Liu, Zhang, Wang, and Wang (2016); Wang, Zhang, Liu, Liu, and Wang (2016); Wang, Pedrycz, Zhu, and Zhu (2015); Zhou, Cheng, Pedrycz, Zhang, and Liu (2016), in order to find the best basis features, or to select the features through structured sparsity regularization models in order to preserve the cluster structure of the instances composing the dataset (Chen, 2015; Maldonado, Carrizosa, & Weber, 2015; Shi, Li, Han, & Hu, 2015; Wu, Wang, Bu, & Chen, 2016). Wrapper approaches can use supervised learning to search relationships between the features."
- Unsupervised feature selection via Diversity-induced Self-representation
  2017, Neurocomputing
  Citation excerpt: "The aim of feature selection is to obtain a subset of features by removing the noise and redundancy in original features, so that the more intrinsic representation of data and the better performance are achieved [1]. According to the availability of the label information, feature selection approaches can be categorized into supervised methods [2–5] and unsupervised methods [6–11]. Compared with supervised methods, unsupervised feature selection methods aim at selecting relevant features without label information."
Yue Wu received the B.S. degree in computer science from Zhejiang University, China, in 2010. He is currently a Ph.D. candidate in the College of Computer Science at Zhejiang University. His research interests include machine learning, pattern recognition and image retrieval.
Can Wang is currently an associate professor in the College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, machine learning and information accessibility.
Jiajun Bu is a Professor in the College of Computer Science, Zhejiang University. His research interests include information retrieval, computer graphics, embedded system and computer supported cooperative work.
Chun Chen is a professor in the College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.