Sparse discriminative feature weights learning
Introduction
Over the past decades, dimensionality reduction has attracted much attention for its immense potential in many applications, e.g., computer vision [1], object recognition [2] and data mining. It can decrease storage requirements, significantly speed up subsequent learning, and enhance classifiers' generalization capability. Generally speaking, dimensionality reduction techniques fall into two categories: feature selection and feature extraction [3]. Compared with feature extraction, feature selection merely selects a subset of the original features, and thus preserves their semantics.
According to the availability of label information, feature selection algorithms can be classified into two groups: unsupervised feature selection [4], [5], [6] and supervised feature selection [7], [8], [9], [10]. Unsupervised feature selection algorithms, e.g., Laplacian score [9], discriminative feature selection [11], select the features which best preserve the local data information. Supervised feature selection algorithms, e.g., ReliefF [8], robust feature selection [12], and trace ratio [13], usually select discriminative features according to labels of the training data. In this paper, we focus on supervised feature selection.
For feature selection, researchers mainly focus on searching strategies and measurement criteria.
From the perspective of searching strategies, feature selection can be viewed as a binary optimization problem, which is NP-hard. To alleviate this difficulty, heuristic feature selection evaluates the importance of each feature individually and adds features to the subset one by one until a user-defined size is reached. However, this greedy strategy neglects the interaction and dependency among different features. Recently, sparse models for joint feature selection [6], [12], [14], [15] have been developed. For example, Ref. [6] conducts spectral regression and l1-norm minimization in two separate steps. To facilitate feature selection, the l2,1-norm regularized model [12], [11], [16], [17] imposes a row-sparse constraint on the regularization term, and consequently selects the features associated with the non-zero rows of the feature selection matrix. These efforts have demonstrated the effectiveness of joint feature selection.
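To make the row-sparsity mechanism concrete, here is a minimal sketch (the function name and the toy matrix are our own illustrations, not from the paper): once an l2,1-regularized model has driven some rows of the learned projection matrix toward zero, features are ranked by the l2-norms of their rows and the top-ranked ones are kept.

```python
import numpy as np

def select_by_row_norms(W, k):
    """Rank features by the l2-norm of the corresponding rows of a learned
    projection matrix W (d features x c classes). In l2,1-regularized
    models, near-zero rows correspond to discarded features."""
    row_norms = np.linalg.norm(W, axis=1)   # one importance score per feature
    return np.argsort(row_norms)[::-1][:k]  # indices of the k largest rows

# Toy example: rows 0 and 2 carry weight, row 1 is (almost) zeroed out.
W = np.array([[0.9, -0.4],
              [1e-6, 2e-6],
              [0.5,  0.7]])
print(select_by_row_norms(W, 2))
```

This joint ranking is what distinguishes such models from greedy search: all features are scored through one shared matrix rather than evaluated in isolation.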
In the study of measurement criteria, researchers regard manifold structure preservation as a useful criterion for measuring the importance of each feature. For example, He et al. [9] estimate the importance of a feature by its power of locality preservation on a k-nearest-neighbor graph. Kira et al. [8] evaluate feature weights according to their ability to discriminate between neighboring samples. Yang et al. [11] select the most discriminative features while taking the manifold structure into account.
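As an illustration of such locality-preserving criteria, the following is a compact (unoptimized, O(n^2)) sketch of the Laplacian score [9] on a k-nearest-neighbor graph with heat-kernel weights; smaller scores indicate features that vary little across neighboring samples, i.e., that better preserve locality.

```python
import numpy as np

def laplacian_score(X, k=3, t=1.0):
    """Laplacian score of each feature (smaller = better locality
    preservation) on a k-nearest-neighbor graph with heat-kernel weights.
    X is n samples x d features."""
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]                # k nearest, skipping self
        S[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    S = np.maximum(S, S.T)                               # symmetrize the graph
    D = np.diag(S.sum(axis=1))
    L = D - S                                            # graph Laplacian
    ones = np.ones(n)
    scores = np.empty(d)
    for j in range(d):
        f = X[:, j]
        f = f - (f @ D @ ones) / (ones @ D @ ones)       # D-weighted centering
        scores[j] = (f @ L @ f) / (f @ D @ f + 1e-12)
    return scores
```

On data with two clusters, a feature that is nearly constant within each cluster receives a much lower score than a feature that oscillates between neighbors.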
In this paper we incorporate sparse representation, discriminative information, and joint feature weights learning into a unified framework to select a feature subset in batch mode. The proposed feature selection algorithm is thus coined sparse discriminative feature weights (SDFW) learning, which has the following advantages.
- Because of the importance of discriminative information [7] in data analysis, we exploit discriminative information based on sparse representation for feature selection. SDFW characterizes the scatter of samples by the difference between the within-class reconstruction residual and the between-class reconstruction residual, and thus adaptively seeks features that are efficient for discrimination.
- SDFW alternately estimates the feature weights and the sparse representation coefficients by iteratively optimizing the objective function. In contrast to traditional methods, SDFW seeks the nearest neighbors of each sample in the weighted space rather than the original space. Moreover, SDFW offers the potential to discover an intrinsic relationship between feature selection and sparse representation.
- SDFW jointly assigns each feature a real-valued number to indicate its importance. Unlike traditional heuristic feature search, SDFW selects the most discriminative feature subset from the original features in batch mode by learning a nonnegative feature weight vector.
- Since SDFW, as a joint feature weights learning method, exploits feature correlation, we do not apply any de-correlation transformation to preprocess the data. Furthermore, unlike the whitening transformation, which removes feature correlation in the original data space, our method adopts an adaptive mechanism to learn feature weights in the space of selected features.
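As a rough illustration of the alternation described above (and only that — the actual objective and solver are derived in Section 3), the sketch below uses plain least-squares reconstruction in place of sparse coding and a heuristic weight update: each sample is reconstructed from same-class and different-class samples in the weighted feature space, and features whose between-class residual exceeds their within-class residual receive larger nonnegative weights.

```python
import numpy as np

def sdfw_sketch(X, y, n_iter=5):
    """Illustrative alternation only -- NOT the paper's exact objective.
    Alternates between (a) reconstructing each sample by least squares from
    same-class and different-class samples in the weighted space and
    (b) updating a nonnegative feature weight vector from the residuals."""
    n, d = X.shape
    w = np.ones(d) / d                            # nonnegative feature weights
    for _ in range(n_iter):
        Xw = X * np.sqrt(w)                       # data in the weighted space
        within, between = np.zeros(d), np.zeros(d)
        for i in range(n):
            same = y == y[i]
            same[i] = False                       # exclude the sample itself
            for mask, acc in ((same, within), (y != y[i], between)):
                B = Xw[mask]                      # basis samples (rows)
                c = np.linalg.lstsq(B.T, Xw[i], rcond=None)[0]
                acc += (Xw[i] - c @ B) ** 2       # per-feature squared residual
        w = np.maximum(between - within, 0)       # favor discriminative features
        w = w / w.sum() if w.sum() > 0 else np.ones(d) / d
    return w
```

Because the weights feed back into the reconstruction step, neighbors and coefficients are effectively sought in the space of selected features, mirroring the second bullet above.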
The remainder of this paper is organized as follows. We briefly review sparse representation and the optimized projections it steers for classification in Section 2. In Section 3 we present the proposed formulation, followed by a new iterative algorithm to optimize the objective function. Experiments on four data sets are reported in Section 4. Finally, we draw conclusions in Section 5.
Background
In recent years, sparse representation has shown great potential in both academia and industry. Wright et al. [18] present sparse representation-based classification (SRC). The basic idea of sparse representation methods is to represent a given test sample as a sparse linear combination of all training samples. Although SRC claims to be insensitive to dimension reduction, experimental results show that feature extraction is still of great importance, since a well
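The SRC pipeline just described can be sketched as follows; for self-containment the l1 problem is solved here with a plain ISTA loop, a simplification of the solvers used in [18], and the function name and toy data are our own.

```python
import numpy as np

def src_classify(A, labels, x, lam=0.01, n_iter=500):
    """SRC sketch: sparse-code the test sample x over the training columns
    of A (via ISTA for the lasso), then assign the class whose own training
    samples give the smallest reconstruction residual."""
    A = A / np.linalg.norm(A, axis=0)            # unit-norm training columns
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):                      # ISTA iterations
        g = c - A.T @ (A @ c - x) / L            # gradient step on 0.5*||Ac - x||^2
        c = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-thresholding
    res = {k: np.linalg.norm(x - A[:, labels == k] @ c[labels == k])
           for k in np.unique(labels)}
    return min(res, key=res.get)                 # class with minimal residual
```

A test sample close to the subspace spanned by one class's training columns is coded almost entirely by those columns, so deleting the other classes' coefficients barely changes the reconstruction.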
Sparse discriminative feature weights learning
In this section, we first present our objective function, and then put forward an iterative SDFW algorithm to solve the optimization problem.
Experiments
In this section, we evaluate the performance of SDFW for feature selection and compare it with those of representative algorithms in terms of classification.
Conclusions
In this paper, we proposed a novel feature weights learning method (SDFW). It iteratively computed the feature weights and the sparse representation coefficients for each sample in the space of selected features. By learning a nonnegative feature weight vector, SDFW jointly selected the most discriminative features from the original feature space in batch mode. To solve the proposed formulation, an effective iterative optimization algorithm was derived. The experimental results on
Acknowledgments
This work is supported by the National Science Foundation of China (Grant no. 61202134), National Science Fund for Distinguished Young Scholars (Grant no. 61125305), China Postdoctoral Science Foundation (Grant no. AD41431), and the Postdoctoral Science Foundation of Jiangsu Province.
References (31)
- et al., Constraint score: a new filter method for feature selection with pairwise constraints, Pattern Recognit. (2008)
- et al., Sparsity preserving projections with applications to face recognition, Pattern Recognit. (2010)
- et al., Graph optimization for dimensionality reduction with sparsity constraints, Pattern Recognit. (2012)
- et al., Optimized projections for sparse representation based classification, Neurocomputing (2013)
- et al., Bayesian approach to feature selection and parameter tuning for support vector machine classifiers, Neural Netw. (2005)
- et al., A novel neural dynamical approach to convex quadratic program and its efficient applications, Neural Netw. (2009)
- et al., Feature selection and multi-kernel learning for sparse representation on a manifold, Neural Netw. (2014)
- N. Naikal, A. Yang, S. Shankar, Informative feature selection for object recognition via sparse PCA, in: International...
- et al., PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
- et al., Simultaneous feature selection and clustering using mixture models, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
- Feature subset selection and ranking for data dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell.
- Pattern Classification
Hui Yan received her B.S. degree and Ph.D. degree from the School of Computer Science and Technology, Nanjing University of Science and Technology (NUST), Nanjing, China, in 2005 and 2011, respectively. In 2009, she was a visiting student at the Department of Electrical and Computer Engineering at National University of Singapore, Singapore. She is currently a lecturer at the School of Computer Science and Engineering, NUST. Her research interests include pattern recognition, computer vision and machine learning.
Jian Yang received his B.S. degree in mathematics from Xuzhou Normal University, Xuzhou, China, in 1995, his M.S. degree in applied mathematics from Changsha Railway University, Changsha, China, in 1998, and his Ph.D. degree in pattern recognition and intelligence systems from Nanjing University of Science and Technology (NUST), Nanjing, China, in 2002. He was a Post-Doctoral Researcher at the University of Zaragoza, Spain, in 2003. From 2004 to 2006, he was a Post-Doctoral Fellow at the Biometrics Centre of Hong Kong Polytechnic University, Hong Kong; from 2006 to 2007, at the Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA. He is currently a Professor at the School of Computer Science and Technology, NUST. He has authored more than 80 academic papers in pattern recognition and computer vision. His journal papers have been cited more than 1800 times in the ISI Web of Science, and 3000 times in Google Scholar. His current research interests include pattern recognition, computer vision and machine learning.