Sparse discriminative feature weights learning
Introduction
Over the past decades, dimensionality reduction has attracted much attention for its immense potential in many applications, e.g., computer vision [1], object recognition [2] and data mining. It can decrease storage requirements, significantly speed up subsequent learning, and enhance classifiers' generalization capability. Generally speaking, dimensionality reduction techniques fall into two categories: feature selection and feature extraction [3]. Compared with feature extraction, feature selection merely selects a subset of the original features, and thus preserves their semantics.
According to the availability of label information, feature selection algorithms can be classified into two groups: unsupervised feature selection [4], [5], [6] and supervised feature selection [7], [8], [9], [10]. Unsupervised feature selection algorithms, e.g., Laplacian score [9], discriminative feature selection [11], select the features which best preserve the local data information. Supervised feature selection algorithms, e.g., ReliefF [8], robust feature selection [12], and trace ratio [13], usually select discriminative features according to labels of the training data. In this paper, we focus on supervised feature selection.
For feature selection, researchers mainly focus on searching strategies and measurement criteria.
From the perspective of searching strategies, feature selection can be viewed as a binary optimization problem, which is NP-hard. To alleviate this difficulty, heuristic feature selection evaluates the importance of each feature individually and adds features to the subset one by one until a user-defined size is reached. However, this greedy strategy neglects the interaction and dependency among different features. Recently, sparse models for joint feature selection [6], [12], [14], [15] have been developed. For example, Ref. [6] conducts spectral regression and l1-norm minimization in two separate steps. To facilitate feature selection, the l2,1-norm regularized model [12], [11], [16], [17] imposes a row-sparse constraint on the regularization term, and consequently selects the features associated with the non-zero rows of the feature selection matrix. These efforts have demonstrated the effectiveness of joint feature selection.
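To make the row-sparsity mechanism concrete, here is a minimal sketch (the function name and the toy matrix are our own illustrations, not from the paper): once an l2,1-regularized model has driven some rows of the learned projection matrix toward zero, features are ranked by the l2-norms of their rows and the top-ranked ones are kept.

```python
import numpy as np

def select_by_row_norms(W, k):
    """Rank features by the l2-norm of the corresponding rows of a learned
    projection matrix W (d features x c classes). In l2,1-regularized
    models, near-zero rows correspond to discarded features."""
    row_norms = np.linalg.norm(W, axis=1)   # one importance score per feature
    return np.argsort(row_norms)[::-1][:k]  # indices of the k largest rows

# Toy example: rows 0 and 2 carry weight, row 1 is (almost) zeroed out.
W = np.array([[0.9, -0.4],
              [1e-6, 2e-6],
              [0.5,  0.7]])
print(select_by_row_norms(W, 2))
```

This joint ranking is what distinguishes such models from greedy search: all features are scored through one shared matrix rather than evaluated in isolation.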
In the study of measurement criteria, researchers regard manifold structure preservation as a useful criterion for measuring the importance of each feature. For example, He et al. [9] estimate the importance of a feature by its power of locality preservation on a k-nearest-neighbor graph. Kira et al. [8] evaluate feature weights according to their ability to discriminate between neighboring samples. Yang et al. [11] select the most discriminative features while taking the manifold structure into account.
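As an illustration of such locality-preserving criteria, the following is a compact (unoptimized, O(n^2)) sketch of the Laplacian score [9] on a k-nearest-neighbor graph with heat-kernel weights; smaller scores indicate features that vary little across neighboring samples, i.e., that better preserve locality.

```python
import numpy as np

def laplacian_score(X, k=3, t=1.0):
    """Laplacian score of each feature (smaller = better locality
    preservation) on a k-nearest-neighbor graph with heat-kernel weights.
    X is n samples x d features."""
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]                # k nearest, skipping self
        S[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    S = np.maximum(S, S.T)                               # symmetrize the graph
    D = np.diag(S.sum(axis=1))
    L = D - S                                            # graph Laplacian
    ones = np.ones(n)
    scores = np.empty(d)
    for j in range(d):
        f = X[:, j]
        f = f - (f @ D @ ones) / (ones @ D @ ones)       # D-weighted centering
        scores[j] = (f @ L @ f) / (f @ D @ f + 1e-12)
    return scores
```

On data with two clusters, a feature that is nearly constant within each cluster receives a much lower score than a feature that oscillates between neighbors.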
In this paper we incorporate sparse representation, discriminative information, and joint feature weights learning into a unified framework to select a feature subset in batch mode. The proposed feature selection algorithm is thus coined sparse discriminative feature weights (SDFW) learning, which has the following advantages.
- Because of the importance of discriminative information [7] in data analysis, we exploit discriminative information based on sparse representation for feature selection. SDFW characterizes the scatter of samples by the difference between the within-class reconstruction residual and the between-class reconstruction residual, and thus adaptively seeks features that are efficient for discrimination.
- SDFW alternately estimates the feature weights and the sparse representation coefficients by iteratively optimizing the objective function. In contrast to traditional methods, SDFW seeks the nearest neighbors of each sample in the weighted space rather than the original space. Moreover, SDFW offers the potential to discover an intrinsic relationship between feature selection and sparse representation.
- SDFW jointly assigns each feature a real-valued number to indicate its importance. Unlike traditional heuristic feature search, SDFW selects the most discriminative feature subset from the original features in batch mode by learning a nonnegative feature weight vector.
- Since SDFW, as a joint feature weights learning method, exploits feature correlation, we do not apply any de-correlation transformation to preprocess the data. Furthermore, unlike the whitening transformation, which removes feature correlation in the original data space, our method adopts an adaptive mechanism to learn feature weights in the space of selected features.
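As a rough illustration of the alternation described above (and only that — the actual objective and solver are derived in Section 3), the sketch below uses plain least-squares reconstruction in place of sparse coding and a heuristic weight update: each sample is reconstructed from same-class and different-class samples in the weighted feature space, and features whose between-class residual exceeds their within-class residual receive larger nonnegative weights.

```python
import numpy as np

def sdfw_sketch(X, y, n_iter=5):
    """Illustrative alternation only -- NOT the paper's exact objective.
    Alternates between (a) reconstructing each sample by least squares from
    same-class and different-class samples in the weighted space and
    (b) updating a nonnegative feature weight vector from the residuals."""
    n, d = X.shape
    w = np.ones(d) / d                            # nonnegative feature weights
    for _ in range(n_iter):
        Xw = X * np.sqrt(w)                       # data in the weighted space
        within, between = np.zeros(d), np.zeros(d)
        for i in range(n):
            same = y == y[i]
            same[i] = False                       # exclude the sample itself
            for mask, acc in ((same, within), (y != y[i], between)):
                B = Xw[mask]                      # basis samples (rows)
                c = np.linalg.lstsq(B.T, Xw[i], rcond=None)[0]
                acc += (Xw[i] - c @ B) ** 2       # per-feature squared residual
        w = np.maximum(between - within, 0)       # favor discriminative features
        w = w / w.sum() if w.sum() > 0 else np.ones(d) / d
    return w
```

Because the weights feed back into the reconstruction step, neighbors and coefficients are effectively sought in the space of selected features, mirroring the second bullet above.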
The remainder of this paper is organized as follows. We briefly review sparse representation and the optimized projections it steers for classification in Section 2. In Section 3 we present the proposed formulation, followed by a new iterative algorithm to optimize the objective function. Experiments on four data sets are reported in Section 4. Finally, we draw conclusions in Section 5.
Background
In recent years, sparse representation has shown great potential in both academia and industry. Wright et al. [18] present sparse representation-based classification (SRC). The basic idea of sparse representation methods is to represent a given test sample as a sparse linear combination of all training samples. Although SRC claims to be insensitive to dimension reduction, experimental results show that feature extraction is still of great importance, since a well
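The SRC pipeline just described can be sketched as follows; for self-containment the l1 problem is solved here with a plain ISTA loop, a simplification of the solvers used in [18], and the function name and toy data are our own.

```python
import numpy as np

def src_classify(A, labels, x, lam=0.01, n_iter=500):
    """SRC sketch: sparse-code the test sample x over the training columns
    of A (via ISTA for the lasso), then assign the class whose own training
    samples give the smallest reconstruction residual."""
    A = A / np.linalg.norm(A, axis=0)            # unit-norm training columns
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):                      # ISTA iterations
        g = c - A.T @ (A @ c - x) / L            # gradient step on 0.5*||Ac - x||^2
        c = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-thresholding
    res = {k: np.linalg.norm(x - A[:, labels == k] @ c[labels == k])
           for k in np.unique(labels)}
    return min(res, key=res.get)                 # class with minimal residual
```

A test sample close to the subspace spanned by one class's training columns is coded almost entirely by those columns, so deleting the other classes' coefficients barely changes the reconstruction.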
Sparse discriminative feature weights learning
In this section, we first present our objective function, and then put forward an iterative SDFW algorithm to solve the optimization problem.
Experiments
In this section, we evaluate the performance of SDFW for feature selection and compare it with those of representative algorithms in terms of classification.
Conclusions
In this paper, we proposed a novel feature weights learning method (SDFW). It iteratively computed the feature weights and the sparse representation coefficients for each sample in the space of selected features. By learning a nonnegative feature weight vector, SDFW jointly selected the most discriminative features from the original feature space in batch mode. To solve the proposed formulation, an effective iterative optimization algorithm was derived. The experimental results on
Acknowledgments
This work is supported by the National Science Foundation of China (Grant no. 61202134), National Science Fund for Distinguished Young Scholars (Grant no. 61125305), China Postdoctoral Science Foundation (Grant no. AD41431), and the Postdoctoral Science Foundation of Jiangsu Province.
References (31)
- et al., Constraint score: a new filter method for feature selection with pairwise constraints, Pattern Recognit. (2008)
- et al., Sparsity preserving projections with applications to face recognition, Pattern Recognit. (2010)
- et al., Graph optimization for dimensionality reduction with sparsity constraints, Pattern Recognit. (2012)
- et al., Optimized projections for sparse representation based classification, Neurocomputing (2013)
- et al., Bayesian approach to feature selection and parameter tuning for support vector machine classifiers, Neural Netw. (2005)
- et al., A novel neural dynamical approach to convex quadratic program and its efficient applications, Neural Netw. (2009)
- et al., Feature selection and multi-kernel learning for sparse representation on a manifold, Neural Netw. (2014)
- N. Naikal, A. Yang, S. Shankar, Informative feature selection for object recognition via sparse PCA, in: International...
- et al., PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
- et al., Simultaneous feature selection and clustering using mixture models, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
- Feature subset selection and ranking for data dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell.
- Pattern Classification
Hui Yan received her B.S. degree and Ph.D. degree from the School of Computer Science and Technology, Nanjing University of Science and Technology (NUST), Nanjing, China, in 2005 and 2011, respectively. In 2009, she was a visiting student at the Department of Electrical and Computer Engineering at National University of Singapore, Singapore. She is currently a lecturer at the School of Computer Science and Engineering, NUST. Her research interests include pattern recognition, computer vision and machine learning.
Jian Yang received his B.S. degree in mathematics from Xuzhou Normal University, Xuzhou, China, in 1995, his M.S. degree in applied mathematics from Changsha Railway University, Changsha, China, in 1998, and his Ph.D. degree in pattern recognition and intelligence systems from Nanjing University of Science and Technology (NUST), Nanjing, China, in 2002. He was a Post-Doctoral Researcher at the University of Zaragoza, Spain, in 2003. From 2004 to 2006, he was a Post-Doctoral Fellow at the Biometrics Centre of Hong Kong Polytechnic University, Hong Kong; from 2006 to 2007, at the Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA. He is currently a Professor at the School of Computer Science and Technology, NUST. He has authored more than 80 academic papers in pattern recognition and computer vision. His journal papers have been cited more than 1800 times in the ISI Web of Science, and 3000 times in Google Scholar. His current research interests include pattern recognition, computer vision and machine learning.