
Neurocomputing

Volume 173, Part 3, 15 January 2016, Pages 1936-1942

Sparse discriminative feature weights learning

https://doi.org/10.1016/j.neucom.2015.09.065

Abstract

Sparse representation, a locality-based data representation method, leads to promising results in many scientific and engineering fields. Meanwhile, in the study of feature selection, locality preserving is widely recognized as an effective measurement criterion. In this paper, we introduce l1-norm driven sparse representation into feature selection and propose a novel joint feature weights learning algorithm, named sparse discriminative feature weights (SDFW). SDFW assigns the highest score to the feature that has the smallest difference between the within-class reconstruction residual and the between-class reconstruction residual in the space of selected features. It possesses the following advantages: (1) compared with feature selection methods based on k nearest neighbors, SDFW determines the neighborhood of each sample automatically rather than manually; (2) compared with conventional heuristic feature search, which selects features one at a time, SDFW selects the feature subset in batch mode. Extensive experiments on different data types demonstrate the effectiveness of SDFW.

Introduction

In the past decades, dimensionality reduction has attracted much attention for its immense potential in many applications, e.g., computer vision [1], object recognition [2] and data mining. It can decrease storage requirements, significantly speed up the subsequent learning process, and enhance the generalization capability of classifiers. Generally speaking, dimensionality reduction techniques can be classified into two categories: feature selection and feature extraction [3]. Compared with feature extraction, feature selection merely selects a subset of the original features, and thus preserves their semantics.

According to the availability of label information, feature selection algorithms can be classified into two groups: unsupervised feature selection [4], [5], [6] and supervised feature selection [7], [8], [9], [10]. Unsupervised feature selection algorithms, e.g., Laplacian score [9] and discriminative feature selection [11], select the features that best preserve the local data structure. Supervised feature selection algorithms, e.g., ReliefF [8], robust feature selection [12], and trace ratio [13], usually select discriminative features according to the labels of the training data. In this paper, we focus on supervised feature selection.

For feature selection, researchers mainly focus on searching strategies and measurement criteria.

From the perspective of searching strategies, feature selection can be viewed as a binary optimization problem, which is NP-hard. To alleviate this difficulty, heuristic feature selection evaluates the importance of each feature individually and adds features to the feature subset one by one until the user-defined size is reached. However, this greedy strategy neglects the interaction and dependency among different features. Recently, sparse models for joint feature selection [6], [12], [14], [15] have been developed. For example, Ref. [6] conducts spectral regression and l1-norm minimization in two separate steps. To facilitate feature selection, the l2,1-norm regularized models [12], [11], [16], [17] impose a row-sparsity constraint on the regularization term and consequently select the features associated with the non-zero rows of the feature selection matrix. These efforts have demonstrated the effectiveness of joint feature selection.
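
To make the row-sparsity idea concrete, the following numpy sketch shows how an l2,1-norm regularized model scores and selects features: the l2,1-norm of the feature selection matrix W sums the l2-norms of its rows, and the features associated with large (non-zero) rows are kept jointly. The matrix W and the subset size k here are synthetic illustrations, not taken from [12].

```python
import numpy as np

# Hypothetical feature selection matrix W (d features x c classes), e.g. produced by an
# l2,1-regularized model such as min ||X^T W - Y||_F^2 + gamma * ||W||_{2,1}.
rng = np.random.default_rng(0)
W = rng.standard_normal((100, 5))
W[20:] *= 0.01                        # simulate rows driven towards zero by the l2,1 penalty

# l2,1-norm of W: the sum over features of the l2-norm of each row.
row_norms = np.linalg.norm(W, axis=1)
l21_norm = row_norms.sum()

# Features corresponding to the largest (non-zero) rows are selected jointly.
k = 10
selected = np.argsort(row_norms)[::-1][:k]
print("||W||_{2,1} =", l21_norm)
print("selected features:", selected)
```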

In the study of measurement criteria, researchers regard manifold structure preservation as a useful criterion for measuring the importance of each feature. For example, He et al. [9] estimate the importance of a feature by its power of locality preserving on a k nearest neighbor graph. Kira et al. [8] evaluate feature weights according to their ability to discriminate between neighboring samples. Yang et al. [11] select the most discriminative features while taking the manifold structure into account.
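
To illustrate such locality-preserving criteria, the sketch below computes the Laplacian score of He et al. [9] on a k nearest neighbor heat-kernel graph; a smaller score means a feature better preserves locality. The graph size k, the kernel width t, and the dense-matrix implementation are our own simplifying assumptions.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Laplacian score of each feature (smaller = better locality preserving).

    A minimal sketch following He et al. [9]; X is (n samples x d features).
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances and a kNN heat-kernel similarity graph.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(sq, axis=1)[:, 1:k + 1]          # exclude each point itself
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    S[rows, knn.ravel()] = np.exp(-sq[rows, knn.ravel()] / t)
    S = np.maximum(S, S.T)                            # symmetrize the graph
    D = np.diag(S.sum(axis=1))
    L = D - S                                         # graph Laplacian
    ones = np.ones(n)
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        f = f - (f @ D @ ones) / (ones @ D @ ones)    # remove the D-weighted mean
        scores[r] = (f @ L @ f) / (f @ D @ f + 1e-12)
    return scores
```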

In this paper we incorporate sparse representation, discriminative information, and joint feature weights learning into a unified framework to select a feature subset in batch mode. The proposed feature selection algorithm is thus coined sparse discriminative feature weights (SDFW) learning, and it has the following advantages.

  • Because of the importance of discriminative information [7] in data analysis, we exploit discriminative information based on sparse representation for feature selection. SDFW characterizes the scatter of samples by the difference between the within-class reconstruction residual and the between-class reconstruction residual, so it adaptively seeks features that are efficient for discrimination (a schematic sketch of this criterion is given after this list).

  • SDFW alternately estimates the feature weights and the sparse representation coefficients by optimizing the objective function. Compared with traditional methods, SDFW seeks the nearest neighbors of each sample in the weighted space rather than the original space. Moreover, SDFW offers the potential to uncover an intrinsic relationship between feature selection and sparse representation.

  • SDFW jointly assigns each feature a real-valued number to indicate its importance. Different from traditional heuristic feature search, SDFW selects the most discriminative feature subset from the original features in batch mode by learning a nonnegative feature weight vector.

  • Since SDFW, as a joint feature weights learning method, exploits feature correlation, we do not apply any de-correlation transformation for data preprocessing in our algorithm. Furthermore, different from the whitening transformation, which removes feature correlation in the original data space, our method adopts an adaptive mechanism to learn feature weights in the space of selected features.
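
The schematic sketch referred to in the first item above illustrates the quantity SDFW builds on: the within-class and between-class sparse reconstruction residuals of a sample, computed in the weighted feature space. It is only a hedged reading of the description given here, not the paper's exact objective; the choice of scikit-learn's Lasso as the l1 solver, the hyper-parameter alpha, and the function names are our assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso   # l1-regularized least squares

def within_between_residuals(x, X_train, y_train, label, w, alpha=0.01):
    """Within-class vs. between-class sparse reconstruction residuals of one sample.

    Schematic only: `w` is the current nonnegative feature weight vector and
    `alpha` is a hypothetical l1 strength; SDFW's actual objective may differ.
    """
    xw = w * x                                      # the sample in the weighted feature space
    A_within = (X_train[y_train == label] * w).T    # dictionary of same-class samples (columns)
    A_between = (X_train[y_train != label] * w).T   # dictionary of other-class samples (columns)

    def sparse_residual(A, target):
        # Represent the target as a sparse linear combination of the dictionary columns.
        coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(A, target)
        return np.linalg.norm(target - A @ coder.coef_) ** 2

    r_within = sparse_residual(A_within, xw)
    r_between = sparse_residual(A_between, xw)
    # SDFW favours feature weights for which r_within - r_between is small.
    return r_within, r_between
```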

The remainder of this paper is organized as follows. Section 2 briefly reviews sparse representation and the sparse-representation-steered optimized projections for classification. Section 3 presents the proposed formulation, followed by a new iterative algorithm to optimize the objective function. Section 4 reports experiments on four data sets. Finally, Section 5 draws conclusions.

Section snippets

Background

In recent years, sparse representation has shown great potential in both the academic and industrial communities. Wright et al. [18] present sparse representation-based classification (SRC). The basic idea of sparse representation methods is to represent a given test sample as a sparse linear combination of all training samples. Although SRC claims to be insensitive to dimension reduction, experimental results show that feature extraction is still of great importance since a well
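
For readers unfamiliar with SRC [18], the following minimal sketch conveys the basic idea: code a test sample as a sparse linear combination of all training samples and assign it to the class whose samples give the smallest reconstruction residual. The use of scikit-learn's Lasso as the l1 solver and the value of alpha are illustrative assumptions, not part of [18].

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(x_test, X_train, y_train, alpha=0.01):
    """Sparse representation-based classification (SRC), a minimal sketch."""
    A = X_train.T                                   # dictionary: one column per training sample
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(A, x_test)
    coefs = coder.coef_                             # sparse code of the test sample
    residuals = {}
    for c in np.unique(y_train):
        mask = (y_train == c)
        # Reconstruct the test sample using only the coefficients of class c.
        residuals[c] = np.linalg.norm(x_test - A[:, mask] @ coefs[mask])
    return min(residuals, key=residuals.get)        # class with the smallest residual
```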

Sparse discriminative feature weights learning

In this section, we first present our objective function, and then put forward an iterative SDFW algorithm to solve the optimization problem.

Experiments

In this section, we evaluate the performance of SDFW for feature selection and compare it with that of representative algorithms in terms of classification accuracy.
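
The comparison follows the usual protocol for evaluating feature selectors; a hedged sketch of such a pipeline is shown below. The 1-nearest-neighbor classifier, the subset size k, and the function names are illustrative assumptions, not necessarily the exact experimental setup of the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def evaluate_selection(feature_weights, X_train, y_train, X_test, y_test, k=50):
    """Keep the k features with the largest learned weights and report 1-NN accuracy."""
    top_k = np.argsort(feature_weights)[::-1][:k]
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_train[:, top_k], y_train)
    return accuracy_score(y_test, clf.predict(X_test[:, top_k]))
```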

Conclusions

In this paper, we proposed a novel feature weights learning method (SDFW). It alternately computed the feature weights and the sparse representation coefficients of each sample in the space of selected features. By learning a nonnegative feature weight vector, SDFW jointly selected the most discriminative features across the original feature space in batch mode. To solve the proposed formulation, an effective iterative optimization algorithm was derived. The experimental results on

Acknowledgments

This work is supported by the National Science Foundation of China (Grant no. 61202134), National Science Fund for Distinguished Young Scholars (Grant no. 61125305), China Postdoctoral Science Foundation (Grant no. AD41431), and the Postdoctoral Science Foundation of Jiangsu Province.


References (31)

  • H. Wei et al., Feature subset selection and ranking for data dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • D. Cai, C. Zhang, X. He, Unsupervised feature selection for multicluster data, in: ACM SIGKDD Conference on Knowledge...
  • R.O. Duda et al., Pattern Classification (2001)
  • K. Kira, L.A. Rendell, A practical approach to feature selection, in: International Conference on Machine Learning,...
  • X.F. He, D. Cai, P. Niyogi, Laplacian score for feature selection, Neural Information Processing Systems, Vancouver,...

    Hui Yan received her B.S. degree and Ph.D. degree from the School of Computer Science and Technology, Nanjing University of Science and Technology (NUST), Nanjing, China, in 2005 and 2011, respectively. In 2009, she was a visiting student at the Department of Electrical and Computer Engineering at National University of Singapore, Singapore. She is currently a lecturer at the School of Computer Science and Engineering, NUST. Her research interests include pattern recognition, computer vision and machine learning.

    Jian Yang received his B.S. degree in mathematics from Xuzhou Normal University, Xuzhou, China, in 1995, his M.S. degree in applied mathematics from Changsha Railway University, Changsha, China, in 1998, and his Ph.D. degree in pattern recognition and intelligence systems from Nanjing University of Science and Technology (NUST), Nanjing, China, in 2002. He was a Post-Doctoral Researcher at the University of Zaragoza, Spain, in 2003. From 2004 to 2006, he was a Post-Doctoral Fellow at the Biometrics Centre of Hong Kong Polytechnic University, Hong Kong; from 2006 to 2007, at the Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA. He is currently a Professor at the School of Computer Science and Technology, NUST. He has authored more than 80 academic papers in pattern recognition and computer vision. His journal papers have been cited more than 1800 times in the ISI Web of Science, and 3000 times in the Web of Scholar Google. His current research interests include pattern recognition, computer vision and machine learning.
