Neurocomputing

Volume 173, Part 1, 15 January 2016, Pages 102-109

Semi-supervised feature selection based on local discriminative information

https://doi.org/10.1016/j.neucom.2015.05.119

Abstract

Feature selection is an effective way to reduce the dimensionality of high-dimensional data. In this paper, we propose a novel feature selection method that achieves batch feature selection using both supervised and unsupervised data samples. The objective function consists of three parts: first, under the assumption that each data sample has been assigned a class label, the ratio of the between-class scatter matrix to the total scatter matrix should be minimized, where the scatter matrices are formed from the selected features of these data samples; second, we use linear regression to model the correlations between the labeled data samples and their class labels; last, we use the l2,1-norm to guarantee the sparsity of the feature selection matrix and to jointly exploit the information shared between supervised and unsupervised data samples. Unlike existing methods, our approach exploits local discriminative information to construct the model; extensive experiments show that it outperforms existing methods.

Introduction

Recent years have witnessed a surge of multimedia data with very high dimensionality, such as gene expression data, video surveillance data, and meteorological data. This poses an obstacle to pattern recognition tasks based on such data. Under this circumstance, dimensionality reduction has become a research hotspot in the field of pattern recognition, and feature selection is an effective way to achieve it.

Current feature selection methods can be roughly categorized into two classes: supervised methods and unsupervised methods. The Fisher criterion [1], [2], [3] is one of the most important supervised methods. The Fisher method computes the inter-class distance and intra-class distance with respect to each feature, and chooses the features with maximum inter-class distance and minimum intra-class distance. However, it ignores the inter-dependency between features when multiple features need to be selected simultaneously, so the selected feature subset does not necessarily optimize the objective value. Recently, some researchers have aimed to enhance the Fisher criterion by incorporating feature–feature dependency into feature selection [4], [5]. In the unsupervised scenario, class label information is not directly available, which makes feature selection more challenging. A commonly used criterion in unsupervised feature learning is to select the features that best preserve the data similarity or manifold structure constructed from the whole feature space [6], [7], [8], but these methods fail to incorporate the discriminative information implied within the data, even though it has been shown to be important in data analysis [9], [10].
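To make the criterion concrete, the per-feature Fisher score can be written in a few lines. The sketch below is a standard textbook formulation given for illustration only, not code from this paper; the function and variable names are our own:

```python
import numpy as np

def fisher_scores(X, y):
    """Classical per-feature Fisher score: between-class scatter over
    within-class scatter, evaluated for each feature independently."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = Xc.shape[0]
        between += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    return between / (within + 1e-12)  # epsilon guards constant features

# Example: keep the k features with the largest scores.
# top_k = np.argsort(fisher_scores(X, y))[::-1][:k]
```

Ranking features by this score and keeping the top k treats each feature in isolation, which is precisely the ignored inter-dependency discussed above.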

Supervised methods are developed from data samples whose class labels are known, and therefore provide higher accuracy when used for pattern recognition [11]. However, labeling a large number of data samples requires extensive expertise and is sometimes impractical. In contrast, unsupervised methods do not need any label information, but they have to assume that the data set follows some specific distribution, e.g., a manifold structure, and try to preserve this distribution over the whole data set. As a result, unsupervised methods usually perform worse. To this end, some researchers have proposed to use both labeled and unlabeled data for feature selection, giving rise to so-called semi-supervised methods. Semi-supervised methods take advantage of the data samples with known class labels to provide prior knowledge for the majority of data samples without class labels, and are therefore a good compromise between supervised and unsupervised methods [12], [13], [14].

The key to designing an effective semi-supervised feature selection algorithm is to develop a framework under which the relevance of a feature can be evaluated by both labeled and unlabeled data in a natural way [15]. Feature ranking is a well-studied semi-supervised feature selection approach which ranks all features with respect to their relevance and chooses the top-ranked features as the working feature vector [16]. The Laplacian score improves the efficiency of feature selection by considering the underlying manifold structure [6]; however, without label information the Laplacian score can hardly select discriminative features. Based on the Laplacian score, a semi-supervised feature selection method was proposed in [17], but this method also selects the most discriminating features individually and neglects the correlations among features. In order to identify relevant features, several spectral-regression-based methods have been proposed [18], [19], [20]: Xu et al. [18] and Zhao et al. [19] proposed two semi-supervised algorithms based on spectral analysis, which try to discover both the geometrical and the discriminant structure in the data. In [20], an embedded model was proposed to Minimize the feature Redundancy for Spectral Feature selection (MRSF). Although spectral methods achieve good performance in many applications, they must first construct a cluster indicator in the original high-dimensional data, which is sensitive to noise. In [21], a semi-supervised method based on the graph Laplacian was proposed for leveraging manifold regularization. Yang et al. [22] proposed a one-step feature selection method which aims to select the most discriminative features for data representation, where the manifold structure is also considered and feature correlations are evaluated by the l2,1-norm regularization first proposed in [23]. Similarly to [22], the method proposed in [24] attains higher accuracy by imposing a non-negative constraint. However, without label information the feature selection performance of these methods deteriorates.
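For reference, the Laplacian score mentioned above admits a compact implementation. The following is an illustrative sketch (the variable names and heat-kernel parameter t are our own choices, not taken from [6] or this paper); smaller scores indicate features that better preserve the neighborhood graph:

```python
import numpy as np
from scipy.spatial.distance import cdist

def laplacian_score(X, n_neighbors=5, t=1.0):
    """Laplacian-score sketch: build a k-NN graph with heat-kernel
    weights, then score each feature by how smoothly it varies over
    the graph (smaller is better)."""
    n = X.shape[0]
    dist = cdist(X, X, 'sqeuclidean')
    idx = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # skip self
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W[rows, cols] = np.exp(-dist[rows, cols] / t)
    W = np.maximum(W, W.T)          # symmetrize the graph
    deg = W.sum(axis=1)
    L = np.diag(deg) - W            # unnormalized graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r].astype(float)
        f = f - (f @ deg) / deg.sum()          # degree-weighted centering
        scores[r] = (f @ L @ f) / ((f * deg) @ f + 1e-12)
    return scores
```

Note that the score is computed from the unlabeled geometry alone, which is exactly why it cannot, by itself, single out discriminative features.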

Recent works have shown that there exist common components across multiple training resources, and that it is beneficial to leverage such shared information for multimedia analysis applications such as event detection [25], [26] and feature selection [27], [28]. In [25], [26], two sophisticated event detection algorithms were proposed, in which several kinds of training resources are used and the information shared between these resources is explored to improve detection performance. To achieve good feature selection performance, Yang et al. [27] and Wang et al. [28] proposed to utilize the information shared among related tasks in their multi-task feature selection frameworks. However, all of the feature selection methods mentioned above are designed in a supervised way and will fail if the number of training samples is too small. Intuitively, there should exist shared information between labeled and unlabeled data collected from the same resource, and exploiting it should improve the performance of feature selection.

The above observations motivate us to design a feature selection method which selects the most discriminative features by utilizing feature correlations and the information shared between labeled and unlabeled data. Inspired by [22] and [27], in this paper we propose a novel semi-supervised feature selection method formulated in a multi-task way: the proposed method contains two tasks, a supervised task and an unsupervised task. For the unsupervised task, as in [22], we capture the most discriminative information by exploiting the underlying manifold structure and feature correlations. For the supervised task, we use the label information to guide the training process. The two tasks are then combined in a single framework where the information shared between them is exploited. In addition, we adopt the l2,1-norm to ensure the sparsity of the feature selection matrix and to reduce the impact of outliers. It is worthwhile to highlight several aspects of the proposed approach:

1. We define a discriminative model for both supervised and unsupervised data, under the assumption that the supervised and unsupervised data contain some shared information.

2. Local discriminative information, instead of global discriminative information, is used when building the model through the Fisher criterion and linear regression, because local information has been observed to be more important than global information in pattern recognition.

3. The l2,1-norm is exploited to regularize the feature selection matrix. The regularization ensures the sparsity of the feature selection matrix; more importantly, the sparsity is achieved by learning from unsupervised and supervised samples jointly.

4. The proposed objective function is non-smooth and hard to solve directly; we propose an efficient iterative approach to optimize the model (a generic sketch of this style of iteration follows the list).
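To illustrate the style of iteration involved, consider the standard reweighted solver for l2,1-regularized least squares introduced in [23]. This is a generic sketch under the simplifying assumption of a plain regression loss; it is not the paper's full algorithm, whose objective contains additional scatter and shared-information terms:

```python
import numpy as np

def l21_regularized_regression(X, Y, gamma=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for
        min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}.
    Alternates a closed-form ridge-like solve with a row-norm
    reweighting, the standard device from [23]."""
    n, d = X.shape
    # ridge-style initialization
    W = np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)
    for _ in range(n_iter):
        # reweighting: diagonal entries gamma / (2 * ||w_i||_2);
        # eps guards rows that have already shrunk to zero
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        Dg = gamma / (2.0 * row_norms)
        W = np.linalg.solve(X.T @ X + np.diag(Dg), X.T @ Y)
    return W  # rows with near-zero norm correspond to discarded features
```

Each update solves the stationarity condition of the objective with the non-smooth term replaced by its current quadratic upper bound, which is why the iteration converges monotonically in practice.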

The rest part of this paper is organized as follows. In Section 2, we introduce the related works. The proposed method is described in detail in Section 3. Section 4 demonstrates the experimental results and Section 5 concludes this paper.


l2,1-norm

For an arbitrary matrix $D \in \mathbb{R}^{r \times p}$, we denote its $\ell_{2,1}$-norm as
$$\|D\|_{2,1} = \sum_{i=1}^{r} \sqrt{\sum_{j=1}^{p} D_{ij}^2}.$$

The l2,1-norm promotes sparsity of the matrix, which is an important virtue for feature selection [23]. In particular, the sparsity of D manifests as entire rows shrunk to zero; in this respect, the l2,1-norm contributes to selecting discriminative features.
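In code, the definition amounts to summing the Euclidean norms of the rows; a minimal numpy illustration (our own helper, not from the paper):

```python
import numpy as np

# l2,1-norm: sum over rows of the row-wise Euclidean norms.
def l21_norm(D):
    return np.sqrt((D ** 2).sum(axis=1)).sum()

# Row-sparsity intuition: zero rows contribute nothing, so minimizing
# the l2,1-norm drives whole rows (whole features, when D is a feature
# selection matrix) to zero.
D = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, 0.0]])
print(l21_norm(D))  # 5.0: only the nonzero row contributes
```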

Unsupervised discriminative feature selection

Denote $X^u = \{x_1^u, x_2^u, \ldots, x_n^u\}$ as the unlabeled training sample set, where $x_i^u \in \mathbb{R}^d$ $(1 \le i \le n)$ is the $i$-th datum and $n$ is the number of data samples.

Semi-supervised feature selection via local discriminative method

In this section, we give the objective function of the proposed method, and then propose an algorithm to optimize it.

Experiments

In this section, we evaluate the proposed algorithm on several datasets, including five image datasets (four face databases: YaleB [30], Yale [31], ORL [32], and XM2VTS [33]; and one handwritten digits database: USPS [34]) and four UCI databases [35] (wine, vehicle, breast, and vote). The description of the selected datasets is shown in Table 1 and Table 2, where the properties of each dataset (size, dimensions, and classes) are listed in the first three columns. The last column is the number of …

Conclusion

This paper proposes a semi-supervised feature selection method based on local discriminative information. The novelty lies in using local discriminative information, instead of global discriminative information, to build the models through the Fisher criterion and linear regression. In addition, the l2,1-norm is exploited to regularize the feature selection matrix by learning from supervised and unsupervised samples jointly; this regularization ensures the sparsity of the feature selection matrix.


References (35)

  • D. Cai, C. Zhang, X. He, Unsupervised feature selection for multi-cluster data, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.
  • K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.
  • X. Du, Y. Yan, P. Pan, et al., Multiple graph unsupervised feature selection, Signal Process., 2014.
  • Y. Han, et al., Co-regularized ensemble for feature selection, in: Proceedings of the International Joint Conference on Artificial Intelligence, 2013.
  • X. Chang, F. Nie, Y. Yang, et al., A convex formulation for semi-supervised multi-label feature selection, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2014.
  • J.J. Wang, J. Yao, Y.J. Sun, Semi-supervised local-learning-based feature selection, in: Proceedings of the International Joint Conference on Neural Networks, 2014.
  • Y. Han, et al., Semi-supervised feature selection via spline regression for video semantic recognition, IEEE Trans. Neural Netw. Learn. Syst., 2015.

Zhiqiang Zeng received the M.Sc. and Ph.D. degrees in Computer Science from Xi'an Jiaotong University and Zhejiang University, China, in 2004 and 2007, respectively. He received a Bachelor's degree in Automation from Sichuan University, China, in 1994. In 2008, he joined the Computer Science Department of the Xiamen University of Technology as a research associate. His interests include pattern recognition and machine learning, in particular, support vector machines and general kernel methods.

Xiaodong Wang received his B.E. and M.E. degrees from the College of Computer Science and Electronic Engineering, Hunan University, China, in 2007 and 2010, respectively. He is currently a lecturer in the College of Computer and Information Engineering, Xiamen University of Technology. His research interests cover pattern recognition, image processing, and embedded system architecture.

Jian Zhang received his B.E. and M.E. degrees from Shandong University of Science and Technology in 2000 and 2003, respectively, and his Ph.D. degree from Zhejiang University in 2008. He is currently an associate professor of computer science in the School of Science & Technology, Zhejiang International Studies University. His research interests include machine learning, computer vision, computer graphics, and multimedia processing.

Qun Wu is an Assistant Professor of Product Innovation Design at the School of Art and Design, Zhejiang Sci-Tech University, China. He received a Ph.D. from the College of Computer Science and Technology, Zhejiang University, China. He holds a Bachelor's degree in Industrial Design from Nanchang University, China, and a Master's degree in Mechanical Engineering from Shaanxi University of Science and Technology, China. His research interests include machine learning, human factors, and product innovation design.

This paper is supported by the National Natural Science Foundation of China (Nos. 61273290 and 61303143), the Xiamen Science and Technology Planning Project (No. 3502Z20143030), the Scientific Research Fund of Fujian Provincial Education Department (No. JA15385), and the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y201326609).
