Neurocomputing

Volume 273, 17 January 2018, Pages 593-610

Unsupervised feature selection by regularized matrix factorization

https://doi.org/10.1016/j.neucom.2017.08.047

Abstract

Feature selection is an interesting and challenging task in the data analysis process. In this paper, a novel algorithm named Regularized Matrix Factorization Feature Selection (RMFFS) is proposed for unsupervised feature selection. Compared with other matrix factorization based feature selection methods, a main advantage of our algorithm is that it takes the correlation among features into consideration. By introducing an inner product regularization into our algorithm, the features selected by RMFFS not only represent the original high-dimensional data well, but also contain low redundancy. Moreover, a simple yet efficient iterative update algorithm is developed to solve the proposed RMFFS. Extensive experimental results on nine real-world databases demonstrate that our proposed method achieves better performance than several state-of-the-art unsupervised feature selection methods.

Introduction

Nowadays, the data obtained in many real-world applications, such as pattern recognition, computer vision and image processing, is often high-dimensional. High-dimensional data not only makes model learning time-consuming, but also requires a large amount of storage space. Moreover, high-dimensional data may contain irrelevant, redundant or even noisy features, which weaken the performance of the learning model [1]. As a result, dimensionality reduction is often required as a preliminary and important stage of data analysis. According to some studies, dimensionality reduction not only helps to avoid the "curse of dimensionality", but also makes it possible to accomplish data analysis tasks at a lower computational cost [2]. Currently, the most widely used dimensionality reduction techniques are feature extraction [3] and feature selection [4]. The goal of feature extraction is to find a linear or non-linear projection that maps the original high-dimensional data into a lower-dimensional subspace [3], [5]. Although feature extraction methods such as Principal Component Analysis (PCA) [6] and Non-negative Matrix Factorization (NMF) [7] have been well studied and have shown good performance in some applications, their low-dimensional features are formed by combining several original features. Therefore, the interpretability of the low-dimensional features they produce is relatively poor [8]. In contrast, feature selection methods aim to select an optimal subset of the original high-dimensional features based on some evaluation criteria [9]. Thus, they preserve the semantics of the original features and make the dimensionality reduction results more interpretable for domain experts [8]. Furthermore, another advantage of feature selection over feature extraction is that once the features have been selected, only these features need to be calculated or collected during data acquisition.
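The distinction between the two paradigms can be made concrete with a small sketch. This is our own illustration using scikit-learn's PCA as a stand-in for feature extraction; the selected indices are placeholders for the output of any feature selection criterion, not part of the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples, 10 original features

# Feature extraction: every reduced dimension mixes all original features,
# so the new dimensions are hard to interpret in terms of the raw measurements.
Z_pca = PCA(n_components=3).fit_transform(X)

# Feature selection: the reduced data is just a subset of the original columns,
# so each retained dimension keeps its original semantic meaning.
selected = [0, 4, 7]                  # hypothetical indices from some selection criterion
Z_sel = X[:, selected]
```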

In the past decades, various feature selection techniques have been proposed. According to the availability of class labels, feature selection methods can generally be categorized into three classes: supervised [10], unsupervised [11], [12], [13] and semi-supervised [14]. Among them, unsupervised feature selection is regarded as the most difficult and challenging task due to the absence of class labels during the selection process, and many efforts have been devoted to addressing it. Variance Score (VS) [15] is a simple unsupervised feature selection algorithm: the variance of each feature is first calculated, and the features with the largest variances are then selected. In [16], an unsupervised feature selection algorithm termed Laplacian Score (LS) was proposed. By taking the locality preserving ability of features into account, LS selects the feature subset that best maintains the manifold structure of the original high-dimensional data. Zhao and Liu proposed a spectral analysis based feature selection (SPEC) approach in [17]. In SPEC, a graph reflecting the similarity between high-dimensional data points is first constructed; then, the feature subset that best preserves the structural information of the graph is selected using graph spectral theory. Although VS, LS and SPEC are easy to implement, they all select features one by one and therefore ignore the correlations among features. To overcome this limitation, several sparsity regularization based methods have been put forward. Cai et al. [11] proposed the Multi-Cluster Feature Selection (MCFS) algorithm, which can be considered a spectral regression model for feature selection. The main characteristic of MCFS is its use of ℓ1-norm regularization to ensure the sparsity of the regression coefficients, so that the significance of each feature can be effectively measured. Yang et al. [18] presented an Unsupervised Discriminative Feature Selection (UDFS) algorithm, which combines discriminative information and ℓ2,1-norm regularization for feature selection. Since ℓ2,1-norm regularization makes several rows of the feature selection matrix shrink to zero [19], the features corresponding to the non-zero rows can be selected to form the optimal feature subset in UDFS. In [20], an unsupervised feature selection algorithm called Regularized Self-Representation (RSR) was also proposed based on ℓ2,1-norm regularization. RSR assumes that each feature in high-dimensional data can be represented by a linear combination of the other features. Thus, a linear regression model is first established; then, ℓ2,1-norm regularization is introduced into the regression model to select the most representative features.
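To make the simplest of these one-by-one criteria concrete, the following minimal sketch implements the Variance Score idea in NumPy. It is our own illustration; the function name, the toy data and the use of NumPy are assumptions rather than code from [15].

```python
import numpy as np

def variance_score_selection(X, num_selected):
    """Rank features by their variance and return the indices of the largest ones."""
    # X: data matrix of shape (n_samples, n_features)
    scores = X.var(axis=0)                      # variance of each feature
    ranked = np.argsort(scores)[::-1]           # features sorted by decreasing variance
    return np.sort(ranked[:num_selected])       # indices of the selected features

# Toy usage: keep the 5 highest-variance features of a random 100 x 20 matrix.
X = np.random.default_rng(0).normal(size=(100, 20))
selected = variance_score_selection(X, 5)
X_reduced = X[:, selected]
```

Because each feature is scored in isolation, two highly correlated features with large variances would both be kept, which is exactly the redundancy issue raised above.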

Besides sparsity regularization, matrix factorization has also been widely used for dimensionality reduction. Typical matrix factorization based dimensionality reduction methods include SVD [21], PCA [6] and NMF [7]. However, these methods and their extensions [22], [23], [24], [25] are all designed for feature extraction rather than feature selection. Thus, as analyzed above, the low-dimensional features they produce lack interpretability. To remedy this limitation, Wang et al. [26] proposed a matrix factorization based feature selection algorithm termed Matrix Factorization based Feature Selection (MFFS). MFFS was developed from the viewpoint of subspace learning: it treats feature selection as a matrix factorization problem and introduces an orthogonality constraint into its objective function to select the most informative features from high-dimensional data. Although MFFS successfully built a bridge between matrix factorization and feature selection, and outperformed LS and some other algorithms, its orthogonality constraint is too strict to be satisfied in practice [27]. Moreover, the correlations among features are also neglected in MFFS. Therefore, the features it selects may contain some redundancy, which makes the feature subset far from optimal [8].
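For reference, the MFFS objective is usually written in a form similar to the following; the notation here is ours, and the exact formulation is given in [26]. Let $\mathbf{X} \in \mathbb{R}^{n \times d}$ denote the data matrix, $\mathbf{W} \in \mathbb{R}^{d \times k}$ a feature selection matrix and $\mathbf{H} \in \mathbb{R}^{k \times d}$ a coefficient matrix:

$$\min_{\mathbf{W},\,\mathbf{H}} \; \left\| \mathbf{X} - \mathbf{X}\mathbf{W}\mathbf{H} \right\|_F^2 \quad \text{s.t.} \quad \mathbf{W}^{\top}\mathbf{W} = \mathbf{I}_k,$$

often combined with non-negativity constraints on $\mathbf{W}$ and $\mathbf{H}$. The orthogonality constraint $\mathbf{W}^{\top}\mathbf{W} = \mathbf{I}_k$ pushes $\mathbf{W}$ toward a column-selection (indicator) structure, and it is exactly this constraint that is criticized above as being too strict in practice.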

In this paper, a new and effective Regularized Matrix Factorization based Feature Selection algorithm called RMFFS is proposed. Compared with approaches such as RSR [20] and MFFS [26], RMFFS not only selects a feature subset that can approximately represent all features, but also keeps the selected features low in redundancy. That is, the features selected by RMFFS are both representative and close to linearly independent. To this end, we impose on the matrix factorization objective a regularization term that can be regarded as a combination of the ℓ1-norm and the ℓ2-norm. Moreover, an efficient iterative update algorithm is designed to optimize the objective function of RMFFS, and the convergence of this algorithm is analyzed in detail. We compare the proposed RMFFS with six state-of-the-art feature selection methods on nine widely used benchmark databases. Experimental results demonstrate that the proposed method is effective and outperforms the other methods.
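Although the full model is derived in Section 2, the description above suggests an objective of roughly the following form. This is only a sketch in our own notation; in particular, the indexing of the inner product regularizer reflects our reading of the text rather than the paper's exact formulation:

$$\min_{\mathbf{W},\,\mathbf{H}} \; \left\| \mathbf{X} - \mathbf{X}\mathbf{W}\mathbf{H} \right\|_F^2 + \lambda \sum_{i=1}^{k} \sum_{j=1}^{k} \left| \mathbf{w}_i^{\top} \mathbf{w}_j \right|,$$

where $\mathbf{w}_i$ denotes the $i$-th column of the feature weight matrix $\mathbf{W}$ and $\lambda > 0$ balances reconstruction quality against redundancy. The diagonal terms ($i = j$) reduce to squared ℓ2-norms of the columns, while the off-diagonal absolute inner products penalize correlated columns; in this way sparsity and low redundancy are encouraged simultaneously without resorting to a hard orthogonality constraint.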

The remainder of this paper is organized as follows. In Section 2, the proposed RMFFS is presented and an effective solution for it is provided. Then, the experimental results of different methods on nine benchmark databases are compared in Section 3. Finally, the conclusions of this paper are given in Section 4.

In order to facilitate the presentation, some notations frequently used in this paper are tabulated in Table 1.

The proposed method

In this section, we first describe the proposed unsupervised feature selection model. Afterwards, an efficient iterative update algorithm is provided to solve the model. Finally, the convergence of the algorithm is proven.

Experiments

In this section, the performance of the proposed RMFFS is compared with that of several state-of-the-art unsupervised feature selection methods on clustering tasks.
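A common protocol for such comparisons is to run a clustering algorithm on the data restricted to the selected features and then score the result against the ground-truth labels. The sketch below illustrates this with k-means and normalized mutual information; the choice of metric and the scikit-learn calls are our assumptions, not a verbatim description of the paper's experimental setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def evaluate_selected_features(X, labels, selected_idx, n_clusters, seed=0):
    """Cluster the data restricted to the selected features and score against ground truth."""
    X_sel = X[:, selected_idx]                               # keep only the selected columns
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(X_sel)      # k-means on the reduced data
    return normalized_mutual_info_score(labels, pred)        # NMI between prediction and labels
```

In practice such scores are typically averaged over several k-means restarts and reported for a range of selected-feature counts.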

Conclusions

In this paper, we proposed a new unsupervised feature selection method named RMFFS, which selects features based on matrix factorization. To take the correlations among features into consideration, we employ the absolute values of the inner products between the vectors of the feature weight matrix as a regularization term, which ensures that the feature weight matrix is simultaneously sparse and of low redundancy. Furthermore, we also propose an efficient iterative algorithm to optimize the objective function of RMFFS.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Nos. 61403078, 61602221, 61672150, 11671268, 41671379 and 61702092), the Research Fund of Jilin Province Science and Technology Development Project (No. 20160204047GX), the Doctoral Fund of Jiangxi Normal University (No. 7525), the Natural Science Foundation of Jiangxi Province (No. 20171BAB212009) and the Science and Technology Research Project of Jiangxi Provincial Department of Education (No. GJJ160333).

References (31)

  • J. Fang et al., Pattern-coupled sparse Bayesian learning for recovery of block-sparse signals, IEEE Trans. Signal Process. (2015)
  • I. Jolliffe, Principal Component Analysis (1986)
  • D.D. Lee et al., Algorithms for non-negative matrix factorization
  • I. Guyon et al., An introduction to variable and feature selection, J. Mach. Learn. Res. (2003)
  • H. Peng et al., Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

Miao Qi was born in Liaoning, China. She received her BS and MS degrees from the Computer School of Northeast Normal University, China, in 2004 and 2007. In 2010, she received the Ph.D. degree from Northeast Normal University. Now, she is an associate professor at Northeast Normal University, China. Her main research interests are digital image processing, chemistry computation and pattern recognition.

Ting Wang was born in Guizhou province, China, in 1990. She received the BE degree from Guizhou University, China, in 2014 and the MS degree from the School of Computer Science and Information Technology, Northeast Normal University, China, in 2017. Now, she is pursuing a Ph.D. in the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. Her current research interests include deep learning and feature selection.

Fucong Liu was born in Jilin province, China, in 1988. He received the BS degree from Changchun University of Science and Technology, China, in 2011 and the MS degree from the Mathematics School of Jilin University in 2014. He is now pursuing a Ph.D. degree at the College of Computer Science and Information Technology, Northeast Normal University. His research interests include dimensionality reduction, sparse representation and feature extraction.

Baoxue Zhang was born in Jilin, China, in 1968. He received a Ph.D. degree from Jilin University, China, in 1999. Now, he is a professor and the Dean of the School of Statistics, Capital University of Economics and Business. His research interests focus on information statistics.

Jianzhong Wang was born in Changchun, Jilin province, China, in 1981. He received the BS degree from the Computer School of Jilin University, China, in 2004 and the MS degree from the Computer School of Northeast Normal University, China, in 2007. In 2010, he received the Ph.D. degree from the School of Mathematics and Statistics, Northeast Normal University. Now, he is a lecturer in the College of Computer Science and Information Technology, Northeast Normal University. His research interests focus on dimensionality reduction and image processing.

Yugen Yi was born in Jiangxi province, China, in 1986. He received the BS degree from the College of Humanities & Sciences of Northeast Normal University, China, in 2009 and the MS degree from the College of Computer Science and Information Technology, Northeast Normal University, China, in 2012. In 2015, he received the Ph.D. degree from the School of Mathematics and Statistics, Northeast Normal University. Now, he is a lecturer in the School of Software, Jiangxi Normal University. His research interests include dimensionality reduction and feature extraction.
