Joint feature weighting and adaptive graph-based matrix regression for image supervised feature selection

https://doi.org/10.1016/j.image.2020.116044

Highlights

  • Take image data directly as input to the regression model to preserve the spatial relations of elements in the data.

  • Use the learned feature weight matrix to select important features from the image.

  • Adaptively learn a graph matrix to reduce the influence of noise and preserve the local structure of samples.

  • Design an iterative optimization algorithm and analyze its complexity.

  • Verify the superiority of the proposed method on several datasets.

Abstract

Matrix regression (MR) is a regression model that operates directly on matrix data. However, each element of the matrix data influences the regression model differently. To account for the relevance of every original feature in the matrix data and its influence on the final estimate of the regression model, we introduce an unknown weight matrix that encodes the relevance of each feature and propose a feature weighting and graph-based matrix regression (FWGMR) model for supervised feature selection on images. In this model, the feature weight matrix selects important features from the matrix data while preserving the relative spatial relationships among its elements. In addition, to effectively and reasonably preserve the local manifold structure of the training matrix samples, a regularization term adaptively learns a graph matrix in the low-dimensional space. An optimization algorithm is devised to solve the FWGMR model, with closed-form solutions at each iteration. Extensive experiments on several public datasets demonstrate the superiority of FWGMR.

Introduction

Dimensionality reduction (DR) is an important technique for reducing the dimensionality of high-dimensional data by finding relevant low-dimensional features. There are two essentially different kinds of DR techniques. Feature selection (or feature ranking) [1], [2] selects a subset of relevant features from the high-dimensional data to represent the original data. Feature extraction (or subspace learning) [3], [4] obtains new low-dimensional features by learning a transformation of the high-dimensional data. Preprocessing data in this way not only decreases processing time but also yields more compact models with better generalization [5], [6]. Since feature selection does not change the original representations and maintains the physical meaning of the data variables, we focus on feature selection here.
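To make the distinction concrete, here is a minimal NumPy sketch; the data, the selected indices and the projection are random stand-ins, not learned quantities. Feature selection keeps a subset of the original columns, while feature extraction replaces them with learned combinations of all columns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))   # 100 samples, 500 features

# Feature selection: keep a subset of the original columns,
# so each retained dimension is still an original variable.
selected = [3, 17, 42, 99, 256]       # hypothetical selected indices
X_sel = X[:, selected]                # shape (100, 5)

# Feature extraction: project onto a learned subspace; each new
# dimension is a linear combination of all original variables.
W = rng.standard_normal((500, 5))     # stand-in for a learned projection
X_ext = X @ W                         # shape (100, 5)
```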

According to the availability of label information, feature selection methods are divided into supervised, semi-supervised and unsupervised methods. In the supervised case, one commonly needs to measure the relationship between features and labels. Most existing supervised feature selection methods are vector-based, such as Fisher Score [7] based on linear discriminant analysis (LDA), robust feature selection (RFS) [8] and global and local structure preservation feature selection (GLSPFS) [9]. In recent years, many sparse regression-based feature selection methods have also emerged. RFS [8] imposes the l2,1-norm jointly on both the loss function and the regularization term. Liu et al. [10] used the l2,1-norm instead of the l1-norm as the penalty and proposed a multi-task feature selection method. Cai et al. [11] gave a feature selection approach with an l2,1-norm loss function under an explicit l2,0-norm equality constraint. Xiang et al. [12] provided a discriminative least squares regression framework for feature selection. He et al. [13] studied robust feature extraction based on l2,1-regularized correntropy. For semi-supervised learning, Wang et al. [14] combined shared subspace learning, which works well in multi-label scenarios, with manifold learning of the underlying geometric structure of the training data, exploiting both labeled and unlabeled data; l1-norm regularization keeps the underlying manifold structure clear. For the unsupervised case, Wang et al. [15] proposed a fast adaptive k-means (FAKM) subspace clustering model with an adaptive loss function and a flexible clustering index calculation mechanism, applicable to datasets under different distributions. Here, we mainly discuss supervised methods.
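Since several of the methods above hinge on the l2,1-norm, a short sketch of how it is computed and why penalizing it induces row sparsity may help; the example matrix is hypothetical.

```python
import numpy as np

def l21_norm(W: np.ndarray) -> float:
    """l2,1-norm: the sum of the Euclidean norms of the rows of W.

    Penalizing it drives entire rows of W toward zero, which is
    what makes it useful for joint (row-sparse) feature selection.
    """
    return float(np.linalg.norm(W, axis=1).sum())

W = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # row norm 0 (this feature is dropped)
              [1.0, 0.0]])   # row norm 1
print(l21_norm(W))           # 6.0
```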

In data mining, it is well known that feature selection or weighting can, on the one hand, avoid over-fitting while improving classification performance and, on the other hand, provide efficient and more cost-effective learning models [16]. Xu et al. [17] proposed a clustering algorithm for multi-view data in which each sample has multiple feature vectors; the weights of the views and those of the features are estimated simultaneously. Zhu et al. [18] took into account the relevance of the original features and proposed joint graph-based embedding and feature weighting (JEFW) to obtain a flexible and inductive nonlinear data representation on manifolds. By introducing a feature weighting scheme into the Gaussian kernel and using the l0-norm as a sparsity-promoting regularizer, Ouiza et al. [19] proposed a polynomial kernel logistic regression method with embedded feature correlation. In addition, Ouiza et al. [20] constructed kernel sparsity and group sparsity models and proposed a polynomial kernel logistic regression based on feature-group correlation; it encodes group sparsity by associating each group with a weight that reflects its discriminative ability across the interacting categories.
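As a hedged illustration of the feature-weighting idea in [19]: one common way to embed per-feature weights in a Gaussian kernel is sketched below. The exact parameterization used in that work may differ, and the function name and inputs here are illustrative assumptions.

```python
import numpy as np

def weighted_gaussian_kernel(x, z, w, sigma=1.0):
    """Gaussian kernel with per-feature weights w >= 0.

    A weight near zero removes a feature's influence on the
    similarity; making w sparse (e.g. via an l0/l1 penalty, as in
    the methods above) then performs embedded feature selection.
    """
    d2 = np.sum(w * (x - z) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0, 3.0])
z = np.array([1.0, 0.0, 3.0])
w = np.array([1.0, 0.0, 1.0])          # feature 2 is weighted out
print(weighted_gaussian_kernel(x, z, w))  # 1.0: the difference in
                                          # feature 2 is ignored
```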

When the original data is in matrix form, traditional approaches first reshape it into a vector. This vectorization of the matrix data, however, may cause several problems. First, the vectorized data is often high dimensional, so vector-based methods often suffer from the small sample size (SSS) problem. Second, vectorization ignores the spatial location of the elements in the original matrix data and destroys the relative geometric relationships between them. Third, when an image contains noise such as block-wise occlusion, the occlusion should be treated as a whole, but this correlation is lost once the image is flattened into a vector. Thus, it is necessary to investigate dimensionality reduction on matrix data directly [21].
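The second problem is easy to demonstrate: under row-major vectorization, vertically adjacent pixels land far apart in the vector. A toy example:

```python
import numpy as np

img = np.arange(16).reshape(4, 4)   # a toy 4x4 "image"
v = img.flatten()                    # row-major vectorization

# Pixels (0,0) and (1,0) are vertical neighbors in the image,
# but end up 4 positions apart in the vector, so a vector-based
# method treats them as unrelated coordinates.
print(np.ravel_multi_index((0, 0), img.shape))  # 0
print(np.ravel_multi_index((1, 0), img.shape))  # 4
```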

Similar to vector-based DR approaches, matrix-based DR comes in the same two flavors: feature extraction and feature selection. To date, several matrix-based feature extraction methods have been developed, such as 2DPCA (2D Principal Component Analysis) [22], (2D)2PCA [23], 2DLDA (2D Linear Discriminant Analysis) [24] and (2D)2LDA [25]. Besides, many feature extraction algorithms based on tensor subspace methods have also been developed for data representation, pattern classification and network anomaly detection [26].

To select features directly on matrix data, Hou et al. [27] recently proposed an algorithm named sparse matrix regression (SMR) for two-dimensional supervised feature selection, in which the relationship between the matrix data and the class labels is measured by left and right regression matrices. Thereafter, Yuan et al. [28] proposed a joint sparse matrix regression and nonnegative spectral analysis (JSMRNS) model for two-dimensional unsupervised feature selection, where the true class labels in SMR are replaced with pseudo class labels obtained by nonnegative spectral clustering. Chen and Lu [29] used an l2,1-norm loss function to reduce the influence of outliers and noise and constructed a robust graph regularized sparse matrix regression (GRSMR) model for supervised feature selection on image data. In these methods, feature selection on the matrix data relies on the row sparsity of the matrix P obtained by integrating the left and right regression matrices: compute the l2-norm of every row of P, sort the norms in descending order, and take the top-ranked s values; the elements of the vectorized matrix data corresponding to these s largest values are the selected features (see the sketch below). However, these methods still need to vectorize the matrix data when making the final selection and do not consider the relevance of the individual elements of the data matrix.
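The ranking step shared by these methods can be sketched as follows; this is a minimal illustration, and P here is a random stand-in for the integrated regression matrix.

```python
import numpy as np

def select_features(P: np.ndarray, s: int) -> np.ndarray:
    """Rank features by the l2-norm of the rows of P and return the
    indices of the top-s rows; these index positions in the
    vectorized matrix data are the selected features."""
    row_norms = np.linalg.norm(P, axis=1)
    return np.argsort(row_norms)[::-1][:s]   # descending order

# Stand-in P with m*n rows (one per pixel of a 32x32 image).
P = np.random.default_rng(0).standard_normal((32 * 32, 10))
idx = select_features(P, s=100)
# X_vec[:, idx] would then hold the selected features.
```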

To take into account the relevance of every original feature in the image and its influence on the final estimate of the regression model, we introduce an unknown weight matrix, denoted by S, in which every element encodes the relevance of one feature of the matrix. We thus propose a feature weighting and graph-based matrix regression (FWGMR) model for supervised feature selection on image data. In this model, the feature weight matrix directly selects some important features from the matrix data, which makes the matrix regression sparse to a certain extent while preserving the relative spatial relationships of the features in the image. In addition, to preserve the local manifold structure of the training samples, that is, to keep samples sharing the same label close together in the transformed space, we use a regularization term to adaptively learn a graph weight matrix. Finally, an optimization algorithm is devised to solve FWGMR with closed-form solutions at each iteration, so the algorithm can be implemented easily in real applications. Extensive experiments on several public datasets demonstrate the superiority of FWGMR.
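As a schematic sketch only: assuming the final selection ranks the entries of the learned weight matrix S by magnitude, the top-s positions can be read off directly on the image grid, so the selected features keep their 2-D coordinates. The paper's exact selection criterion is given in Section 3 and may differ from this assumption; S below is a random stand-in.

```python
import numpy as np

def select_from_weight_matrix(S: np.ndarray, s: int):
    """Pick the s positions of the image grid with the largest
    learned weights; unlike row-sparse vector methods, the
    selected features keep their 2-D coordinates."""
    flat_idx = np.argsort(np.abs(S), axis=None)[::-1][:s]
    rows, cols = np.unravel_index(flat_idx, S.shape)
    return list(zip(rows.tolist(), cols.tolist()))

S = np.random.default_rng(1).random((32, 32))  # stand-in for learned S
coords = select_from_weight_matrix(S, s=50)    # (row, col) pixel positions
```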

The remainder of this paper is organized as follows. In Section 2, we introduce some notations and review related models. In Section 3, we propose the feature weighting and adaptive graph matrix regression (FWGMR) model for supervised feature selection on image data, devise an algorithm to solve it, and analyze the algorithm's convergence. In Section 4, we report experiments on multiclass classification. Conclusions are drawn in Section 5.

Section snippets

Notations and related works

Given the set of matrix samples $\{X_i \in \mathbb{R}^{m \times n} : i = 1, 2, \ldots, N\}$, where $m$ and $n$ are the first and second dimensions of each matrix sample respectively, and $N$ is the number of matrix samples. Suppose these $N$ matrix samples come from $c$ classes. The associated class label vectors are $\{y_1, y_2, \ldots, y_N\} \subset \mathbb{R}^c$, where $y_i = [y_{1i}, y_{2i}, \ldots, y_{ci}]^T \in \{0,1\}^c$ is the class indicator vector of $X_i$, that is, $y_{ji} = 1$ if and only if $X_i$ belongs to the $j$th class and $y_{ji} = 0$ otherwise. Denote the column vector of all ones as $\mathbf{1} = [1, 1, \ldots, 1]^T \in \mathbb{R}^N$, the $s \times s$ …
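A minimal sketch of constructing the class indicator vectors defined above; the function name and inputs are illustrative.

```python
import numpy as np

def one_hot_labels(labels, c):
    """Build the class indicator vectors y_i in {0,1}^c described
    above: y_ji = 1 iff sample i belongs to class j."""
    Y = np.zeros((len(labels), c))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

Y = one_hot_labels([0, 2, 1, 2], c=3)
# Y[1] == [0., 0., 1.]  -> sample 1 belongs to the 3rd class (index 2)
```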

Proposed method

In this section, we will introduce the proposed model and its corresponding algorithm, which allows the estimation of an inductive non-linear embedding.

Experiments and analysis

In this section, we compare our FWGMR algorithm with other existing feature selection approaches on several benchmark datasets to test and verify its effectiveness. These methods include Fisher Score [7] based on linear discriminant analysis (LDA), robust feature selection (RFS) [8], global and local structure preservation feature selection (GLSPFS) [9], joint graph-based embedding and feature weighting (JEFW) [18], sparse matrix regression (SMR) [27], a joint sparse matrix regression and

Conclusions

In this paper, we propose a joint feature weighting and matrix regression (FWGMR) model for image supervised feature selection. Compared with traditional vector-based supervised feature selection methods, FWGMR can take the image data directly as input and select the discriminative features of the matrix data via the weight matrix. The proposed method therefore considers the location information and relevance of elements in the original image data. In addition, we also adaptively learn

CRediT authorship contribution statement

Yun Lu: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft. Xiuhong Chen: Conceptualization, Methodology, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (42)

  • Zhang, Z., et al., Marginal representation learning with graph structure self-adaptation, IEEE Trans. Neural Netw. Learn. Syst. (2018)

  • Chen, M., et al., A unified feature selection framework for graph embedding on high dimensional data, IEEE Trans. Knowl. Data Eng. (2015)

  • Guyon, I., et al., An introduction to variable and feature selection, J. Mach. Learn. Res. (2003)

  • Stork, D.G., et al., Pattern Classification (2000)

  • Nie, F., Huang, H., Cai, X., Ding, C., Efficient and robust feature selection via joint l2,1-norms minimization, in: Proc. ...

  • Liu, X., et al., Global and local structure preservation for feature selection, IEEE Trans. Neural Netw. Learn. Syst. (2013)

  • Liu, J., Ji, S., Ye, J., Multi-task feature learning via efficient l2,1-norm minimization, in: Proc. 21st Conf. ...

  • Cai, X., Nie, F., Huang, H., Exact top-k feature selection via l2,0-norm constraint, in: Proceedings of the 23rd Int. ...

  • Xiang, S.M., et al., Discriminative least squares regressions for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst. (2012)

  • He, R., et al., l2,1 regularized correntropy for robust feature selection

  • Wang, X.D., et al., Fast adaptive k-means subspace clustering for high-dimensional data, IEEE Access (2019)