
Pattern Recognition

Volume 105, September 2020, 107375

Unsupervised feature selection with adaptive multiple graph learning

https://doi.org/10.1016/j.patcog.2020.107375

Highlights

  • A framework of jointly multiple graph learning and feature selection is proposed.

  • An effective algorithm is proposed for optimizing the objective function.

  • The experimental results show that our algorithm outperforms the state-of-the-art methods.

Abstract

Unsupervised feature selection methods aim to select features that preserve the intrinsic structure of the data. To represent such structure, conventional methods construct various graphs from the data. In most cases, these different graphs contain both consensus and complementary information. To make full use of such information, we construct multiple base graphs and learn an adaptive consensus graph from them for feature selection. Our method integrates multiple graph learning and feature selection into a unified framework, which jointly characterizes the structure of the data and selects the features that preserve it. The underlying optimization problem is hard to solve, so we solve it via a block coordinate descent scheme whose convergence is guaranteed. Extensive experiments demonstrate the effectiveness of the proposed framework.

Introduction

Feature selection is a fundamental problem in machine learning and has attracted considerable attention in the past decades [1], [2], [3], [4], [5]. In many real-world applications, the data contain a large number of features, which may cause the curse of dimensionality. Moreover, some features are often contaminated by noise, which may deteriorate the performance of machine learning methods. To address these problems, feature selection is applied to select a small number of informative and non-redundant features. Since feature selection usually leads to better learning performance, it has been successfully applied in many real applications such as text categorization [6], [7], image processing [8], and bioinformatics [9]. According to the availability of labels, feature selection methods can be categorized into supervised [10], [11], semi-supervised [12], and unsupervised algorithms [5], [13]. Due to the absence of label information, unsupervised feature selection is the more challenging problem.

Since label information is absent in unsupervised learning, a feature selection method can only exploit the information of the data itself, or, equivalently, the intrinsic structure of the data [14], [15], [16], [17], [18]. Conventional methods often construct a graph from the data to represent such structure. According to the way the graph is constructed, existing methods can be roughly divided into two classes: (1) methods using a pre-defined graph, which fix a graph in advance and select features to preserve its structure [19], [20], [21]; and (2) methods learning an adaptive graph, which learn the graph simultaneously in the process of feature selection [5], [17], [22].
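As an illustration of the first class, the sketch below builds a fixed kNN graph once and then ranks each feature by how well it preserves that graph, in the spirit of the Laplacian score of He et al. This is a minimal, hedged sketch in plain NumPy, not the paper's method; the RBF weighting and the parameters `k` and `sigma` are assumed choices for illustration.

```python
import numpy as np

def laplacian_score(X, k=5, sigma=1.0):
    """Rank features by the Laplacian score on a pre-defined kNN graph.

    X: (n_samples, n_features). Smaller scores indicate features that
    better preserve the local structure encoded by the graph.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # RBF-weighted kNN adjacency, symmetrized.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)
    d = W.sum(1)                                    # degree vector
    L = np.diag(d) - W                              # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ d) / d.sum()                   # remove the trivial component
        denom = (f * f) @ d
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores
```

A feature that varies smoothly over the graph (neighbors share similar values) receives a small score, so selecting the lowest-scoring features preserves the pre-defined graph structure.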

Note that both classes use only a single graph (either pre-defined or adaptive) to represent the structure of the data. In real applications, however, the structure may be too complex to be captured by a single graph. Given a data set, various graphs can be constructed from different distance metrics, such as Euclidean distance and cosine similarity. In most cases, these graphs contain both consensus and complementary information, which single-graph methods may ignore. Moreover, some data naturally come with multiple graph structures: research papers, for example, are related through several graphs, such as a co-author graph and a citation graph. When handling such data, the aforementioned methods may fail to fully utilize the given graphs.

To address these problems, in this paper we characterize the intrinsic structure with adaptive multiple graph learning, which learns a consensus graph from multiple base graphs. For example, Nie et al. [23] proposed a parameter-free multiple graph learning method that learns the weight of each graph automatically; Zhan et al. [24] learned a consensus graph by minimizing the disagreement between different views while constraining the rank of the Laplacian matrix. In our framework, if the data already contain multiple graphs, we use them directly as base graphs; otherwise, we construct several pre-defined graphs from the data. We then learn the consensus graph from these base graphs simultaneously in the process of feature selection: on the one hand, the result of feature selection guides the multiple graph learning; on the other hand, the learned graph is used to select informative features. When learning the consensus graph, since the scales of the base graphs may vary dramatically, we first normalize all base graph adjacency matrices into transition matrices and learn a consensus transition matrix from them. Since a transition matrix has a clear probabilistic interpretation, we use the Kullback-Leibler divergence to measure consensus. When selecting features, we impose a weight on each feature and transform the weighted data into a subspace that preserves the intrinsic structure represented by the consensus graph. This procedure repeats until convergence.
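The normalization and consensus steps can be made concrete with a simplified sketch. This is not the paper's full objective (the coupling with the feature-selection term is omitted): each base adjacency matrix is row-normalized into a transition matrix, and, for fixed graph weights, the row-wise minimizer of the weighted KL divergence Σ_v w_v KL(S_i· ‖ P^(v)_i·) over the probability simplex is the normalized weighted geometric mean of the base rows.

```python
import numpy as np

def to_transition(A, eps=1e-12):
    """Row-normalize an adjacency matrix into a transition matrix."""
    A = np.asarray(A, dtype=float) + eps       # avoid zero rows and log(0)
    return A / A.sum(axis=1, keepdims=True)

def kl_consensus(bases, weights):
    """Consensus transition matrix minimizing sum_v w_v * KL(S || P_v) row-wise.

    The closed-form minimizer over each row simplex is the normalized
    weighted geometric mean of the base rows.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    log_S = sum(wv * np.log(P) for wv, P in zip(w, bases))
    S = np.exp(log_S)
    return S / S.sum(axis=1, keepdims=True)
```

Note that the consensus depends on the direction of the KL divergence: minimizing Σ_v w_v KL(P_v ‖ S) instead would give the weighted arithmetic mean of the base transition matrices.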

To integrate multiple graph learning and feature selection into a unified framework as introduced above, we carefully design a non-convex objective function. To optimize it, we propose a block coordinate descent algorithm and prove its convergence. We conduct extensive experiments on benchmark data sets, comparing our algorithm with several state-of-the-art unsupervised feature selection methods, and the results show that ours outperforms them.
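The optimization strategy can be sketched generically: block coordinate descent cycles through the blocks of variables (here, the feature weights, the consensus graph, and the graph weights), minimizing the objective over one block with the others held fixed. The skeleton below is a hedged, generic version; the per-block update functions are placeholders, not the paper's actual update rules.

```python
import numpy as np

def block_coordinate_descent(update_blocks, objective, init, tol=1e-6, max_iter=100):
    """Generic block coordinate descent.

    update_blocks: list of functions, each mapping the current state dict
    to new values for its own block (all other blocks held fixed).
    Stops when the objective decrease per sweep falls below `tol`.
    """
    state = dict(init)
    prev = objective(state)
    history = [prev]
    for _ in range(max_iter):
        for update in update_blocks:
            state.update(update(state))
        cur = objective(state)
        history.append(cur)
        if abs(prev - cur) < tol:
            break
        prev = cur
    return state, history

# Toy usage: minimize f(x, y) = x^2 + y^2 - x*y - x, whose exact per-block
# minimizers are x = (y + 1) / 2 and y = x / 2 (minimum at x = 2/3, y = 1/3).
updates = [lambda s: {"x": (s["y"] + 1) / 2},
           lambda s: {"y": s["x"] / 2}]
obj = lambda s: s["x"] ** 2 + s["y"] ** 2 - s["x"] * s["y"] - s["x"]
state, history = block_coordinate_descent(updates, obj, {"x": 0.0, "y": 0.0})
```

When each block update is an exact minimization, the objective value is monotonically non-increasing, which is the basis of the convergence guarantee the paper invokes.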

The paper is organized as follows. Section 2 describes some related work. Section 3 presents in detail the main algorithm of our method. Section 4 shows the experimental results, and Section 5 concludes the paper.

Section snippets

Related work

To handle high-dimensional data, many feature learning methods have been proposed. One kind of feature learning is feature extraction [25], [26], which learns a projection to map the data from a high-dimensional feature space to a low-dimensional space. For example, in [27], a multilinear principal component analysis was proposed to project the original image data to a low-dimensional space; a sparse discriminant projection method was provided in [28]; most recently, Lai et al. [29]
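For contrast with feature selection, the projection idea behind feature extraction can be illustrated with a minimal PCA sketch via the SVD. This is an illustrative baseline only, not the multilinear or sparse variants cited above.

```python
import numpy as np

def pca_project(X, d):
    """Project data onto its top-d principal components via SVD.

    Unlike feature selection, every output dimension is a linear
    combination of all original features.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                         # (n_samples, d) embedding
```

Feature selection, by contrast, keeps a subset of the original coordinates, which preserves their physical meaning and interpretability.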

Feature selection with multiple graphs

In this section, we introduce our feature selection method with adaptive multiple graph learning. We first introduce the notation used in this paper: a bold uppercase character denotes a matrix and a bold lowercase character denotes a vector. For an arbitrary matrix M ∈ ℝ^(r×s), M_i· denotes its ith row, M_·i denotes its ith column, and M_ij denotes its (i, j)th element.

Experiments

In this section, we compare our method with several state-of-the-art unsupervised feature selection methods on benchmark data sets.

Conclusion and future work

In this paper, we proposed a feature selection method with adaptive multiple graph learning. We made use of multiple base graphs to learn an adaptive consensus graph that characterizes the intrinsic structure of the data. To let structure learning and feature selection boost each other, we integrated them into a unified framework, and we presented a block coordinate descent method, whose convergence is guaranteed, to optimize the introduced objective function. Experimental results demonstrated that our method outperforms state-of-the-art unsupervised feature selection methods.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Fund of China grants 61806003, 61976129, and 61972001; the Key Natural Science Project of Anhui Provincial Education Department KJ2018A0010; and the National Natural Science Foundation of Shanxi grant 201801D221163.

Peng Zhou received the B.E. degree in computer science from University of Science and Technology of China in 2011 and Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences in 2017. He is currently a lecturer in Anhui University. His research interests include machine learning and data mining.

References (54)

  • Z. Wang et al., Adaptive multi-view feature selection for human motion retrieval, Signal Process. (2016)

  • R. Zhang et al., Feature selection with multi-view data: a survey, Inf. Fusion (2019)

  • P. Zhou et al., Incremental multi-view support vector machine, Proceedings of the 2019 SIAM International Conference on Data Mining (2019)

  • P. Zhou et al., Incremental multi-view spectral clustering, Knowl. Based Syst. (2019)

  • Z. Hong et al., Optimal discriminant plane for a small number of samples and design method of classifier on the plane, Pattern Recognit. (1991)

  • F. Nie et al., Efficient and robust feature selection via joint ℓ2,1-norms minimization, Advances in Neural Information Processing Systems (2010)

  • C. Hou et al., Multi-view unsupervised feature selection with adaptive similarity and view weight, IEEE TKDE (2017)

  • X. Zhu et al., Local and global structure preservation for robust unsupervised spectral feature selection, IEEE TKDE (2018)

  • M. Luo et al., Adaptive unsupervised feature selection with structure regularization, IEEE TNNLS (2018)

  • Y. Yang et al., A comparative study on feature selection in text categorization, ICML (1997)

  • K. Nigam et al., Text classification from labeled and unlabeled documents using EM, Mach. Learn. (2000)

  • Y. Saeys et al., A review of feature selection techniques in bioinformatics, Bioinformatics (2007)

  • R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Series B (1996)

  • M. Fan et al., Top-k supervised feature selection via ADMM for integer programming, IJCAI (2017)

  • Z. Xu et al., Discriminative semi-supervised feature selection via manifold regularization, IEEE Trans. Neural Netw. (2010)

  • J.G. Dy et al., Feature selection for unsupervised learning, J. Mach. Learn. Res. (2004)

  • X. He et al., Laplacian score for feature selection, Advances in Neural Information Processing Systems (2006)

    Liang Du received the B.E. degree in software engineering from Wuhan University in 2007, and Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences in 2013. He is currently a lecturer in ShanXi University. His research interests include machine learning, data mining and big data analysis.

    Xuejun Li received the Ph.D. degree from Anhui University in 2008. He is currently a professor of School of Computer Science & Technology, Anhui University, China. His major research interests include intelligent software, cloud computing, and workflow systems.

    Yi-Dong Shen is a professor of computer science in the State Key Laboratory of Computer Science at the Institute of Software, the Chinese Academy of Sciences. His main research interests include knowledge representation and reasoning, semantic web, and data mining.

    Yuhua Qian received the M.S. and Ph.D. degrees from Shanxi University, Taiyuan, China, in 2005 and 2011, respectively. He is currently a Professor with the Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University. He has authored more than 70 papers in his research fields.
