Neurocomputing

Volume 535, 28 May 2023, Pages 107-122

Adaptive graph regularization and self-expression for noise-aware feature selection

https://doi.org/10.1016/j.neucom.2023.03.036

Abstract

Many traditional unsupervised feature selection algorithms use manifold information to mine the local structure of the data. However, noise in the raw data reduces the accuracy of this manifold information, which degrades the learning performance of the whole algorithm. To address this problem and to uncover the internal structure of the data more fully, this paper proposes adaptive graph regularization and self-expression for noise-aware feature selection (ASNFS). Firstly, the algorithm adopts non-negative matrix factorization to decompose the raw data matrix, and replaces the raw high-dimensional data matrix with the low-dimensional matrix generated by the decomposition. This allows the algorithm to reveal some of the internal structural information of the raw data while reducing its dimensionality. ASNFS also introduces orthogonal basis clustering, which clusters well and enhances the interpretability of the algorithm. Secondly, in addition to preserving the manifold information in the low-dimensional projection subspace, the algorithm also preserves the manifold information in the non-negative matrix factorization subspace. Meanwhile, the adaptive graph regularization term added to the objective function enables the algorithm to continuously update the similarity matrix. This effectively removes noise from the raw data and prevents the over-fitting of the experimental results that a fixed similarity matrix can cause. Finally, the similarity matrix retains the local structure information of the data at each iteration, and the feature selection results are reused to construct the new similarity matrix. ASNFS adopts a simple and effective alternate iterative method to optimize the objective function, and the complexity and convergence of the algorithm are analyzed. ASNFS is compared with seven feature selection algorithms on nine datasets, and the experimental results demonstrate its effectiveness in feature selection.

Introduction

Due to the rapid growth of computer technology, humans generate large amounts of data all the time. The vast majority of these data share one characteristic: their dimensionality is often very high. Such high-dimensional data contain not only useful key information but also a large amount of useless, redundant information [1]. This poses a great challenge for many areas of computing, such as computer vision, pattern recognition [2], data mining [3], and machine learning [4]. Therefore, reducing the dimensionality of high-dimensional data becomes the primary goal, so as to extract the effective key information in the raw data and alleviate the pressure of data processing in these technologies [5], [6]. Feature extraction [7] and feature selection [8] are two commonly used dimensionality reduction methods. In feature extraction, the raw data are mapped to a low-dimensional space; all raw features contribute to the mapping, and the feature values generally change [9]. PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) are two classical feature extraction methods. The difference between them is that LDA is a supervised dimensionality reduction method, whereas PCA is an unsupervised one that does not consider the class output. PCA exploits the correlation between the features of the data samples [10], mapping them from the raw high-dimensional space to a new low-dimensional space. Its purpose is to identify the most discriminative new sample set: closely related variables are combined into new variables, thereby reducing the number of variables. LDA also maps data samples to a low-dimensional space, but it uses the label information. Its goal is to keep samples from the same cluster as close as possible and samples from different clusters as far apart as possible [11]. Feature selection, in contrast, selects features directly from the raw feature set, and the feature values do not change [12]. That means the features chosen by feature selection are consistent with the features in the raw high-dimensional space. Because feature extraction changes the feature values of the raw data, it generates new features that are not present in the raw feature set. By comparison, the features obtained by feature selection are the most representative features of the raw feature set and provide a more reasonable explanation of the raw data. Today, feature selection is applied in many fields such as clustering [13], classification, retrieval, and pattern recognition [14]. Feature selection methods can be classified into supervised [15], semi-supervised [16], and unsupervised [17] methods, according to whether and to what extent the labels are known. Supervised and semi-supervised methods are generally more accurate because all or part of the label information is known: they use the correlation between labels and samples to judge whether sample features are important [18], [19]. However, real-world data are very complex and usually unlabeled, and manual labeling costs a great deal of labor and time [20], so unsupervised feature selection methods are very valuable to study.
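
To make the distinction concrete, the following minimal sketch (Python with NumPy; the toy data and the variance criterion are illustrative assumptions, not part of this paper) contrasts feature extraction, which produces transformed feature values, with feature selection, which keeps a subset of the raw columns unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # 100 samples, 5 raw features

# Feature extraction (PCA): project onto the top-2 principal directions.
# The resulting feature values differ from every raw column.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:2].T         # new, transformed features

# Feature selection: keep the 2 raw columns ranked highest by some
# criterion (variance here, purely for illustration); values are unchanged.
top2 = np.argsort(X.var(axis=0))[-2:]
X_selected = X[:, top2]             # a subset of the raw features
```
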
Unsupervised feature selection algorithms do not require any label information and evaluate the importance of features through the inherent information of the data [21], which is exactly in line with the requirements of real-world data. According to their evaluation strategies, feature selection algorithms can be divided into three types: filter [22], wrapper [23], and embedded [24]. Filter methods evaluate the importance of features through statistical measures, and the feature selection process is independent of the training of subsequent models [25]. Filter methods are computationally efficient because they do not require any learning or iterative process, but their results are often not ideal. Wrapper methods place features into candidate subsets and evaluate these subsets directly by model performance [26]; however, their high computational complexity leads to a large time cost. Embedded methods convert feature selection into a model construction problem [27], so feature selection is carried out simultaneously with model training. Feature selection algorithms such as URAFS [28], MCFS [29], NDFS [30], CNAFS [31], and SOGFS [32] all use embedded strategies to select the optimal subset. In contrast, embedded methods perform better than filter methods and do not cost as much time as wrapper methods. Existing embedded feature selection algorithms follow a roughly similar process: they evaluate the importance of features according to some criterion and select a small number of features to form a new dataset for clustering. For example, He et al. proposed the classical Laplacian score algorithm [33], which uses the manifold information of the data to score each feature separately. Cai et al. proposed graph-regularized non-negative matrix factorization (GNMF) [34], which builds on the non-negative matrix factorization framework and reveals the internal structure of the raw data well. Cai et al. also proposed multi-cluster feature selection (MCFS) [29], which uses a sparse regression framework together with spectral analysis to preserve the local structure information of the raw data. However, the features selected by existing embedded unsupervised feature selection algorithms often lack discriminability. To enhance the discriminability of the selected features, Li et al. proposed unsupervised feature selection using non-negative spectral analysis (NDFS) [30], which combines spectral clustering and feature selection and aims to select the most discriminative features by using discriminative information in unsupervised scenarios. Lin et al. proposed an unsupervised feature selection algorithm based on orthogonal basis clustering and local structure preservation (OCLSP) [35], which decomposes the target matrix in the sparse regression framework and imposes orthogonal constraints on the decomposed matrices, so that the results are closer to the real situation. Nevertheless, the above algorithms use only a single sparse regression framework while relying on the manifold information of the data projection subspace to guide the feature selection process. Such a structure leaves them inadequate for mining the structural information of the raw data.
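
As a concrete reference point for the filter family, the Laplacian score [33] mentioned above can be sketched as follows (a minimal NumPy version; the heat-kernel width and neighborhood size are arbitrary illustrative choices). Features with smaller scores preserve the local graph structure better and are ranked first.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Laplacian score of each column of X (n samples x d features).
    Smaller scores indicate features that better preserve locality."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    S = np.zeros((n, n))
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]             # skip self at index 0
    for i in range(n):
        S[i, idx[i]] = np.exp(-sq[i, idx[i]] / t)        # heat-kernel weights
    S = np.maximum(S, S.T)                               # symmetrize
    d = S.sum(1)                                         # degrees
    L = np.diag(d) - S                                   # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r] - (X[:, r] @ d) / d.sum()            # D-weighted centering
        scores[r] = (f @ (L @ f)) / max(f @ (d * f), 1e-12)
    return scores
```
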

In order to alleviate the above issues and dig deeper into the internal structure of the raw data, this paper proposes adaptive graph regularization and self-expression for noise-aware feature selection (ASNFS). First, the algorithm decomposes the raw data matrix into a basis matrix and a coefficient matrix using non-negative matrix factorization, and replaces the raw high-dimensional data matrix with the decomposed low-dimensional coefficient matrix. This effectively reduces the dimensionality of the raw data matrix. Besides, the decomposed basis matrix is reconstructed from the raw data to complete the self-expression, which gives the algorithm more in-depth structural information about the raw data. Second, an orthogonal basis clustering model is introduced on the basis of the above self-expression model, and the two models guide feature selection simultaneously. In addition, the manifold information in both the raw data projection subspace and the non-negative matrix factorization subspace is retained, which fully mines the local structure information of the data. Then, the construction of the similarity matrix is added to the optimization of the algorithm. The graph regularization term in the objective function constrains the feature selection process, and the feature selection results obtained at each iteration are reused in constructing a new similarity matrix. As the iterations proceed, the similarity matrix is constantly updated, giving the algorithm an adaptive effect (a generic sketch of this adaptive update follows the contribution list below). Adaptively updating the similarity matrix can effectively mitigate the influence of noise in the raw data, and thus solves the problem of inaccurate retention of manifold information caused by a fixed graph Laplacian matrix. Finally, we adopt a simple and effective alternate iterative optimization method. The three main contributions of ASNFS are as follows.

  • 1) Using the idea of self-expression, the basis matrix produced by the non-negative matrix factorization framework is reconstructed with the raw data matrix, enabling the algorithm to further mine the internal structural information of the data.

  • 2) A joint framework is proposed that introduces an orthogonal basis clustering model on the basis of the improved non-negative matrix factorization framework; the two models work together for feature selection. Moreover, the manifold information of both the projection subspace and the non-negative matrix factorization subspace is preserved.

  • 3) The joint framework incorporates the adaptive graph regularization term to achieve simultaneous feature selection and manifold information learning. In addition, an approximation constraint toward the initial similarity matrix is applied to the continuously updated similarity matrix to prevent it from drifting uncontrollably.
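
A generic realization of the adaptive graph update referenced above is sketched below. It is not the exact ASNFS update rule (that is derived in Section 3); it merely illustrates the mechanism, under the assumption of heat-kernel neighbor weights and a convex blend with the initial graph standing in for the approximation constraint of contribution 3.

```python
import numpy as np

def update_similarity(Y, S0, k=5, t=1.0, beta=0.5):
    """Rebuild the similarity matrix from the current projected data Y
    (n samples x m features) and pull it toward the initial graph S0
    so the updated graph cannot drift uncontrollably."""
    n = Y.shape[0]
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]   # k nearest neighbors, skip self
    for i in range(n):
        w = np.exp(-sq[i, idx[i]] / t)         # heat-kernel weights
        S[i, idx[i]] = w / w.sum()             # row-normalize
    S = (S + S.T) / 2                          # symmetrize
    return (1 - beta) * S + beta * S0          # blend toward the initial graph
```

At each outer iteration, the graph Laplacian used by the regularization term would be rebuilt from the returned matrix, so that the preserved manifold information tracks the current feature selection result.
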

The rest of this paper is organized as follows. Section 2 introduces the related work. The proposed algorithm is detailed in Section 3. In Section 4, the experimental results are presented and the performance of the algorithm is analyzed. Finally, the paper is summarized in Section 5.

Section snippets

Related work

Sparse regression and non-negative matrix factorization (NMF) are two model construction methods commonly used in embedded feature selection algorithms. Sparse regression is similar to least squares regression, in which the regression error of the data is assumed to follow a Gaussian distribution. Matrix factorization decomposes the raw high-dimensional data matrix into two matrices [34], which can be called the basis matrix and the coefficient matrix. …
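
For reference, the multiplicative-update NMF of Lee and Seung, on which this family of methods builds, can be sketched as follows (plain NumPy; the random initialization and iteration count are arbitrary choices, not taken from this paper).

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-10):
    """Factor a non-negative matrix X (d x n) as X ~ W @ H, where
    W (d x r) is the basis matrix and H (r x n) the coefficient matrix,
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], r)) + eps
    H = rng.random((r, X.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # coefficient update
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # basis update
    return W, H
```
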

The proposed method

In order to improve the accuracy of the similarity matrix of the raw data, better preserve the local structure of the raw data, and fully explore the hidden information inside the data, this paper proposes a new unsupervised feature selection method. ASNFS utilizes a non-negative matrix factorization framework for self-expression, and an orthogonal basis clustering model is introduced on the basis of this framework. Then, the manifold information of the projection subspace of the raw data and the manifold information of the non-negative matrix factorization subspace are both preserved. …
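
The exact ASNFS objective and update rules are derived later in this section; as an indication of how a graph regularization term enters such multiplicative updates, the following sketch shows one GNMF-style step [34], which adds the penalty Tr(H L H^T) with L = D - S to the NMF reconstruction loss (the trade-off parameter lam is a placeholder).

```python
import numpy as np

def graph_regularized_step(X, W, H, S, lam=0.1, eps=1e-10):
    """One multiplicative step of graph-regularized NMF:
    minimize ||X - W H||_F^2 + lam * Tr(H (D - S) H^T),
    where S is the sample similarity matrix over the columns of X."""
    D = np.diag(S.sum(1))
    H *= (W.T @ X + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In an adaptive scheme such as ASNFS, S itself would be rebuilt between such steps from the current projection, rather than fixed once at the start.
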

Simulation results and analysis

ASNFS is tested on nine real-world datasets and compared with seven classical or state-of-the-art unsupervised feature selection methods. Standard k-means clustering is used. Specifically, the algorithm proposed in this paper and the comparison algorithms are first run on the same dataset separately, and the features obtained from feature selection form a new dataset. The same clustering algorithm is then used to cluster the raw datasets and the new datasets. All experiments are implemented …
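
This evaluation protocol can be sketched as follows (scikit-learn and SciPy; the selected-feature indices and cluster count are placeholders, clustering accuracy uses the usual Hungarian label matching, and labels are assumed to be integers 0..k-1).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one label matching (Hungarian)."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                        # co-occurrence counts
    rows, cols = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[rows, cols].sum() / len(y_true)

def evaluate(X, y, selected, n_clusters):
    """Cluster the reduced dataset with k-means and score it, as in the
    protocol above: the selected features form the new dataset."""
    X_new = X[:, selected]
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(X_new)
    return clustering_accuracy(y, pred), normalized_mutual_info_score(y, pred)
```
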

Conclusions

This paper proposes a new unsupervised feature selection algorithm, which preserves the manifold information of the raw data in both the projection subspace and the non-negative matrix factorization subspace. The update of the similarity matrix is added to the optimization of the algorithm, which effectively improves the performance of ASNFS. The clustering results of ASNFS are better than those of the baseline that selects all the features. ASNFS improves on the traditional sparse regression framework. …

CRediT authorship contribution statement

Ronghua Shang: Conceptualization, Methodology, Writing – review & editing. Haijing Chi: Methodology, Data curation, Software, Writing – original draft. Yangyang Li: Conceptualization. Licheng Jiao: Conceptualization, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62176200, 61773304, and 61871306, the Natural Science Basic Research Program of Shaanxi under Grant Nos. 2022JC-45 and 2022JQ-616, the Open Research Projects of Zhejiang Lab under Grant 2021KG0AB03, the 111 Project, the National Key R&D Program of China, the Guangdong Provincial Key Laboratory under Grant No. 2020B121201001, and the GuangDong Basic and Applied Basic Research Foundation under …

References (52)

  • R. Shang et al., Sparse and low-dimensional representation with maximum entropy adaptive graph for feature selection, Neurocomputing (2022).
  • F. Shang et al., Graph dual regularization non-negative matrix factorization for co-clustering, Pattern Recogn. (2012).
  • Z. Li et al., Manifold optimization-based analysis dictionary learning with an l1/2-norm regularizer, Neural Networks (2018).
  • J. Song et al., Optimized graph learning using partial tags and multiple features for image and video annotation, IEEE Trans. Image Process. (2016).
  • R. Zhang et al., Robust principal component analysis with adaptive neighbors, Advances in Neural Information Processing Systems (2019).
  • Y. Li et al., Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data, Knowl.-Based Syst. (2016).
  • J.C. Ang et al., Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection, IEEE/ACM Trans. Comput. Biol. Bioinformatics (2015).
  • H. Abdi et al., Principal component analysis, Wiley Interdiscip. Rev.: Comput. Stat. (2010).
  • Y.A. Ghassabeh et al., Fast incremental LDA feature extraction, Pattern Recogn. (2015).
  • R. Zhang et al., Unsupervised feature selection with extended OLSDA via embedding nonnegative manifold structure, IEEE Trans. Neural Networks Learn. Syst. (2020).
  • K. Zhan et al., Graph structure fusion for multiview clustering, IEEE Trans. Knowl. Data Eng. (2018).
  • J.M. Sotoca et al., Supervised feature selection by clustering using conditional mutual information-based distances, Pattern Recogn. (2010).
  • Z. Li et al., Semi-supervised local feature selection for data classification, Sci. China Inf. Sci. (2021).
  • Z. Li et al., Unsupervised feature selection via nonnegative spectral analysis and redundancy control, IEEE Trans. Image Process. (2015).
  • R. Zhang et al., Self-weighted supervised discriminative feature selection, IEEE Trans. Neural Networks Learn. Syst. (2017).
  • F. Nie, S. Xiang, Y. Jia, C. Zhang, S. Yan, Trace ratio criterion for feature selection, in: AAAI, vol. 2, 2008, pp. …

Ronghua Shang (SM’22) received the B.S. degree in information and computation science and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University in 2003 and 2008, respectively. She is currently a professor with Xidian University. Her current research interests include evolutionary computation, image processing, and data mining.

Haijing Chi received the B.E. degree from the School of Electronic and Information Engineering, Hebei University of Engineering, Handan, China, in 2020. She is now pursuing the M.S. degree in the School of Artificial Intelligence, Xidian University, Xi’an, China. Her current research interests include machine learning and data mining.

Yangyang Li (SM’18) received the B.S. and M.S. degrees in computer science and technology, and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, Xi’an, China, in 2001, 2004, and 2007, respectively. She is currently a professor with the School of Artificial Intelligence, Xidian University. Her research interests include quantum-inspired evolutionary computation, artificial immune systems, and deep learning.

Licheng Jiao (F’17) received the B.S. degree in electronic engineering from Shanghai Jiaotong University, Shanghai, China, in 1982, and the M.S. and Ph.D. degrees in electronic engineering from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively. From 1990 to 1991, he was a Postdoctoral Fellow with the National Key Laboratory for Radar Signal Processing, Xidian University, Xi’an, China. Since 1992, he has been a Professor with the School of Electronic Engineering, Xidian University. He is currently the Director of the Key Lab of Intelligent Perception and Image Understanding of the Ministry of Education of China, Xidian University. He is in charge of about 40 important scientific research projects and has authored or coauthored more than 20 monographs and 100 papers in international journals and conferences. His research interests include image processing, natural computation, machine learning, and intelligent information processing. Dr. Jiao is a member of the IEEE Xi’an Section Execution Committee and the Chairman of its awards and recognition committee, the Vice Board Chairperson of the Chinese Association of Artificial Intelligence, a Councilor of the Chinese Institute of Electronics, a Committee Member of the Chinese Committee of Neural Networks, and an expert of the Academic Degrees Committee of the State Council.
