Adaptive graph regularization and self-expression for noise-aware feature selection
Introduction
With the rapid growth of computer technology, enormous amounts of data are generated all the time. The vast majority of these data share one characteristic: their dimensionality is very high. High-dimensional data contain useful key information, but also a large amount of useless, redundant information [1]. This affects many areas of computing, such as computer vision, pattern recognition [2], data mining [3], and machine learning [4]. Reducing the dimensionality of such data therefore becomes the primary way to address this problem, so that the effective key information in the raw data can be obtained and the pressure of data processing in information technology can be alleviated [5], [6]. Feature extraction [7] and feature selection [8] are the two most commonly used dimensionality reduction approaches. In feature extraction, the raw data are mapped into a low-dimensional space; the resulting features are combinations of the original ones, so the feature values generally change [9]. PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) are two classical feature extraction methods. The difference between them is that LDA is a supervised dimensionality reduction method, whereas PCA is unsupervised and does not consider the class output. PCA exploits the correlation between the features of the data samples [10]: the samples are mapped from the raw high-dimensional space to a new low-dimensional space, and closely related variables are combined into new variables so that the number of variables decreases, with the aim of identifying the most discriminative new representation. LDA also maps the data samples to a low-dimensional space, but uses the label information; its goal is for samples from the same cluster to be as close as possible and samples from different clusters to be as far apart as possible [11].
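As a concrete illustration of the contrast just described, the following sketch projects the same samples with unsupervised PCA and supervised LDA. It uses scikit-learn and the Iris data purely as an example; neither is tied to this paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features

# PCA ignores labels: it keeps the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses labels: it keeps the directions that best separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)            # both (150, 2)
```

Both methods produce new feature values that do not exist in the raw feature set, which is exactly what distinguishes extraction from selection below.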
Feature selection, by contrast, selects some features directly from the raw feature set, so the feature values do not change [12]. That is, the features chosen by feature selection are exactly features that exist in the raw high-dimensional space, whereas feature extraction transforms the feature values and generates new features not present in the raw feature set. Consequently, the features obtained by feature selection are the most representative ones in the raw feature set and admit a more natural interpretation in terms of the raw data. Feature selection is now applied in many fields, such as clustering [13], classification, retrieval, and pattern recognition [14]. According to whether, and to what extent, labels are available, feature selection methods can be classified as supervised [15], semi-supervised [16], or unsupervised [17]. Supervised and semi-supervised methods are generally more accurate because all or part of the label information is known; they use the correlation between labels and samples to judge whether features are important [18], [19]. However, real-world data are usually complex and unlabeled, and manual labeling costs considerable labor and time [20], so unsupervised feature selection methods are well worth studying. Unsupervised feature selection algorithms require no label information and evaluate the importance of features through the inherent structure of the data [21], which matches the requirements of real-world data exactly. According to the evaluation strategy, feature selection algorithms can be divided into three types: filter [22], wrapper [23], and embedded [24]. Filter methods evaluate the importance of features through statistical measures, so the selection process is independent of the training of any subsequent model [25].
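As a minimal illustration of the filter idea, features can be ranked by a model-free statistic; the variance score and the function name below are illustrative choices, not taken from this paper.

```python
import numpy as np

def variance_filter(X, k):
    """Filter-style selection: rank features by variance (a statistic
    computed without training any model) and keep the top k indices."""
    scores = X.var(axis=0)
    return np.argsort(scores)[::-1][:k]   # indices of the k highest-variance features

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 2] *= 10.0                           # feature 2 gets much larger variance
print(variance_filter(X, 1))              # → [2]
```

Because the score ignores any downstream model, this runs very fast, which is exactly the efficiency/quality trade-off discussed next.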
Filter methods are computationally efficient because they require no learning or iterative process, but their results are often not ideal. Wrapper methods place features into candidate subsets and evaluate these subsets, using the performance of a model directly as the criterion for subset quality [26]; their high computational complexity, however, leads to a large time cost. Embedded methods convert feature selection into a model construction problem [27], so that feature selection is carried out at the same time as model training. Algorithms such as URAFS [28], MCFS [29], NDFS [30], CNAFS [31], and SOGFS [32] all use embedded strategies to select the optimal subset. By comparison, embedded methods achieve better performance without the time cost of wrapper methods. Existing embedded feature selection algorithms follow a roughly similar process: they evaluate the importance of features according to some criterion and select a small number of features to form a new dataset for clustering. For example, He et al. proposed the classical Laplacian score algorithm [33], which uses the manifold information of the data to score each feature separately. Cai et al. proposed graph-regularized non-negative matrix factorization (GNMF) [34], which builds on the non-negative matrix factorization framework and reveals the internal structure of the raw data well. Cai et al. also proposed multi-cluster feature selection (MCFS) [29], which builds on a sparse regression framework and uses spectral analysis to preserve the local structure of the raw data. However, the features selected by existing embedded unsupervised feature selection algorithms often lack discriminability. To enhance the discriminability of the selected features, Li et al. proposed unsupervised feature selection using non-negative spectral analysis (NDFS) [30], which combines spectral clustering and feature selection and aims to select the most discriminative features using discriminative information in unsupervised scenarios. Lin et al. proposed an unsupervised feature selection algorithm based on orthogonal basis clustering and local structure preservation (OCLSP) [35], which decomposes the target matrix in the sparse regression framework and imposes orthogonal constraints on the decomposed matrices, so that the results are closer to the real situation. Nevertheless, the above algorithms rely on a single sparse regression framework while using the manifold information of the data projection subspace to guide feature selection; such a structure leaves them unable to mine the structural information of the raw data adequately.
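The Laplacian score mentioned above can be sketched directly. This minimal version uses a dense Gaussian similarity graph for brevity (He et al. build a kNN graph); features whose values vary smoothly across the graph get LOWER scores.

```python
import numpy as np

def laplacian_score(X, t=1.0):
    """Score feature r by f~^T L f~ / f~^T D f~ on a similarity graph,
    where f~ is the feature vector with its D-weighted mean removed."""
    n, d = X.shape
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-dist2 / t)                 # dense Gaussian similarity
    deg = S.sum(1)
    D = np.diag(deg)
    L = D - S                              # graph Laplacian
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        f_t = f - (f @ deg) / deg.sum()    # remove the D-weighted mean
        denom = f_t @ D @ f_t
        scores[r] = (f_t @ L @ f_t) / denom if denom > 1e-12 else np.inf
    return scores
```

Since L is positive semidefinite, the scores are non-negative; selecting the lowest-scoring features keeps those most consistent with the data's local manifold structure.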
To alleviate the above issues and dig deeper into the internal structure of the raw data, this paper proposes adaptive graph regularization and self-expression for noise-aware feature selection (ASNFS). First, the algorithm decomposes the raw data matrix into a basis matrix and a coefficient matrix using non-negative matrix factorization, and replaces the raw high-dimensional data matrix with the decomposed low-dimensional coefficient matrix, which effectively reduces the dimensionality of the raw data. In addition, the decomposed basis matrix is reconstructed from the raw data to complete the self-expression, which lets the algorithm obtain deeper structural information about the raw data. Second, an orthogonal basis clustering model is introduced on top of this self-expression model, and the two models guide feature selection simultaneously. Moreover, the manifold information in both the raw data projection subspace and the non-negative matrix factorization subspace is retained, which fully mines the local structure of the data. Then, the construction of the similarity matrix is added to the optimization of the algorithm: the graph regularization term in the objective function constrains the feature selection process, and the feature selection result obtained at the end of each iteration is reused to construct a new similarity matrix. As the iterations proceed, the similarity matrix is constantly updated, giving the algorithm an adaptive effect. Adaptively updating the similarity matrix effectively mitigates the influence of noise in the raw data and thus solves the problem of inaccurate manifold preservation caused by a fixed graph Laplacian matrix. Finally, a simple and effective alternating iterative optimization method is adopted to optimize the algorithm. The three main contributions of ASNFS are as follows.
1) Using the idea of self-expression, the basis matrix of a non-negative matrix factorization framework is reconstructed from the raw data matrix, enabling the algorithm to mine the internal structural information of the data more deeply.
2) A joint framework is proposed that introduces an orthogonal basis clustering model on top of the improved non-negative matrix factorization framework, and the two work together for feature selection. Moreover, the manifold information of both the projection subspace and the non-negative matrix factorization subspace is preserved.
3) The joint framework combines an adaptive graph regularization term to achieve simultaneous feature selection and manifold learning. In addition, an approximation constraint tied to the initial similarity matrix is imposed on the continuously updated similarity matrix, preventing it from drifting uncontrollably.
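The paper's exact update rule is not reproduced in this preview; the sketch below only illustrates the adaptive-graph idea of contribution 3): rebuild the similarity matrix from the currently selected features at each iteration, while an approximation term keeps it close to the initial graph. The blending coefficient alpha and both function names are hypothetical.

```python
import numpy as np

def gaussian_similarity(X, t=1.0):
    """Dense Gaussian similarity graph over the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / t)

def adaptive_similarity(X, selected, S0, alpha=0.5):
    """One adaptive step: rebuild similarities from the currently
    selected features only (so noisy, discarded features no longer
    distort the graph), then pull the result toward the initial
    graph S0 so the matrix cannot drift arbitrarily."""
    S_new = gaussian_similarity(X[:, selected])
    return alpha * S_new + (1.0 - alpha) * S0
```

Calling this once per outer iteration, with `selected` taken from the latest feature-selection result, yields the continuously updated similarity matrix described above.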
The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 details the proposed algorithm. Section 4 presents the experimental results and analyzes the algorithm's performance. Finally, Section 5 summarizes the paper.
Related work
Sparse regression and non-negative matrix factorization (NMF) are two model construction methods commonly used in embedded feature selection algorithms. Sparse regression is closely related to least squares regression, in which the regression error of the data is assumed to follow a Gaussian distribution. Matrix factorization decomposes the raw high-dimensional data matrix into two matrices [34], called the basis matrix and the coefficient matrix.
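The factorization just described can be sketched with the classic multiplicative updates of Lee and Seung; this is generic NMF, not the paper's modified model. A matrix X (d x n) is approximated by a basis matrix B (d x k) times a coefficient matrix V (k x n).

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative-update NMF: X (d x n) ≈ B (d x k) @ V (k x n),
    with all factors kept non-negative throughout the iterations."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    B = rng.random((d, k))
    V = rng.random((k, n))
    for _ in range(n_iter):
        V *= (B.T @ X) / (B.T @ B @ V + eps)   # update coefficients
        B *= (X @ V.T) / (B @ V @ V.T + eps)   # update basis
    return B, V

X = np.abs(np.random.default_rng(1).normal(size=(20, 30)))
B, V = nmf(X, k=5)
err = np.linalg.norm(X - B @ V) / np.linalg.norm(X)
```

In embedded feature selection, the low-dimensional V replaces the raw data, and the structure of B (or of a regression matrix tied to it) is used to score the original features.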
The proposed method
To improve the accuracy of the similarity matrix of the raw data, better preserve the local structure of the raw data, and fully explore the information hidden inside the data, this paper proposes a new unsupervised feature selection method. ASNFS utilizes a non-negative matrix factorization framework for self-expression, and an orthogonal basis clustering model is introduced on the basis of this framework. Then, the manifold information of the projection subspace of the raw data and the manifold information of the non-negative matrix factorization subspace are preserved.
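Manifold preservation is typically enforced through the standard graph-regularization penalty tr(V^T L V); the identity it rests on can be checked directly. The sketch assumes a symmetric similarity matrix S and a low-dimensional representation V with one row per sample.

```python
import numpy as np

def manifold_penalty(V, S):
    """tr(V^T L V) with L = D - S: small when samples that are similar
    in the original space (large S_ij) stay close in the rows of V."""
    L = np.diag(S.sum(1)) - S
    return np.trace(V.T @ L @ V)
```

For symmetric S this equals (1/2) * sum_ij S_ij * ||v_i - v_j||^2, which is why minimizing it pulls similar samples together in the low-dimensional subspace.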
Simulation results and analysis
ASNFS is tested on nine real-world datasets and compared with seven classical or state-of-the-art unsupervised feature selection methods. General k-means clustering is used. Specifically, the proposed algorithm and the comparison algorithms are first run on the same dataset separately, and the feature selection results are used to form new datasets. The same clustering algorithm is then used to cluster the raw datasets and the new datasets. All experiments are implemented
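The evaluation protocol just described can be sketched as follows: run k-means on the full data and on a selected-feature subset, then score both clusterings against the ground truth with a metric such as NMI. The dataset and the selected feature indices here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import normalized_mutual_info_score

X, y = load_iris(return_X_y=True)
selected = [2, 3]                # hypothetical feature-selection result

# Cluster the raw dataset and the reduced dataset with the same algorithm.
labels_all = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_sel = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, selected])

# Compare both partitions to the ground-truth labels.
nmi_all = normalized_mutual_info_score(y, labels_all)
nmi_sel = normalized_mutual_info_score(y, labels_sel)
```

A good selection keeps `nmi_sel` close to (or above) `nmi_all` while using far fewer features.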
Conclusions
This paper proposes a new unsupervised feature selection algorithm that preserves the manifold information of the raw data simultaneously in the projection subspace and the non-negative matrix factorization subspace. The update of the similarity matrix is incorporated into the optimization of the algorithm, which effectively improves the performance of ASNFS. The clustering results of ASNFS are better than those of the baseline that selects all the features. ASNFS improves on the traditional sparse
CRediT authorship contribution statement
Ronghua Shang: Conceptualization, Methodology, Writing – review & editing. Haijing Chi: Methodology, Data curation, Software, Writing – original draft. Yangyang Li: Conceptualization. Licheng Jiao: Conceptualization, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62176200, 61773304, and 61871306, the Natural Science Basic Research Program of Shaanxi under Grant No.2022JC-45, 2022JQ-616 and the Open Research Projects of Zhejiang Lab under Grant 2021KG0AB03, the 111 Project, the National Key R&D Program of China, the Guangdong Provincial Key Laboratory under Grant No. 2020B121201001 and the GuangDong Basic and Applied Basic Research Foundation under
References (52)
- et al., Locality and similarity preserving embedding for feature selection, Neurocomputing, 2014.
- et al., Selection of relevant features and examples in machine learning, Artif. Intell., 1997.
- et al., Accelerating wrapper-based feature selection with k-nearest-neighbor, Knowl.-Based Syst., 2015.
- et al., Incremental feature extraction based on decision boundaries, Pattern Recogn., 2018.
- et al., A feature extraction model based on discriminative graph signals, Expert Syst. Appl., 2020.
- et al., Face recognition via weighted sparse representation, J. Vis. Commun. Image Represent., 2013.
- et al., A wrapper method for feature selection using support vector machines, Inf. Sci., 2009.
- et al., Wrappers for feature subset selection, Artif. Intell., 1997.
- et al., Subspace learning for unsupervised feature selection via matrix factorization, Pattern Recogn., 2015.
- Sparse and low-dimensional representation with maximum entropy adaptive graph for feature selection, Neurocomputing.
- Graph dual regularization non-negative matrix factorization for co-clustering, Pattern Recogn.
- Manifold optimization-based analysis dictionary learning with an l1/2-norm regularizer, Neural Networks.
- Optimized graph learning using partial tags and multiple features for image and video annotation, IEEE Trans. Image Process.
- Robust principal component analysis with adaptive neighbors, Advances in Neural Information Processing Systems.
- Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data, Knowl.-Based Syst.
- Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection, IEEE/ACM Trans. Comput. Biol. Bioinformatics.
- Principal component analysis, Wiley Interdiscip. Rev.: Comput. Stat.
- Fast incremental LDA feature extraction, Pattern Recogn.
- Unsupervised feature selection with extended OLSDA via embedding nonnegative manifold structure, IEEE Trans. Neural Networks Learn. Syst.
- Graph structure fusion for multiview clustering, IEEE Trans. Knowl. Data Eng.
- Supervised feature selection by clustering using conditional mutual information-based distances, Pattern Recogn.
- Semi-supervised local feature selection for data classification, Sci. China Inf. Sci.
- Unsupervised feature selection via nonnegative spectral analysis and redundancy control, IEEE Trans. Image Process.
- Self-weighted supervised discriminative feature selection, IEEE Trans. Neural Networks Learn. Syst.
Ronghua Shang (SM’22) received the B.S. degree in information and computation science and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University in 2003 and 2008, respectively. She is currently a professor with Xidian University. Her current research interests include evolutionary computation, image processing, and data mining.
Haijing Chi received the B.E. degree from the School of Electronic and Information Engineering, Hebei University of Engineering, Handan, China, in 2020. She is now pursuing the M.S. degree with the School of Artificial Intelligence, Xidian University, Xi’an, China. Her current research interests include machine learning and data mining.
Yangyang Li (SM’18) received the B.S. and M.S. degrees in computer science and technology, and the Ph.D. degree in pattern recognition and intelligent system from Xidian University, Xi’an, China, in 2001, 2004, and 2007, respectively. She is currently a professor with the School of Artificial Intelligence, Xidian University. Her research interests include quantum-inspired evolutionary computation, artificial immune systems, and deep learning.
Licheng Jiao (F’17) received the B.S. degree in electronic engineering from Shanghai Jiaotong University, Shanghai, China, in 1982, the M.S., and Ph.D. degrees in electronic engineering from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively. From 1990 to 1991, he was a Postdoctoral Fellow with the National Key Laboratory for Radar Signal Processing, Xidian University, Xi’an, China. Since 1992, he has been a Professor with the School of Electronic Engineering, Xidian University. He is currently the Director with the Key Lab of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University. He is in charge of about 40 important scientific research projects, and authored/coauthored more than 20 monographs and 100 papers in international journals and conferences. His research interests include image processing, natural computation, machine learning, and intelligent information processing. Dr. Jiao is a member of the IEEE Xi’an Section Execution Committee and the Chairman of awards and recognition committee, the Vice Board Chairperson of the Chinese Association of Artificial Intelligence, the Councilor of the Chinese Institute of Electronics, the Committee Member of the Chinese Committee of Neural Networks, and an expert of academic degrees committee of the state council.