Unsupervised feature selection via discrete spectral clustering and feature weights
Introduction
With the rapid development of information technology, large amounts of high-dimensional data have been generated in various fields. Such high-dimensional data often contain noise and redundant features, which increase the difficulty of data processing [1]. Research in machine learning, image processing, data mining [2], and pattern recognition [3] frequently has to handle high-dimensional data, so it is necessary to reduce its dimensionality [4]. Dimension reduction not only decreases computation time and improves efficiency, but also reduces storage requirements. Feature extraction [5] and feature selection [6] are two common dimensionality reduction methods. Feature extraction converts all features into fewer new features that replace the original ones [7]. Feature selection selects representative features to form a subset according to a certain criterion, thereby obtaining a compressed data representation [8]. In addition, it retains the semantic information of the data [9] and offers stronger interpretability [10]. Feature selection continues to develop rapidly [11]. However, most high-dimensional data in real life are unlabeled, which makes direct feature selection difficult. Therefore, how to obtain pseudo-labels that are close to the real labels is a challenging research problem. This paper proposes an unsupervised feature selection method via discrete spectral clustering and feature weights (FSDSC), which uses a discrete clustering indicator matrix as a pseudo-label to provide clearer discriminative information for feature selection. At the same time, the feature subset is extracted based on a feature weight matrix. Compared with traditional methods that must impose constraints, such as a sparse regularization term, on the feature selection matrix, the feature weight matrix reduces model complexity and the computational cost of feature evaluation.
Specifically, FSDSC integrates regression models and spectral clustering in a unified framework, introduces a feature weight matrix into the framework, and then performs feature selection through this framework. The matrix is a diagonal matrix whose diagonal elements directly represent the weights of the individual features, so that the algorithm can easily select a subset of features. An orthogonal constraint is imposed on the transformation matrix. Compared with least-squares regression, the orthogonal regression model preserves more discriminative information and avoids trivial solutions. In addition, FSDSC improves the spectral clustering method to obtain a discrete indicator matrix, which provides more accurate guidance information for feature selection. The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 presents the proposed model, its optimization method, and a convergence analysis in detail. Section 4 reports the experimental results and a comparative analysis of FSDSC against the compared algorithms on the same datasets. The conclusion is summarized in Section 5.
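Although the exact formulation appears in Section 3, the ingredients described above (an orthogonal regression model, a diagonal feature weight matrix, and a discrete spectral clustering term) suggest a unified objective of roughly the following shape. This is only a hedged sketch, not the paper's exact model: X denotes the data matrix, W the orthogonal transformation matrix, Θ the diagonal feature weight matrix, Y the discrete cluster indicator matrix, L a graph Laplacian, and λ a trade-off parameter:

$$\min_{W,\,\Theta,\,Y}\ \|X\Theta W - Y\|_F^2 + \lambda\,\operatorname{Tr}\!\left(Y^{\top} L Y\right)\quad \text{s.t. } W^{\top}W = I,\ Y \in \{0,1\}^{n\times c},\ Y\mathbf{1}_c = \mathbf{1}_n.$$

The first term ties the weighted, orthogonally transformed data to the pseudo-labels, and the second is the standard spectral clustering term with Y restricted to a discrete indicator matrix rather than a relaxed continuous one.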
Related work
At present, feature selection methods can be divided into supervised, semi-supervised, and unsupervised feature selection according to whether label information is needed [12]. Supervised methods use the correlation between samples and label information to select discriminative features that benefit sample classification [13]. Semi-supervised methods require only some of the labels [14]. Most of these methods combine labeled and unlabeled
The proposed method
In this section, FSDSC is introduced in detail; it is mainly composed of two parts: an orthogonal regression model with feature weights and discrete spectral clustering. In the proposed algorithm, the dataset is represented by a matrix X ∈ ℝ^{n×d}, where n is the number of samples and d is the dimension of the samples, that is, the number of features. In addition, let c be the number of categories and l be the number of selected features, with l < d.
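Because the feature weight matrix is diagonal, selecting the l most important features reduces to ranking the diagonal entries and keeping the corresponding columns of X. The following sketch illustrates this step with hypothetical weights (the values are illustrative, not the output of the actual FSDSC solver):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, l = 6, 5, 3                    # n samples, d features, l selected features
X = rng.standard_normal((n, d))      # data matrix, one sample per row

# Hypothetical diagonal of a learned feature weight matrix: one weight per feature.
w = np.array([0.05, 0.80, 0.10, 0.60, 0.02])

top = np.argsort(w)[::-1][:l]        # indices of the l largest weights
X_sel = X[:, np.sort(top)]           # compressed representation, shape (n, l)

print(sorted(top.tolist()))          # → [1, 2, 3]
```

With these example weights, features 1, 3, and 2 carry the largest diagonal entries, so the reduced dataset keeps columns {1, 2, 3}; no extra sparsity constraint on a projection matrix is needed to read off the selected subset.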
Simulation results and the analyses
In this section, experiments are conducted to verify the effectiveness of FSDSC. Specifically, feature selection is first performed on the same datasets by FSDSC and the compared algorithms; the selected features are recombined into a new dataset; and the k-means method [55] is then used to cluster the new dataset. The performance of FSDSC is evaluated by analyzing the clustering results. In addition, parameter sensitivity analysis and a convergence study are conducted. Before showing
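The evaluation protocol above can be sketched as follows. This is not the authors' code: the column indices in `selected` stand in for the output of a feature-selection step, the data are synthetic, and a plain Lloyd's-iteration k-means is used in place of reference [55]:

```python
import numpy as np

def kmeans(X, c, iters=50, seed=0):
    """Plain Lloyd's algorithm: returns one cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]   # random initial centers
    for _ in range(iters):
        # squared Euclidean distance from every sample to every center
        dists = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for k in range(c):                              # update non-empty clusters
            if (labels == k).any():
                centers[k] = X[labels == k].mean(0)
    return labels

# Two well-separated synthetic groups of samples.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((20, 4)) + 5,
               rng.standard_normal((20, 4)) - 5])

selected = [0, 2]                     # hypothetical selected feature columns
labels = kmeans(X[:, selected], c=2)  # cluster the reduced dataset
print(labels.shape)                   # (40,)
```

The clustering quality of `labels` on the reduced data, measured against the ground-truth partition, is what metrics such as clustering accuracy and NMI would quantify in the comparison.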
Conclusions
This paper proposes an unsupervised feature selection method via discrete spectral clustering and feature weights (FSDSC), which combines regression models and spectral clustering to form a unified feature selection framework. First, FSDSC introduces a feature weight matrix to express the importance of features, which simplifies the process of feature selection. Second, FSDSC obtains a discrete clustering indicator matrix by imposing a discrete constraint on spectral clustering, thereby providing
CRediT authorship contribution statement
Ronghua Shang: Conceptualization, Methodology, Writing – review & editing. Jiarui Kong: Methodology, Data curation, Software. Lujuan Wang: Methodology, Data curation, Writing – original draft. Weitong Zhang: Software. Chao Wang: Conceptualization. Licheng Jiao: Conceptualization, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We would like to express our sincere appreciation to the editors and the anonymous reviewers for their insightful comments, which have greatly helped us improve the quality of the paper. This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62176200 and 61871306, the Natural Science Basic Research Program of Shaanxi under Grant No. 2022JC-45, the Open Research Projects of Zhejiang Lab under Grant No. 2021KG0AB03, and the National Key R&D Program
References (59)
- et al., Structured learning for unsupervised feature selection with high-order matrix factorization, Expert Syst. Appl. (2020)
- et al., Robust neighborhood embedding for unsupervised feature selection, Knowl.-Based Syst. (2020)
- et al., An efficient kernel-based feature extraction using a pull–push method, Appl. Soft Comput. (2020)
- et al., Nonnegative Laplacian embedding guided subspace learning for unsupervised feature selection, Pattern Recogn. (2019)
- et al., Adaptive S transform for feature extraction in voltage sags, Appl. Soft Comput. (2019)
- et al., Local discriminative based sparse subspace learning for feature selection, Pattern Recogn. (2019)
- et al., Feature selection in machine learning: A new perspective, Neurocomputing (2018)
- et al., Unsupervised feature selection via latent representation learning and manifold regularization, Neural Networks (2019)
- et al., Wrappers for feature subset selection, Artif. Intell. (1997)
- et al., Sparse and low-redundant subspace learning-based dual-graph regularized robust feature selection, Knowl.-Based Syst. (2020)
- Feature selection with multi-view data: A survey, Inf. Fusion
- Locality and similarity preserving embedding for feature selection, Neurocomputing
- Unsupervised feature selection by self-paced learning regularization, Pattern Recogn. Lett.
- Graph dual regularization non-negative matrix factorization for co-clustering, Pattern Recogn.
- Dual-graph regularized concept factorization for clustering, Neurocomputing
- An efficient framework for unsupervised feature selection, Neurocomputing
- Feature selection under regularized orthogonal least square regression with optimal scaling, Neurocomputing
- Unsupervised feature selection with adaptive multiple graph learning, Pattern Recogn.
- Non-negative spectral learning and sparse regression-based dual-graph regularized feature selection, IEEE Trans. Cybern.
- Adaptive weighted sparse principal component analysis for robust unsupervised feature selection, IEEE Trans. Neural Networks Learn. Syst.
- Self-tuned discrimination-aware method for unsupervised feature selection, IEEE Trans. Neural Networks Learn. Syst.
- Adaptive unsupervised feature selection with structure regularization, IEEE Trans. Neural Networks Learn. Syst.
- Unsupervised feature selection via data reconstruction and side information, IEEE Trans. Image Process.
- Efficient and robust feature selection via joint ℓ2,1-norms minimization, Adv. Neural Inf. Process. Syst.
- A Bayesian approach to joint feature selection and classifier design, IEEE Trans. Pattern Anal. Mach. Intell.
- Semi-supervised feature selection via spectral analysis
- The Fisher-Markov selector: Fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data, IEEE Trans. Pattern Anal. Mach. Intell.
- Simultaneous feature selection and clustering using mixture models, IEEE Trans. Pattern Anal. Mach. Intell.
- A review of feature selection techniques in bioinformatics, Bioinformatics
Ronghua Shang (M’09) received the B.S. degree in information and computation science and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University in 2003 and 2008, respectively. She is currently a professor with Xidian University. Her current research interests include machine learning, pattern recognition, evolutionary computation, image processing, and data mining.
Jiarui Kong received the B.S. degree from the College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China. She is currently working toward the master’s degree in the School of Artificial Intelligence, Xidian University, Xi’an, China. Her current research interests include machine learning and data mining.
Lujuan Wang received the B.S. degree from the School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin, China. Her current research interests include pattern recognition and machine learning.
Weitong Zhang received the B.E. degree in Electronic and Information Engineering from Changchun University of Science and Technology, Changchun, China, in 2013, and the M.S. degree in Electronics and Communication Engineering and the Ph.D. degree in Electronic Science and Technology from Xidian University, Xi’an, China, in 2017 and 2021, respectively. She is currently a lecturer with Xidian University. Her current research interests include complex networks and machine learning.
Chao Wang received the B.S. degree from Lanzhou University in 2016 and the Ph.D. degree from Zhejiang University in 2021. She is currently an assistant research scientist with the Research Center for Big Data Intelligence, Zhejiang Laboratory. Her research interests include spatial data mining and geographic information science.
Yangyang Li (SM’18) received the B.S. and M.S. degrees in computer science and technology, and the Ph.D. degree in pattern recognition and intelligent system from Xidian University, Xi’an, China, in 2001, 2004, and 2007, respectively. She is currently a Professor with the School of Artificial Intelligence, Xidian University. Her research interests include quantum-inspired evolutionary computation, artificial immune systems, and deep learning.
Licheng Jiao (SM’89) received the B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982, and the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively. From 1990 to 1991, he was a postdoctoral fellow in the National Key Laboratory for Radar Signal Processing, Xidian University, Xi’an, China. Since 1992, Dr. Jiao has been a professor in the School of Electronic Engineering at Xidian University. Currently, he is the Director of the Key Lab of Intelligent Perception and Image Understanding of the Ministry of Education of China at Xidian University, Xi’an, China. Dr. Jiao is a Senior Member of IEEE, a member of the IEEE Xi’an Section Execution Committee and the Chairman of its Awards and Recognition Committee, vice board chairperson of the Chinese Association of Artificial Intelligence, councilor of the Chinese Institute of Electronics, committee member of the Chinese Committee of Neural Networks, and an expert of the Academic Degrees Committee of the State Council. His research interests include image processing, natural computation, machine learning, and intelligent information processing. He has been in charge of about 40 important scientific research projects and has published more than 20 monographs and a hundred papers in international journals and conferences.