Multi-kernel graph fusion for spectral clustering
Introduction
In real applications, manual annotation is usually expensive (Ma, You, Jing, Li, & Lu, 2020). As an unsupervised learning technique that requires no labels, clustering has therefore received increasing attention (Zhang, Liu, Shen, Shen, & Shao, 2019) in areas such as medical data analysis (Gan, et al., 2022, Zhu et al., 2022), biological data analysis (Gan, et al., 2021, Hu, et al., 2021), multimedia retrieval (Zhang, et al., 2021), image categorization (Bansal & Sharma, 2021), target detection (Li, de la Prieta Pintado, Corchado, & Bajo, 2017), and anomaly detection (Ghezelbash, Maghsoudi, & Carranza, 2020). Specifically, clustering divides samples into clusters so that similar samples fall in the same cluster and dissimilar samples fall in different clusters. Previous clustering methods include prototype clustering (Dilip, 2021), kernel clustering (Sun, et al., 2022), subspace clustering (Lu, Feng, Lin, Mei, & Yan, 2019), matrix factorization clustering (Haeffele & Vidal, 2020), and so on (El Hajjar, Dornaika, and Abdallah, 2022, Zhu, Zhang, Li, et al., 2018). Both k-means (Yu, Xu, Chen, Bai, & Wang, 2022) and spectral clustering (Shi & Malik, 2000) are popular clustering methods. In particular, k-means is efficient, but it cannot cluster data of arbitrary shape. Compared with k-means, spectral clustering (Boedihardjo, Deng, & Strohmer, 2021) can obtain the optimal solution on arbitrary data shapes based on graph theory (Zhu, Zhang, Zhu, Zhu, & Gao, 2020). However, the graph used by spectral clustering is constructed from Euclidean distances on the original data, which usually requires the existence of a linearly separable hyperplane (Zhong, Shu, Huang, & Yan, 2022).
In real applications, the relationships among samples are often assumed to be nonlinear. As a result, kernel clustering methods (Wang, Lu, Lu, Nie, & Li, 2022) are widely used to capture these nonlinear relationships. For example, single-kernel graph-based clustering (SKGC) handles nonlinear relationships with a single kernel, but its performance depends heavily on the selected kernel. As an alternative, multi-kernel graph-based clustering (MKGC) selects kernels automatically from multiple base kernels (El Hajjar, Dornaika, Abdallah, and Barrena, 2022, Kang et al., 2018, Manna et al., 2021, Ren and Sun, 2021). However, a naive MKGC implicitly assumes that every base kernel contributes equally to multiple kernel learning. In practice, different kernels contribute differently to the clustering analysis, so they should receive different weights (Liu, et al., 2022). Therefore, many MKGC methods first design a weight strategy that accounts for kernel diversity and then integrate the base kernels into a combined kernel (Chamakura & Saha, 2022). Most previous MKGC methods represent the combined kernel as a weighted combination of base kernels and differ mainly in their weight strategies. The existing strategies fall into two paradigms: linear-distance strategies, such as Euclidean distance (El Hajjar, Dornaika, Abdallah, and Barrena, 2022, Kang et al., 2018, Manna et al., 2021), and nonlinear-distance strategies, such as heat kernel distance (Ren & Sun, 2021). Usually, a base kernel receives a large weight if the linear or nonlinear distance between it and the combined kernel is small. Moreover, previous MKGC methods integrate prior data information in the base kernel spaces while simultaneously learning refined fusion graphs in the combined kernel space.
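The weighted-combination idea described above can be illustrated with a minimal sketch. It is not the method of any cited paper. The function names (`gaussian_kernel`, `polynomial_kernel`, `combine_kernels`), the inverse-Frobenius-distance weight rule, and the unit-norm scaling of each base kernel are all assumptions chosen for illustration. The sketch only shows the general pattern in which a base kernel closer to the combined kernel receives a larger weight.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) base kernel; sigma is a hypothetical bandwidth."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def polynomial_kernel(X, degree, c=1.0):
    """Polynomial base kernel (x^T y + c)^degree."""
    return (X @ X.T + c) ** degree

def combine_kernels(kernels, n_iter=20):
    """Alternate between the combined kernel and distance-based weights.

    Assumed rule (illustrative only): weight_i is proportional to
    1 / ||K_i - K||_F, so kernels close to the combined kernel K
    get larger weights. Weights are normalized to sum to one.
    """
    # Scale each base kernel to unit Frobenius norm so distances are comparable.
    kernels = [Ki / np.linalg.norm(Ki, "fro") for Ki in kernels]
    m = len(kernels)
    w = np.full(m, 1.0 / m)  # start from equal weights
    for _ in range(n_iter):
        K = sum(wi * Ki for wi, Ki in zip(w, kernels))
        dist = np.array([np.linalg.norm(Ki - K, "fro") for Ki in kernels])
        inv = 1.0 / (dist + 1e-12)
        w = inv / inv.sum()
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    return K, w

# Toy usage on random data (purely synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
base = [gaussian_kernel(X, 0.5), gaussian_kernel(X, 2.0), polynomial_kernel(X, 2)]
K, w = combine_kernels(base)
```

Under a rule of this kind, the kernel nearest to the current combination is rewarded on every iteration, which is one way to see how a single base kernel can come to dominate the combined kernel.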
Previous MKGC methods have achieved significant success in real applications, but they still suffer from limitations that need to be addressed. First, many existing MKGC methods use the min–min strategy to calculate the weights of base kernels, which aggravates the imbalance among base kernels. As a result, the spatial structure of the final combined kernel is largely determined by the most powerful base kernel, and the contributions of the other base kernels are ignored. In addition, the original data often contain noise and redundancy. All of these issues may yield a low-quality fusion graph. Second, previous spectral methods (e.g., Du, et al., 2015, Huang et al., 2012a, Shi and Malik, 2000, Zhang et al., 2019, Zhu, Zhang, Li, et al., 2018) adopt a two-step strategy for clustering analysis, i.e., spectral representation learning followed by a separate k-means clustering step. These methods have been shown to output suboptimal clustering results (Kang et al., 2018, Lu et al., 2019, Peng, et al., 2022, Xie et al., 2018, Yuan et al., 2021, Zhu, Zhang, He, et al., 2018).
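For readers unfamiliar with the two-step strategy criticized above, the following is a minimal sketch of a generic pipeline: a spectral embedding from a normalized graph Laplacian, followed by a separate k-means step. It is a common textbook variant, not the proposed MKGF-MM method and not the exact procedure of any cited work. The Gaussian affinity, the bandwidth `sigma`, the row normalization, and the farthest-point k-means initialization are assumptions made for illustration.

```python
import numpy as np

def spectral_embedding(W, k):
    """Step 1: rows of the k eigenvectors with the smallest eigenvalues
    of the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    F = vecs[:, :k]
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return F / np.maximum(norms, 1e-12)  # row-normalize the embedding

def kmeans(F, k, n_iter=100):
    """Step 2: plain Lloyd iterations with a deterministic
    farthest-point initialization (an illustrative choice)."""
    centers = [F[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy usage: two well-separated synthetic blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(10.0, 0.5, size=(20, 2))])
sq = np.sum(X ** 2, axis=1)
dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
sigma = 1.0  # hypothetical bandwidth
W = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian affinity graph
labels = kmeans(spectral_embedding(W, k=2), k=2)
```

Because the embedding in step 1 is computed without any knowledge of the discretization in step 2, errors in the embedding cannot be corrected by the k-means step, which is the motivation for one-step formulations.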
In this paper, we propose a novel MKGC method, namely multi-kernel graph fusion based on min–max optimization (MKGF-MM), to address the above issues. The goal of the proposed method is to reduce the imbalance among base kernels and to learn a more reasonable combined kernel. To do this, the proposed method first obtains multiple base kernels from different kernel functions and then uses a new weighting scheme to iteratively adjust the weights of all base kernels and output a combined kernel. In particular, we investigate a min–max weight strategy based on game theory to effectively integrate the heterogeneous features of the data in different base kernel spaces, while simultaneously optimizing the combined kernel and the fusion graph in a unified framework. This leads to adaptive learning of the weights of all base kernels and improves the robustness of the fusion graph. After this, we further map the combined graph to the Reproducing Kernel Hilbert Space (RKHS), where clustering analysis can be conducted linearly on the samples. We also add a regularization term to the proposed objective function to conduct one-step clustering, which avoids the suboptimality of the two-step strategy (Yuan et al., 2021, Zhu, Zhang, He, et al., 2018). The framework of the proposed method is shown in Fig. 1.
Compared to previous clustering methods, the main contributions of the proposed method are listed as follows.
- We propose a novel min–max weight strategy based on game theory to improve the quality of the combined kernel and the fusion graph. The proposed optimization strategy thereby avoids the problems of kernel selection and parameter tuning and improves the quality and robustness of the fusion graph.
- We propose an alternating iterative optimization process to solve the proposed objective function and theoretically prove the convergence of the proposed optimization method. Our experimental results on real datasets further verify the fast convergence of our optimization method. In particular, the proposed optimization can readily be applied in a range of real applications, such as multi-view clustering, object detection, and image segmentation.
Section snippets
Notations
In this paper, bold capital letters, lowercase symbols, and normal italic letters are used to denote matrices, vectors, and scalars, respectively. Some important notations are summarized in Table 1.
Spectral clustering
Spectral clustering (Shi & Malik, 2000) is a machine learning method built on spectral partitioning. It evolved from methods for optimizing subgraph partitioning (Yuan et al., 2021). The performance of these methods depends mainly on the constructed graph (Wang, Li, Tao, Dong, Wang, & Liu,
Multi-kernel graph fusion based on min–max optimization
Different from ordinary min–min optimization, min–max optimization involves two or more variables that compete and interact with each other. These two variables that need to be optimized are combined to obtain the optimization problem of the following objective model: where is a combined kernel. is subject to the constraint of . Since the corresponding graph contains
Datasets
To verify the clustering performance of MKGF-MM, this paper conducts experiments on nine standard public datasets: BUPA, HEARTFAILURE, LUNG, MESSIDOR, PARKINSONS, PIMA, SPECT, WDBC, and YEAST. The nine datasets, which relate to medicine and the life sciences, are from the UCI Machine Learning Repository. The datasets are summarized in Table 2:
At the same time, the experiment normalized the original data, using kernel
Conclusion
The proposed method provides a novel graph-based clustering model with unsupervised MKL, which jointly learns the combined kernel and the fusion graph under a co-optimization strategy based on the min–max game. This method addresses the clustering of data with nonlinear relationships, avoids the problems of kernel selection and parameter tuning, and improves the quality of the fusion graph. The proposed MKGF-MM method is compared with two classical SKL clustering methods and seven popular MKL
CRediT authorship contribution statement
Bo Zhou: Conceptualization, Methodology, Data curation, Formal analysis, Investigation, Supervision, Project administration, Writing – review & editing. Wenliang Liu: Resources, Writing – review & editing, Typography, Project administration. Wenzhen Zhang: Software, Experiment. Zhengyu Lu: Experimental data collection, Experiment. Qianlin Tan: Experimental data collection, Funding acquisition.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (61876046, 62166003); Natural Science Project of Guangxi Universities (2021KY0061); and Promoting Project of Basic Capacity for Young and Middle-aged University Teachers in Guangxi (2018KY0493, 2019KY0062, 2021KY0619).
References (46)
- et al. (2021). A novel multi-view clustering approach via proximity-based factorization targeting structural maintenance and sparsity challenges for text and image categorization. Information Processing & Management.
- et al. (2022). Localized multiple kernel learning using graph modularity. Pattern Recognition Letters.
- et al. (2022). Adaptive and structured graph learning for semi-supervised clustering. Information Processing & Management.
- et al. (2022). One-step multi-view spectral clustering with cluster label correlation graph. Information Sciences.
- et al. (2022). Consensus graph and spectral representation for one-step multi-view kernel based clustering. Knowledge-Based Systems.
- et al. (2021). Brain functional connectivity analysis based on multi-graph fusion. Medical Image Analysis.
- et al. (2020). Optimization of geochemical anomaly detection using a novel genetic k-means clustering (GKMC) algorithm. Computers & Geosciences.
- et al. (2021). Multi-view spectral clustering via common structure maximization of local and global representations. Neural Networks.
- et al. (2017). Kernel-driven similarity learning. Neurocomputing.
- et al. (2017). Multi-source homogeneous data clustering for multi-target detection from cluttered background with misdetection. Applied Soft Computing.
- Robust adaptive semi-supervised classification method based on dynamic graph and self-paced learning. Information Processing & Management.
- Multi-source domain adaptation with graph embedding and adaptive label prediction. Information Processing & Management.
- Robust kernelized graph-based learning. Pattern Recognition.
- Trio-based collaborative multi-view graph clustering with multiple constraints. Information Processing & Management.
- One-step spectral rotation clustering for imbalanced high-dimensional data. Information Processing & Management.
- Incremental multi-view spectral clustering with sparse and connected graph learning. Neural Networks.
- Adaptive reverse graph learning for robust subspace learning. Information Processing & Management.
- Multi-view spectral clustering by simultaneous consensus graph learning and discretization. Knowledge-Based Systems.
- Interpretable learning based dynamic graph convolutional networks for Alzheimer’s disease analysis. Information Fusion.
- A performance guarantee for spectral clustering. SIAM Journal on Mathematics of Data Science.
- An efficient privacy preserving on high-order heterogeneous data using fuzzy K-prototype clustering. Journal of Ambient Intelligence and Humanized Computing.
- Multi-graph fusion for dynamic graph convolutional network. IEEE Transactions on Neural Networks and Learning Systems.