Locally adaptive multiple kernel clustering
Introduction
In recent years, kernel clustering methods have been proposed to handle data sets that are not linearly separable in the input space [1]. By implicitly mapping the input data into a high-dimensional feature space, kernel clustering methods can discover clusters that are not linearly separable in the input space: a linear partition in feature space corresponds to a nonlinear partition in input space. Experiments [2], [3] have shown that kernelized clustering algorithms generally outperform their conventional counterparts. Well-known kernel clustering methods include kernel k-means clustering [4], [5], support vector clustering [6], the Camastra and Verri algorithm [7] and maximum margin clustering [8]. Moreover, Dhillon et al. have proved that the objectives of spectral clustering methods, including ratio cut, normalized cut and ratio association, are mathematically equivalent to the objective of weighted kernel k-means clustering, formulated as a trace maximization problem [9].
Recently, multiple kernel learning (MKL) has gained increasing attention as a way to construct a combinational kernel from a number of homogeneous or even heterogeneous kernels [10], [11], [12], [13], [14], [15], [16], [17], [18]. Conventional multiple kernel learning aims to learn a single linear combination of multiple kernels over the whole input space. However, if the local distributions of a data set differ significantly from region to region, it is difficult to find a globally optimal combination of multiple kernels. A better solution is offered by localized multiple kernel learning approaches [19], [20], [21], [22], which assign a kernel different weights in different regions of the input space. In this paper we address this problem in another way. We propose a localized multiple kernel clustering method, named locally adaptive multiple kernel clustering (LAMKC). Instead of using a uniform combinational kernel over the whole input space, our method associates a localized kernel with each cluster. We assign each cluster a weight vector for feature selection and combine this weight vector with a Gaussian kernel to form a kernel unique to that cluster. By jointly optimizing the weight vector and the width parameter of the Gaussian kernel for each cluster, each kernel can be localized to match the data distribution of its cluster. LAMKC performs the following two steps iteratively until convergence: (1) optimize each kernel for its corresponding cluster, and (2) reassign each point to the nearest cluster according to Euclidean distances in feature space, computed with each cluster's own kernel. Like related methods, LAMKC is sensitive to initialization; we use the Kaufman Approach (KA) [23] combined with k-means clustering to supply a good initial partition.
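The two-step iteration described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the per-cluster kernel form exp(−Σ_d w_d(x_d − y_d)² / 2σ²) is one plausible parameterization, and the width heuristic (median intra-cluster distance, with uniform feature weights) is only a stand-in for LAMKC's joint optimization of the weight vector and kernel width.

```python
import numpy as np

def local_kernel(X, Y, w, sigma):
    """Per-cluster Gaussian kernel with a feature-weight vector w and width
    sigma (hypothetical parameterization, stated for illustration only)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2 * w).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def feature_dist(x, C, w, sigma):
    """Feature-space distance of x to the mean of cluster C under that
    cluster's own kernel, expanded via the kernel trick."""
    x = np.atleast_2d(x)
    n = len(C)
    return (local_kernel(x, x, w, sigma)[0, 0]
            - 2.0 / n * local_kernel(x, C, w, sigma).sum()
            + local_kernel(C, C, w, sigma).sum() / n ** 2)

def lamkc_sketch(X, labels, k, iters=10):
    for _ in range(iters):
        # Step 1 (stand-in): fit each cluster's kernel parameters. Uniform
        # feature weights and a median-distance width replace the paper's
        # joint optimization of w and sigma.
        params = []
        for c in range(k):
            C = X[labels == c]
            w = np.ones(X.shape[1])
            pd = np.sqrt(((C[:, None] - C[None, :]) ** 2).sum(-1))
            sigma = np.median(pd[pd > 0]) if (pd > 0).any() else 1.0
            params.append((C, w, sigma))
        # Step 2: reassign each point to the cluster whose localized kernel
        # places it closest in feature space.
        new = np.array([np.argmin([feature_dist(x, C, w, s)
                                   for C, w, s in params]) for x in X])
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

With a good initial partition (the role the Kaufman Approach plays in the paper), well-separated clusters are a fixed point of this iteration.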
The remainder of this paper is organized as follows: in Section 2 we introduce related work. In Section 3 we give a detailed description of our method. Section 4 presents the experimental results and evaluation of our method. Finally, we conclude the paper in Section 5.
Section snippets
Multiple kernel learning
Recent developments in the literature on SVMs and other kernel methods have shown the need for multiple kernel learning (MKL), which constructs a combinational kernel from a number of basis kernels. Studies [10], [11], [12] have shown that using an optimized combinational kernel instead of a single kernel can improve classification performance and allows more flexible encoding of domain knowledge from different sources or cues. MKL has been studied intensively in the past few years,
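In its simplest form, the combinational kernel is a convex combination K = Σ_m w_m K_m of basis kernel matrices. A minimal sketch, assuming Gaussian basis kernels with illustrative bandwidths and uniform weights (the kernel choices here are not from any cited method):

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combined_kernel(X, gammas, weights):
    """Linear MKL combination K = sum_m w_m * K_m with w_m >= 0, sum_m w_m = 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # project the weights onto the simplex
    return sum(wm * rbf_kernel(X, g) for wm, g in zip(w, gammas))

X = np.random.default_rng(0).normal(size=(5, 2))
K = combined_kernel(X, gammas=[0.1, 1.0, 10.0], weights=[1, 1, 1])
# K stays symmetric positive semidefinite because each basis kernel is,
# and a convex combination preserves both properties.
```

MKL methods differ mainly in how the weights w_m are learned; with w fixed to uniform values this reduces to the averaged-kernel baseline used in the experiments.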
Motivation
In the kernel k-means clustering method, the input data is mapped implicitly from the original input space to a high-dimensional feature space by means of a kernel function. To obtain good performance, the kernel function should match the distribution of the input data set well. However, if a data set has varying local distributions in different regions, it is difficult to find a globally optimal kernel for the whole input space. For example, Fig. 1 shows two clusters with different
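For context, kernel k-means never forms the feature map explicitly: the squared feature-space distance from a point to a cluster mean expands entirely into kernel evaluations, ||φ(x) − m_C||² = k(x,x) − (2/|C|) Σ_j k(x, c_j) + (1/|C|²) Σ_{j,l} k(c_j, c_l). A small sketch with a Gaussian kernel (function names are illustrative):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = (np.sum(X ** 2, 1)[:, None] + np.sum(Y ** 2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def dist_to_cluster(x, C, gamma=1.0):
    """||phi(x) - mean(phi(C))||^2 via the kernel-trick expansion:
    k(x,x) - (2/|C|) sum_j k(x,c_j) + (1/|C|^2) sum_{j,l} k(c_j,c_l)."""
    x = np.atleast_2d(x)
    n = len(C)
    return (rbf(x, x, gamma)[0, 0]
            - 2.0 / n * rbf(x, C, gamma).sum()
            + rbf(C, C, gamma).sum() / n ** 2)
```

Because every term depends only on kernel values, the choice of kernel (here, the single global bandwidth gamma) fully determines the partition, which is exactly why a single global kernel struggles when local distributions vary.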
Experiments
In this section, we conduct experiments to demonstrate the effectiveness of our method. We compare LAMKC with kernel k-means clustering, averaged multiple kernel k-means clustering, self-tuning spectral clustering [27] and the Variable Bandwidth Mean Shift method. Averaged multiple kernel k-means clustering performs kernel k-means with the average combination of multiple kernels. We select the Variable Bandwidth Mean Shift [37] method for comparison because it can
Conclusion
In this paper, we proposed a localized multiple kernel clustering method. Our method is dedicated to datasets with varying local distributions. Instead of using a uniform combination of multiple kernels over the whole input space, our method associates a localized kernel with each cluster. We assign each cluster a weight vector for feature selection and combine this weight vector with a Gaussian kernel to form a kernel unique to that cluster. By learning the weight vector and
Lujiang Zhang received the B.S. degree in Computer Science from Chinese Academy of Sciences. Currently, he is a Ph.D. Candidate at School of Automation Science and Electrical Engineering, Beijing University of Aeronautics & Astronautics, China. His research interests include machine learning and software analysis.
References (39)
- et al., A survey of kernel and spectral methods for clustering, Pattern Recognit. (2008)
- et al., Evaluation of the performance of clustering algorithms in kernel-induced feature space, Pattern Recognit. (2005)
- et al., Non-uniform multiple kernel learning with cluster-based gating functions, Neurocomputing (2011)
- et al., An empirical comparison of four initialization methods for the k-means algorithm, Pattern Recognit. Lett. (1999)
- et al., Mean shift spectral clustering, Pattern Recognit. (2008)
- M. Filippone, F. Masulli, S. Rovetta, An experimental comparison of kernel clustering methods, in: Proceedings of the...
- et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. (1998)
- Mercer kernel-based clustering in feature space, IEEE Trans. Neural Netw. (2002)
- et al., Support vector clustering, J. Mach. Learn. Res. (2001)
- et al., A novel kernel method for clustering, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
- A unified view of kernel k-means, spectral clustering and graph cuts, Comput. Complex.
- Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res.
- Large scale multiple kernel learning, J. Mach. Learn. Res.
- SimpleMKL, J. Mach. Learn. Res.
- An extended level method for efficient multiple kernel learning, Adv. Neural Inf. Process. Syst.
- Efficient and accurate lp-norm multiple kernel learning, Adv. Neural Inf. Process. Syst.
Xiaohui Hu received the Ph.D. degree from School of Computer Science and Engineering, Beijing University of Aeronautics & Astronautics, China. Currently, he is a Senior Researcher at Institute of Software, Chinese Academy of Sciences, and serves as an Adjunct Professor at Beijing University of Aeronautics & Astronautics. His research interests include information systems integration and computer simulation technology.