Elsevier

Neurocomputing

Volume 137, 5 August 2014, Pages 192-197

Locally adaptive multiple kernel clustering

https://doi.org/10.1016/j.neucom.2013.05.064

Abstract

Conventional multiple kernel learning aims to construct a global combination of multiple kernels over the input space. For a data set whose local distributions vary across the input space, a uniform combination of multiple kernels may not always work well. In this paper, we propose a localized multiple kernel learning method for clustering. Instead of using a uniform combinational kernel over the whole input space, our method associates a localized kernel with each cluster. We assign each cluster a weight vector for feature selection and combine that weight vector with a Gaussian kernel to form a unique kernel for the corresponding cluster. By optimizing the weight vector and the width parameter of the Gaussian kernel jointly for each cluster, each kernel can be localized to match the data distribution of its corresponding cluster. A locally adaptive strategy based on kernel k-means clustering is used to optimize the kernel for each cluster. We experimentally compared our method with kernel k-means clustering, averaged multiple kernel clustering, self-tuning spectral clustering and the Variable Bandwidth Mean Shift algorithm. Experimental results demonstrate the effectiveness of our method.

Introduction

In recent years, kernel clustering methods have been proposed to handle data sets that are not linearly separable in input space [1]. By implicitly mapping the input data into a high-dimensional feature space, kernel clustering methods can discover clusters that are not linearly separable in input space: a linear partition in feature space corresponds to a nonlinear partition in input space. Experiments [2], [3] have shown that kernelized clustering algorithms generally outperform their conventional counterparts. Well-known kernel clustering methods include kernel k-means clustering [4], [5], support vector clustering [6], the Camastra and Verri algorithm [7] and maximum margin clustering [8]. Moreover, Dhillon et al. have proved that the objectives of spectral clustering methods, including ratio cut, normalized cut and ratio association, are mathematically equivalent to the objective of weighted kernel k-means clustering, formulated as a trace maximization problem [9].
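To make the feature-space view concrete, the following is a minimal sketch of kernel k-means in Python/NumPy. The RBF kernel and the random initialization are illustrative choices, not the setup used in the paper; the key point is that all distances are computed through the Gram matrix, without ever forming the feature map explicitly.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def kernel_kmeans(K, k, n_iter=100, seed=0):
    # Lloyd-style iterations entirely in feature space: the squared distance
    # of phi(x_i) to the mean of cluster C is expressed through kernel values,
    #   K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(k, size=n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = np.where(labels == c)[0]
            if idx.size:
                dist[:, c] = (np.diag(K)
                              - 2.0 * K[:, idx].mean(axis=1)
                              + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

A linear separation among the mapped points phi(x_i) then corresponds to a nonlinear boundary in the original input space.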

Recently, multiple kernel learning (MKL) has gained increasing attention as a way to construct a combinational kernel from a number of homogeneous or even heterogeneous kernels [10], [11], [12], [13], [14], [15], [16], [17], [18]. Conventional multiple kernel learning aims to learn a linear combination of multiple kernels over the input space. However, if the local distributions of a data set differ significantly across the input space, it is difficult to find a globally optimal combination of multiple kernels for the whole space. A better solution is to use localized multiple kernel learning approaches [19], [20], [21], [22], which assign different weights to a kernel in different regions of the input space. In this paper we address the problem in another way. We propose a localized multiple kernel clustering method, named locally adaptive multiple kernel clustering (LAMKC). Instead of using a uniform combinational kernel over the whole input space, our method associates a localized kernel with each cluster. We assign each cluster a weight vector for feature selection and combine that weight vector with a Gaussian kernel to form a unique kernel for the corresponding cluster. By optimizing the weight vector and the width parameter of the Gaussian kernel jointly for each cluster, each kernel can be localized to match the data distribution of its corresponding cluster. LAMKC performs the following two steps iteratively until convergence: (1) optimizing each kernel for its corresponding cluster, and (2) reassigning each point to the nearest cluster according to Euclidean distances in feature space, computed with each cluster's own kernel. Like other k-means-style methods, LAMKC is sensitive to initialization; we use the Kaufman Approach (KA) [23] combined with k-means clustering to supply a good initial partition.
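This excerpt does not give the paper's objective function or update rules, so the sketch below only illustrates the alternating two-step structure, under two loud assumptions: the per-cluster kernel is taken to be k_c(x, y) = exp(-‖w_c ∘ (x − y)‖² / (2σ_c²)), and the joint optimization of (w_c, σ_c) is replaced by a grid search over σ_c with w_c held fixed (a stand-in separation criterion, not the paper's).

```python
import numpy as np

def local_kernel(X, w, sigma):
    # Assumed per-cluster kernel: Gaussian on feature-weighted differences,
    # k_c(x, y) = exp(-||w ∘ (x - y)||^2 / (2 sigma^2)).
    Xw = X * w
    sq = np.sum(Xw ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xw @ Xw.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def dist_to_cluster(K, idx):
    # Feature-space distance of every point to the mean of the cluster whose
    # member indices are `idx`, computed with that cluster's own kernel K.
    return np.diag(K) - 2.0 * K[:, idx].mean(axis=1) + K[np.ix_(idx, idx)].mean()

def lamkc_sketch(X, init_labels, k, sigma_grid=(0.1, 0.5, 1.0, 2.0), n_iter=20):
    n, d = X.shape
    labels = init_labels.copy()
    params = [(np.ones(d), 1.0) for _ in range(k)]  # (w_c, sigma_c) per cluster
    for _ in range(n_iter):
        # Step 1: localize each cluster's kernel. Stand-in criterion: pick the
        # sigma_c that best separates members from non-members; the paper
        # instead optimizes w_c and sigma_c jointly.
        for c in range(k):
            idx = np.where(labels == c)[0]
            out = np.where(labels != c)[0]
            if idx.size == 0 or out.size == 0:
                continue
            w_c = params[c][0]
            def separation(sigma):
                dcl = dist_to_cluster(local_kernel(X, w_c, sigma), idx)
                return dcl[out].mean() - dcl[idx].mean()
            params[c] = (w_c, max(sigma_grid, key=separation))
        # Step 2: reassign every point to the nearest cluster mean, measuring
        # the distance to cluster c with cluster c's own localized kernel.
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = np.where(labels == c)[0]
            if idx.size:
                dist[:, c] = dist_to_cluster(local_kernel(X, *params[c]), idx)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, params
```

In the paper, `init_labels` would come from the Kaufman Approach combined with k-means rather than an arbitrary labeling.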

The remainder of this paper is organized as follows: in Section 2 we review related work. In Section 3 we give a detailed description of our method. Section 4 presents the experimental results and evaluation of our method. Finally, we conclude the paper in Section 5.

Section snippets

Multiple kernel learning

Recent developments in the literature on SVMs and other kernel methods have shown the need for multiple kernel learning (MKL), which constructs a combinational kernel from a number of basis kernels. Studies [10], [11], [12] have shown that using an optimized combinational kernel instead of a single kernel can improve classification performance and allows for more flexible encoding of domain knowledge from different sources or cues. MKL has been studied intensively in the past few years.
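Concretely, conventional MKL forms K = Σ_m η_m K_m from base kernels K_m with η_m ≥ 0, often constrained to sum to one, and learns η together with the predictor. Below is a minimal sketch of the combination step only; the weights are fixed inputs here, whereas MKL methods learn them (e.g. by SDP or SMO-style solvers cited in the references).

```python
import numpy as np

def rbf_kernel(X, sigma):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def combined_kernel(X, sigmas, eta):
    # Convex combination K = sum_m eta_m K_m of a family of base kernels.
    eta = np.asarray(eta, dtype=float)
    assert np.all(eta >= 0) and np.isclose(eta.sum(), 1.0)
    return sum(e * rbf_kernel(X, s) for e, s in zip(eta, sigmas))

# Example: averaging three RBF kernels of different widths, as in the
# "averaged multiple kernel clustering" baseline used in the experiments:
# K = combined_kernel(X, sigmas=(0.5, 1.0, 2.0), eta=(1/3, 1/3, 1/3))
```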

Motivation

In the kernel k-means clustering method, the input data are mapped implicitly from the original input space to a high-dimensional feature space by means of a kernel function. To obtain good performance, the kernel function should match the distribution of the input data well. However, if a data set has varying local distributions in different regions, it is difficult to find a globally optimal kernel for the whole input space. For example, Fig. 1 shows two clusters with different local distributions.
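To make the contrast explicit: a single global Gaussian kernel uses one width σ for the whole space, whereas the per-cluster construction described in the abstract pairs each cluster c with its own weight vector w_c and width σ_c. One plausible form (the exact definition is not shown in this excerpt) is

    k_c(x_i, x_j) = exp(−‖w_c ∘ (x_i − x_j)‖² / (2σ_c²)),

so that a dense cluster can select a small σ_c while a sparse cluster selects a large one, and w_c can downweight features that are irrelevant to that cluster.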

Experiments

In this section, we conduct experiments to demonstrate the effectiveness of our method. We compared LAMKC with kernel k-means clustering, averaged multiple kernel k-means clustering, self-tuning spectral clustering [27] and the Variable Bandwidth Mean Shift method. Averaged multiple kernel k-means clustering uses the average combination of multiple kernels to perform kernel k-means clustering. We selected the Variable Bandwidth Mean Shift [37] method for comparison because it can adapt its bandwidth to local data distributions.
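For reference, variable bandwidth mean shift assigns each sample x_i its own bandwidth h_i and seeks modes by iterating a bandwidth-weighted mean. A minimal sketch with a Gaussian profile follows; the per-point bandwidths h, which the original method selects in a data-driven way, are taken as given inputs here.

```python
import numpy as np

def variable_bandwidth_mean_shift(X, h, n_iter=50, tol=1e-6):
    # Sample-point estimator: mode-seeking update with a Gaussian profile,
    #   m(x) = sum_i c_i x_i exp(-||x - x_i||^2 / (2 h_i^2)) / sum_i c_i exp(...),
    # where c_i = h_i^-(d+2) weights each point by its individual bandwidth.
    n, d = X.shape
    c = h ** -(d + 2.0)
    modes = X.astype(float).copy()
    for j in range(n):
        x = modes[j].copy()
        for _ in range(n_iter):
            w = c * np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h ** 2))
            x_new = (w[:, None] * X).sum(axis=0) / w.sum()
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        modes[j] = x
    return modes  # samples whose modes coincide form one cluster
```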

Conclusion

In this paper, we proposed a localized multiple kernel clustering method. Our method is designed for data sets with varying local distributions. Instead of using a uniform combination of multiple kernels over the whole input space, our method associates a localized kernel with each cluster. We assign each cluster a weight vector for feature selection and combine that weight vector with a Gaussian kernel to form a unique kernel for the corresponding cluster. By learning the weight vector and the width parameter of the Gaussian kernel jointly for each cluster, each kernel can be localized to match the data distribution of its corresponding cluster.


References (39)

  • L. Xu et al., Maximum margin clustering, Adv. Neural Inf. Process. Syst. (2005)
  • I.S. Dhillon et al., A unified view of kernel k-means, spectral clustering and graph cuts, Comput. Complex. (2005)
  • G. Lanckriet et al., Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res. (2004)
  • F.R. Bach, G.R.G. Lanckriet, M.I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in:...
  • S. Sonnenburg et al., Large scale multiple kernel learning, J. Mach. Learn. Res. (2006)
  • A. Rakotomamonjy et al., SimpleMKL, J. Mach. Learn. Res. (2008)
  • Z. Xu et al., An extended level method for efficient multiple kernel learning, Adv. Neural Inf. Process. Syst. (2009)
  • P.V. Gehler, S. Nowozin, Infinite kernel learning, Technical Report No. TR-178, Max Planck Institute for Biological...
  • M. Kloft et al., Efficient and accurate ℓp-norm multiple kernel learning, Adv. Neural Inf. Process. Syst. (2009)

Lujiang Zhang received the B.S. degree in Computer Science from the Chinese Academy of Sciences. He is currently a Ph.D. candidate at the School of Automation Science and Electrical Engineering, Beijing University of Aeronautics & Astronautics, China. His research interests include machine learning and software analysis.

Xiaohui Hu received the Ph.D. degree from the School of Computer Science and Engineering, Beijing University of Aeronautics & Astronautics, China. He is currently a Senior Researcher at the Institute of Software, Chinese Academy of Sciences, and serves as an Adjunct Professor at Beijing University of Aeronautics & Astronautics. His research interests include information systems integration and computer simulation technology.
